Figure 1: Modern inexpensive HMD: The General Reality
CE-200W. (Photo: General Reality Corp.)
Head Mounted Display (HMD) VR
In HMD VR, the user “wears” a stereo
display, much like a pair of glasses that provides a view into the virtual
world. The physical form of these “glasses” can range from something on
the scale of a motorcycle helmet to a pair of sunglasses. Figure 1
illustrates one example of an HMD.
There is great variety in display quality. The goal of the technology
is to provide the widest field of view at the highest quality, with the
least weight, and at a reasonable cost. The reader is referred to Neale
[12] for a reasonably up-to-date survey of HMD technology.
There exists a range of high degree of freedom (HDOF) input devices that
can be used in interaction with such systems. An overall directory of
sources for input devices can be found in Buxton [3]. Furthermore, a number
of classes of HDOF technologies are discussed in the contribution of
Shumin Zhai [20] in this special issue. Because of the typical mobility of
the user (compared to desktop systems), however, most HMD systems use what
Zhai calls a flying mouse class of device, often in conjunction with a
dataglove type controller. In some cases, each hand is instrumented in
order to support bimanual interaction.
The issue with virtually all HMDs is that the eyes are covered by the
display. Consequently, one sees the virtual world at the expense of the
physical one. Users cannot directly see their hands or the devices that
they are controlling. Similarly, they cannot directly see objects or other
people who are in their immediate physical environment. Therefore, in
order to function, some representation of such entities from the physical
world must appear in the virtual one. In order to use my hands, I most
likely must see a representation of them. Likewise, in order to avoid
bumping into a table, I must see a representation of it, and to avoid
bumping into you, I must see an avatar, or some other representation of
you.
In collaborative work a significant observation that emerges from this
is that, visually, HMD VR treats those in the same and those in remote
physical spaces in the same way (some would say equally poorly, since
visually there is no advantage to “being there” physically).
There is an important caveat to raise at this juncture. Some
researchers have found a way around the problem of seeing the physical
world (such as objects, their hands, tools or other people) while wearing
HMDs. One approach is to mount one or more video cameras onto the HMD and
feed the signals to the displays [19, 2, 16]. The cameras function as
surrogate eyes providing a view into the physical world onto which is
superimposed a computer generated view of the virtual world. The result is
much like a head-up display, and this approach to VR falls into the
general category of augmented reality (AR), since it enables the computer
to augment our view of the physical world with additional information. See
[9] for an example of augmented reality and its application.
One important application of this technology is in remote
collaboration. As an example, take the case of a technician who needs
guidance to repair a complex piece of equipment from an expert who is not
physically there. Through the cameras mounted on the technician’s HMD, the
expert can remotely see what the technician is looking at. Conversely,
using VR technology, the expert can point and indicate to the technician
what to do. The guidance of the expert is superimposed on the technician’s
view of the equipment in the HMD, thereby enabling the repair to
proceed.
Clearly the ability to support AR is an important attribute of HMD VR.
However, since it is not in the mainstream of HMD VR, in the bulk of the
discussion which follows we will assume that this is not in
place.
Figure 2: Schematic showing the relationship among the
eyes, hands and display in HMD Style VR.
Figure 3: Fakespace BOOM3C boom mounted display (Photo:
Fakespace, Inc.)
Figure 4: Schematic of an idealized Cave VR system. Tiled
rear projection stereo images appear on up to six faces of the room in
which the operator works. In practice, most Caves have three to four faces
with projections. (Image from: Cruz-Neira, Sandin, DeFanti, Kenyon and
Hart, 1992).
Figure 5: Schematic showing relationship among the eyes,
hands and display in Cave style VR.
Figure 6: The ImmersaDesk VR System. A large format
rear-projection flat stereo display (Photo: Electronic Visualization
Laboratory at the University of Illinois at Chicago)
Figure 7: The Cubby System: A Small 3-Sided Cave [6, 7]
Figure 8: Chameleon Palm-held VR System. A monocular image
is presented on a palm-sized portable display. The display has position
and orientation tracking so what is displayed is determined by the display
position. (It is like a virtual magnifying glass). The display also
incorporates some manual controls. (Photo from Fitzmaurice, 1993)
Figure 9: Schematic showing relationship among the eyes,
hands and display in Chameleon style VR.
Figure 10: Art+Com virtual car display. This system is
essentially a larger format display version of Chameleon. A
counter-balanced boom constrains the display movement as well as supports
its weight. (Photo: Art+Com).
Figure 11: Art+Com VR Control: Note that the display in
the previous photo is a touch screen that enables the operator to interact
with the image. (Photo: Art+Com).
A simple schematic of HMD systems is shown in Figure 2. It
represents the relationship among the eyes, hands and display. First, it
shows that the eyes and display are both tightly physically coupled, and
that their position is tracked. In addition, it shows that the hands are
on the “far” side of the display. Finally, it shows that all three are
physically coupled, and mobile within physical space.
According to these criteria, and for the purposes of this paper,
boom-mounted displays, such as illustrated in Figure 3, are a variation on
HMDs, as opposed to a separate category (in contrast to the analysis of
Cruz-Neira, Sandin, DeFanti, Kenyon and Hart, 1992) [4].
Caves
A significantly different approach to VR, called Cave VR,
was introduced by Cruz-Neira, Sandin, DeFanti, Kenyon and Hart in 1992
[4]. In this class of VR, the user functions within a room on which one or
more of the surfaces (walls, floor, ceiling …) is the display. An
idealized representation of a Cave is shown in Figure 4. This shows four
sides of a six-sided Cave. In a Cave, each of the displays is “tiled,” in
that together they provide a seamless omni-directional view of the virtual
scene. Furthermore, the displays are ideally stereo, and the operator
views them through a set of lightweight transparent shutter glasses. The
user’s head position is tracked within the Cave so that what is displayed
preserves proper perspective, etc., in adapting to movements and change of
location of gaze. That is, perceptually, the user sees the virtual scene
as it would appear if it were real. And, as anyone who has seen a
stereo movie knows, the objects in the virtual scene do not just appear on
the Cave walls and beyond. They can appear to enter into the physical
space of the Cave itself, where the user can interact with them
directly.
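The dependence of the displayed image on the tracked head position can be made concrete with a small sketch. Assuming a single flat wall lying in the plane z = 0 and a viewer standing at positive z, the asymmetric (off-axis) viewing frustum for that wall follows from similar triangles. The names `Wall` and `off_axis_frustum`, the coordinate convention and the near-plane distance are illustrative assumptions, not a description of any particular Cave implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Wall:
    """Extent of one flat display wall in the plane z = 0 (metres, assumed)."""
    left: float
    right: float
    bottom: float
    top: float


def off_axis_frustum(wall, eye, near=0.1):
    """Return (left, right, bottom, top) of the view frustum at the near plane.

    `eye` is the tracked head position (x, y, z) with z > 0, i.e. in front
    of the wall. The wall edges are projected onto a plane `near` metres in
    front of the eye, which is the form a glFrustum-style projection expects.
    """
    ex, ey, ez = eye
    if ez <= 0:
        raise ValueError("eye must be in front of the wall (z > 0)")
    scale = near / ez  # similar triangles: wall plane -> near plane
    return (
        (wall.left - ex) * scale,
        (wall.right - ex) * scale,
        (wall.bottom - ey) * scale,
        (wall.top - ey) * scale,
    )


if __name__ == "__main__":
    wall = Wall(left=-1.5, right=1.5, bottom=0.0, top=3.0)
    # Centred viewer: the frustum is symmetric.
    print(off_axis_frustum(wall, (0.0, 1.5, 1.5)))
    # Viewer steps to the right: the frustum becomes asymmetric.
    print(off_axis_frustum(wall, (0.5, 1.5, 1.5)))
```

Because the wall is fixed and only the eye moves, the frustum must be recomputed for every tracker update; a multi-wall Cave would repeat the same calculation once per wall, each in that wall's own coordinate frame.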
As with HMD VR, manual interaction within the Cave is typically
accomplished with an HDOF device such as a “flying mouse” (sometimes
coupled with speech recognition), in order to enable the operator to
remain mobile within the space.
One area where Caves differ from HMD VR is that, since the glasses are
transparent, one can see the physical as well as the virtual world.
Consequently, if you and I are both in the space, we can see each other as
well as the virtual world. However, the way that we can share the scene
has some distinct differences from HMD VR. Remember that what is displayed
is determined by head tracking. If we are both in the Cave, we both are
viewing the same displays, preventing us from each having our own “point
of view.” (While we can both look at different things and different
directions, we both do so as if from the perspective of the current
location of the head tracker.) So the good news is, in the Cave we really
are presented with the same view. The bad news is, you have to see it from
my location, or vice versa.
In remote collaboration, where two Caves are linked, this constraint is
softened since each Cave can have a unique view, but everyone within a
single Cave must share the same one. But the advantage of being able to
see each other in the context of the virtual scene is lost when
collaborating across multiple Caves. In remote collaboration one must
resort to the same techniques used in HMD VR — such as the use of avatars
or some other representation — in order to see one’s remote collaborators
within the virtual space.
Finally, there is one potential problem that is unique to same location
collaboration in Caves. In the everyday world, you and I may find
ourselves on opposite sides of an object of interest or discussion. But
what happens in a Cave if the object of interest lies within the confines
of the physical walls of the Cave? If we are facing each other in a Cave
with a virtual object in between us, neither of us will be able to see the
object as we are each blocking the screen on which it is being projected
for the other person. Let us call this the “shadow effect.”
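The geometry of the shadow effect can be sketched in plan view. The following is a minimal illustration, assuming a two-dimensional floor plan, a collaborator modelled as a disc of fixed radius, and a viewer who sees the virtual object along the straight line from eye to object. The function name `blocks_view` and the default body radius are hypothetical.

```python
import math


def blocks_view(viewer, obj, other, body_radius=0.25):
    """Return True if `other` stands behind the virtual object on the
    viewer's line of sight, and so hides the wall region where the
    object's image is projected.

    All positions are (x, y) floor coordinates in metres.
    """
    dx, dy = obj[0] - viewer[0], obj[1] - viewer[1]
    dist_obj = math.hypot(dx, dy)
    if dist_obj == 0:
        raise ValueError("viewer and object must be at different positions")
    ux, uy = dx / dist_obj, dy / dist_obj  # unit vector, viewer -> object

    ox, oy = other[0] - viewer[0], other[1] - viewer[1]
    along = ox * ux + oy * uy        # distance of `other` along the sight line
    across = abs(ox * uy - oy * ux)  # perpendicular distance from the sight line

    # The collaborator must be beyond the object and close to the sight line.
    return along > dist_obj and across <= body_radius


if __name__ == "__main__":
    me, you, car = (0.0, 0.0), (2.0, 0.0), (1.0, 0.0)
    print(blocks_view(me, car, you))  # you block my view of the object
    print(blocks_view(you, car, me))  # and I block yours
```

The check is symmetric for two people facing each other across the object, which is why neither of them sees it.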
As with HMD VR, in Figure 5 we characterize Cave VR by means of a
simple schematic. Here we illustrate that the eyes and hands are loosely
coupled and mobile, and that the display is anchored in a fixed position.
Furthermore, it shows that the head is tracked and that the hands are
visible and located between the display and the eye.
A number of other systems fit this characterization and
might therefore be considered “degenerate Caves.” One example would
be large format projection displays such as the ImmersaDesk shown in
Figure 6, developed at the Electronic Visualization Laboratory at the
University of Illinois at Chicago (Czernuszenko, Pape, Sandin, DeFanti,
Dawe and Brown, 1997) [5]. This is essentially a small one-sided
Cave.
Another example would be what Ware and Booth [18] called fish tank VR.
These are typically CRT systems which incorporate head-tracking, and
present a perspective view (often not stereo), based on the user’s head
position. Such systems can be thought of as very small format one-sided
Caves (tunnels?) with a consequently limited field of view and range of
mobility of the user.
Actually, small format caves have been built, showing that you don’t
have to be able to walk around in a cave for the technology to be of
value. The Cubby system developed at the Technical University of Delft in
the Netherlands is one such example [6, 7].
Finally, flight and driving simulators, which involve a vehicle in a
space (often only partially) surrounded by rear projection screens, would
also fall into this category. The display is often not stereo, and it may
not be flat. And the user is typically not mobile, being confined to the
vehicle. However, the basic relationship among the view, hands and display
is consistent with the Cave approach.
Chameleon Style VR
The third and least well known approach to VR
that we will discuss was introduced by Fitzmaurice (1993) in his Chameleon
system. This can be thought of as handheld VR. In the Chameleon system,
the image appeared on a small display held in the palm of the hand. In
this case, what appeared on the screen was determined by tracking the
position of the display, rather than the head of the user.
One way to think about the Chameleon approach is as a magnifying glass
that looks onto a virtual scene, rather than the physical world. And while
the display is small, and certainly does not give the wide angle view
found with the Cave approach, the scene is easily browsed by moving the
lightweight display, as shown in the bottom image of Figure 8.
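The magnifying glass metaphor can be illustrated with a deliberately simplified sketch. It assumes a flat, two-dimensional virtual scene and a display tracked only in position, whereas the Chameleon system tracked both position and orientation. The function `visible_region` and its parameters are hypothetical names introduced for illustration.

```python
def visible_region(display_center, display_size, scene_units_per_cm=1.0):
    """Return the scene rectangle (x_min, y_min, x_max, y_max) shown on a
    tracked palm-sized display.

    `display_center` is the tracked position of the display centre and
    `display_size` its (width, height), both in centimetres. The scene is
    assumed to be anchored in physical space, so moving the display moves
    the window over the scene rather than moving the scene itself.
    """
    cx, cy = display_center
    width, height = display_size
    half_w = width * scene_units_per_cm / 2
    half_h = height * scene_units_per_cm / 2
    sx, sy = cx * scene_units_per_cm, cy * scene_units_per_cm
    return (sx - half_w, sy - half_h, sx + half_w, sy + half_h)


if __name__ == "__main__":
    # An 8 cm x 6 cm display held at (10, 20), then moved 5 cm to the right.
    print(visible_region((10.0, 20.0), (8.0, 6.0)))
    print(visible_region((15.0, 20.0), (8.0, 6.0)))
```

The only input that changes what is drawn is the pose of the display itself, in contrast to the head-tracked views of HMD and Cave systems.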
This movement of the display actually takes advantage of a subtle but
powerful effect in human visual perception. With respect to visual
perception, Newton was wrong about the equivalence of relative motion.
That is, moving a scene on a fixed display is not the same as moving a
display over a stationary scene. The reason is rooted in the persistence
of images on the retina, formally known as the “Parks Effect” [13]. Much
like moving the cursor often leaves a visible trail on a screen, moving
the Chameleon display across the field of vision, and updating the view
with the motion, can leave an image of the larger scene on the retina.
Hence, if the display can move, the effective size of the virtual display
need not be the same as the physical size. (If you remain confused, think
about the effect of drawing a pattern on a wall by quickly moving a laser
pointer. Here, a whole pattern is displayed even though only one point is
illuminated at any given time. The image is in your eye, not on the wall.
Such is the human visual perceptual system, and Chameleon-like VR can take
advantage of it.)
Interaction with this class of display tends to be based on devices
such as buttons or (as seen in the next example) a touch screen coupled
directly with the display. That is to say, the display device serves for
both input and output. In no cases, to our knowledge, has stereo display
been used with this class of system, although one can imagine achieving
this using the same kind of shuttered glasses employed in Cave
systems.
Like Cave systems, in Chameleon-like VR, one has an unobstructed view
of people and objects in the physical world. However, unlike the Cave but
as with HMD VR, in collaborating with others in the same physical space,
each user has their own view. And yet, it is easy to have a mechanism for
sharing a view without disorienting the other viewers, since orientation
is mainly determined by one’s orientation in physical space. (Contrast
this with switching views in either Cave or HMD VR.)
On the other hand, Chameleon VR shares the same problem as both HMD and
Cave VR in establishing a sense of presence of others in collaboration
involving different physical locations.
As with the other techniques, we can characterize Chameleon-like VR
schematically. Figure 9 illustrates the tight coupling of the hand(s) with
the display, as well as the tracking of the display, and the mobility
(modulo any tethering) of all three.
There have been other examples that have taken the Chameleon-like
approach. For our purposes, one of the most interesting was developed by
Art+Com (1998) [1] to enable the public to view a virtual version of the
new Daimler-Benz A-class vehicle at the IAA motor show in Frankfurt,
September of 1997. This is illustrated in Figure 10.
In this example, the display was larger than in the original Chameleon
system. Rather than handheld, it was supported by a counter-balanced boom.
While mechanically not unlike the Fakespace boom seen previously in Figure
3, conceptually this system is quite distinct. It very much falls into the
Chameleon-class of VR by virtue of the relationship of the hands to the
display, and the user’s simultaneous visibility and awareness of the
surrounding physical space.
In this example, the system was large enough for the car to be
viewed at 1:1 scale. The user could walk through and view the virtual
car with the help of a flat screen (LCD) attached to a swivel arm. What
this example demonstrates is how the technology for interacting with the
virtual space can be integrated seamlessly into the display. This is shown
in Figure 11, which illustrates how a touch screen on the display was used
to select things such as the color of the vehicle or fabric of the
upholstery.
Finally, like HMDs, Chameleon-like systems have the ability to support
augmented reality. In his paper, for example, Fitzmaurice [10] showed how
location tracking not only told the device where it was physically, but
also relative to other devices, or people. Brought close to a map, for
example, it could give additional information about the region that it was
close to. Or, brought beside a complex piece of machinery, by being aware
of the fact, it could give valuable information about how to use or repair
the device.
Some researchers, Rekimoto [14], for example, have augmented
Chameleon-like devices further and added video cameras in a manner similar
to those discussed in the section on HMDs. Using this approach, the
computer-generated information can, likewise, be superimposed over a view
of the physical world, with the same benefits discussed with HMDs.
Summary and Conclusions
We have surveyed three distinct approaches
to VR. We have attempted to describe each in terms of properties that
might influence their suitability for different types of applications. In
particular, we have emphasized properties that emerge in different forms
of collaboration. These are summarized in Table 1.
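As a rough aid to comparison, the properties discussed in the preceding sections can also be encoded as a small data structure. This is only a sketch drawn from the prose above; it is not a reproduction of Table 1, and the class `VRStyle`, its field names and the helper `styles_where` are hypothetical. The HMD entry assumes no camera-based augmented reality, in line with the assumption made earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VRStyle:
    name: str
    tracked: str                       # what the system tracks to pick the view
    hands: str                         # where the hands sit relative to the display
    sees_physical_world: bool          # can users see real people and objects?
    own_view_per_colocated_user: bool  # does each co-located user get a view?


STYLES = [
    VRStyle("HMD", "head (eyes and display coupled)",
            "on the far side of the display", False, True),
    VRStyle("Cave", "head",
            "between the eyes and the display", True, False),
    VRStyle("Chameleon", "display",
            "coupled to the display", True, True),
]


def styles_where(**required):
    """Return the names of the styles whose fields match all given values."""
    return [
        style.name
        for style in STYLES
        if all(getattr(style, key) == value for key, value in required.items())
    ]


if __name__ == "__main__":
    # Which styles let co-located collaborators see each other directly?
    print(styles_where(sees_physical_world=True))
```

A query such as `styles_where(sees_physical_world=True, own_view_per_colocated_user=True)` narrows the options by the collaboration properties emphasized in this paper, leaving cost, space and fidelity to be weighed separately.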
Obviously, other factors will also affect what technology is adopted.
Cost is always an issue. So is the question of the amount of space, and
any specialized environments required. And even within a type, there is a
broad range of variation in image quality, responsiveness, etc.
But in many cases, it may be that more global human factors are most
important. By way of example, consider an automotive design studio that
wants to use VR technology for design reviews. Cave technology can be, and
has been, used to good effect. However, the quality has to be balanced with the
fact that there typically isn’t a Cave in every studio. Rather, the Cave
is most commonly a shared resource in a different part of the building. It
has to be booked and data transferred and set up. While this structure can
support formal reviews, it does not lend itself to casual or spontaneous
reviews by management, customers or designers. That is to say, social
issues might be the determining factor in choosing something like a
Chameleon VR system, even if the fidelity does not match that of the
alternative approaches.
VR technologies are expensive and not well understood. In our opinion,
there is no “right approach” without a careful analysis of user, task and
context (physical and social). Hopefully, the concepts outlined in this
paper make some progress in paving the path to an understanding of the
issues that will support such decisions. In the meantime, the authors
welcome comments, suggestions and questions.
Acknowledgments
The research underlying this paper has been
supported by Alias|Wavefront, Inc. and Silicon Graphics, Inc. This support
is gratefully acknowledged. Also thanks to Thomas Baudel and Michael Mills
for valuable suggestions and help.