Abstract
There are various approaches to implementing Virtual Reality (VR) systems. The head mounted display (HMD) and Cave approaches are two of the best known. In this paper, we discuss such approaches from the perspective of the types of interaction that they afford. Our analysis looks at interaction from three perspectives: solo interaction, collaborative interaction in the same physical space, and remote collaboration. From this analysis emerges a basic taxonomy that is intended to help systems designers make choices that better match their implementation with the needs of their application and users.
Immersive Virtual Reality (VR) was first suggested, as were so many other things, by Ivan Sutherland (1965). Practical working systems have now been with us for over a decade and have been written about extensively (e.g., Rheingold, 1991). If one includes the early work of Krueger (1983), they go back even further. The best known approach to VR is that of the head mounted display (HMD) coupled with head tracking. With such systems, one typically is presented with a stereo binocular view of the virtual world, often with stereo audio. By virtue of tracking the viewing position (the head) and orientation in the physical world, the view and perspective of the virtual world are consistent with what one would experience in the physical world from the same actions.
In addition to tracking viewpoint, which is tied to what is displayed to the user, such systems also typically permit some means of input, such as a dataglove (Zimmerman, Lanier, Blanchard, Bryson & Harvill, 1987) or some other high degree of freedom input to support interaction with the displayed virtual world.
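The coupling between tracked head pose and the displayed stereo view can be illustrated with a small sketch. It is not taken from any particular system: the function names, the restriction to rotation about the vertical axis, and the interpupillary distance of 0.064 m are all assumptions made for illustration.

```python
import math

def eye_positions(head_pos, yaw_deg, ipd=0.064):
    """Return (left_eye, right_eye) world positions for a tracked head.

    head_pos: (x, y, z) of the point midway between the eyes, in metres.
    yaw_deg:  rotation about the vertical (y) axis; 0 means looking down -z.
    ipd:      assumed interpupillary distance in metres (illustrative value).
    """
    yaw = math.radians(yaw_deg)
    # Unit vector pointing to the user's right, in the horizontal plane.
    right = (math.cos(yaw), 0.0, -math.sin(yaw))
    half = ipd / 2.0
    x, y, z = head_pos
    left_eye = (x - half * right[0], y, z - half * right[2])
    right_eye = (x + half * right[0], y, z + half * right[2])
    return left_eye, right_eye

def world_to_eye(point, eye, yaw_deg):
    """Express a world-space point in the coordinate frame of one eye.

    In eye coordinates +x is to the right, +y is up, and the eye looks
    down -z, so points in front of the viewer have negative z.
    """
    yaw = math.radians(yaw_deg)
    dx = point[0] - eye[0]
    dy = point[1] - eye[1]
    dz = point[2] - eye[2]
    # Undo the head rotation (rotate by -yaw about the vertical axis).
    ex = math.cos(yaw) * dx - math.sin(yaw) * dz
    ez = math.sin(yaw) * dx + math.cos(yaw) * dz
    return (ex, dy, ez)
```

Each time the tracker reports a new head pose, the scene is re-expressed in the coordinates of each eye, which is what keeps the virtual view consistent with the user's physical movement. A full system would also track pitch and roll; they are omitted here to keep the sketch short.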
As the art has progressed, alternative technical approaches to VR have emerged. We distinguish among three classes of system: HMD VR, cave VR, and Chameleon-like (hand-held) VR.
Each of these types of VR system is discussed in more detail below. But the point of this paper is not to provide a history or enumeration of VR systems, per se.
VR, while expensive and still relatively new, is a powerful technology. It is being applied in a range of contexts ranging from entertainment to automotive design. But if one is going to engage the technology, then what path to follow, and why? What are the relevant dimensions? What are the pros and cons of each approach?
Providing some vocabulary and a framework for answering such questions is what motivates this brief discussion paper. After introducing each of the three classes of VR system, we discuss them in terms of their ability to support three types of interaction: solo interaction, collaborative interaction in the same physical space, and remote collaboration.
These are the key dimensions according to which we contrast the various
approaches. It is obvious that other concerns such as cost, speed, fidelity,
space requirements, etc. affect the choice of which technology to adopt.
We will touch on some of these. But our overall objective is more modest:
to shed some light on those dimensions that we feel we best understand.
In HMD VR, the user wears a stereo display, much like a pair of glasses, that provides a view into the virtual world. The physical form of these "glasses" can range from something on the scale of a motorcycle helmet to a pair of sunglasses. Figure 1 illustrates one example of an HMD.
There is a great variety in display quality. The goal in the technology
is to provide the widest field of view at the highest quality and with the
least weight and at a reasonable cost. The reader is referred to Neale (1998)
for a reasonably up-to-date survey of HMD technology.
Figure 1: Modern Inexpensive HMD: The General Reality CE-200W (Photo: General Reality Corp.)
There is a range of high degree of freedom (HDOF) input devices that can
be used in interaction with such systems. An overall directory of sources
to input devices can be found in Buxton (1998). Furthermore, a number of
classes of HDOF technologies are discussed in the contribution of Shumin
Zhai (1998) in this special issue. Because of the typical mobility of the
user (compared to desktop systems), however, most HMD systems use what Zhai
calls a flying mouse class of device, often in conjunction with a
data-glove type controller. In some cases, each hand is instrumented in
order to support bimanual interaction.
The issue with virtually all HMDs is that the eyes are covered by the display. Consequently, one sees the virtual world at the expense of the physical one. Users cannot directly see their hands or the devices that they are controlling. Similarly, they cannot directly see objects or other people who are in their immediate physical environment. Therefore, in order to function, some representation of such entities from the physical world must appear in the virtual one. In order to use my hands, I most likely must see a representation of them. Likewise, in order to avoid bumping into a table, I must see a representation of it, and to avoid bumping into you, I must see an avatar, or some other representation of you.
In collaborative work, a significant observation that emerges from this is that, visually, HMD VR treats collaborators in the same physical space and those in remote physical spaces identically (some would say equally poorly, since visually there is no advantage to "being there" physically).
There is an important caveat to raise at this juncture. Some researchers have found a way around the problem of seeing the physical world (such as objects, their hands, tools or other people) while wearing HMDs. One approach is to mount one or more video cameras onto the HMD and feed the signals to the displays (see Yoo and Olano, 1993; Azuma and Bishop, 1994; and State et al., 1996). The cameras function as surrogate eyes providing a view into the physical world onto which is superimposed a computer generated view of the virtual world. The result is much like a head-up display, and this approach to VR falls into the general category of Augmented Reality (AR), since it enables the computer to augment our view of the physical world with additional information. See Feiner, MacIntyre and Seligmann (1993) for an example of Augmented Reality and its application.
One important application of this technology is in remote collaboration. As an example, take the case of a technician who needs guidance to repair a complex piece of equipment from an expert who is not physically there. Through the cameras mounted on the technician's HMD, the expert can remotely see what the technician is looking at. Conversely, using VR technology, the expert can point and indicate to the technician what to do. The guidance of the expert is superimposed on the technician's view of the equipment in the HMD, thereby enabling the repair to proceed.
Clearly the ability to support AR is an important attribute of HMD VR. However, since it is not in the mainstream of HMD VR, we will not discuss it further.
To compare the three VR approaches, we have defined a simple schematic to represent the relationship among the eyes, hands and display. Figure 2 shows this schematic for HMD VR systems. First, it shows that the eyes and display are tightly coupled physically, and that their position is tracked. In addition, it shows that the hands are on the "far" side of the display. Finally, it shows that all three are physically coupled, and mobile within physical space.
According to these criteria, and for the purposes of this paper, boom-mounted
displays, such as illustrated in Figure 3, are a variation on HMDs, as opposed
to a separate category (in contrast to the analysis of Cruz-Neira, Sandin,
DeFanti, Kenyon and Hart, 1992).
A significantly different approach to VR, called Cave VR, was
introduced by Cruz-Neira, Sandin, DeFanti, Kenyon and Hart (1992). In this
class of VR, the user functions within a room on which one or more of the
surfaces (walls, floor, ceiling) is the display. An idealized representation
of a cave is shown in Figure 4. This shows 4 sides of a 6-sided cave. In
a cave, each of the displays is "tiled", in that together they
provide a seamless omnidirectional view of the virtual scene. Furthermore,
the displays are ideally stereo, and the operator views them through a set
of lightweight transparent shutter glasses. The user's head position is
tracked within the cave so that what is displayed preserves proper perspective,
etc., in adapting to movements and changes in the location of gaze. That is,
perceptually, the user sees the virtual scene in a manner consistent with
how it would appear if it were real. And, as anyone who has seen a stereo movie knows, the
objects in the virtual scene do not just appear on the cave walls and beyond.
They can appear to enter into the physical space of the cave itself, where
the user can interact with them directly.
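One way to see how a fixed wall display can preserve proper perspective for a moving head is to compute an off-axis viewing frustum from the tracked eye position. The following is a minimal sketch under stated assumptions, not the method of any particular cave implementation: the wall is assumed to lie in the plane z = 0, the eye is assumed to be at z > 0, and the function name and parameters are hypothetical.

```python
def wall_frustum(eye, wall_left, wall_right, wall_bottom, wall_top, near=0.1):
    """Frustum extents at the near plane for one wall of a cave.

    The wall is a rectangle in the plane z = 0 spanning
    [wall_left, wall_right] in x and [wall_bottom, wall_top] in y.
    eye is the tracked eye position (x, y, z) with z > 0, in metres.
    Returns (left, right, bottom, top) measured at distance `near`
    in front of the eye.
    """
    ex, ey, ez = eye
    if ez <= 0:
        raise ValueError("eye must be in front of the wall (z > 0)")
    # Similar triangles: scale the wall edges, taken relative to the eye,
    # from the wall distance ez down to the near-plane distance.
    scale = near / ez
    return ((wall_left - ex) * scale,
            (wall_right - ex) * scale,
            (wall_bottom - ey) * scale,
            (wall_top - ey) * scale)
```

When the eye is centred in front of the wall the frustum is symmetric. As the user steps to one side it becomes asymmetric, which is what makes the image on the fixed screen appear correct from the new location. In a stereo system the same calculation would be repeated for each eye.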
As with HMD VR, manual interaction within the cave is typically accomplished with an HDOF device such as a "flying mouse" (sometimes coupled with speech recognition), in order to enable the operator to remain mobile within the space.
One area where caves differ from HMD VR is that, since the glasses are transparent, one can see the physical as well as the virtual world. Consequently, if you and I are both in the space, we can see each other as well as the virtual world. However, the way that we can share the scene has some distinct differences from HMD VR. Remember that what is displayed is determined by head tracking. If we are both in the cave, we both are viewing the same displays, preventing us from each having our own "point of view." (While we can both look at different things and in different directions, we both do so as if from the perspective of the current location of the head tracker.) So the good news is, in the cave we really are presented with the same view. The bad news is, you have to see it from my location, or vice versa.
In remote collaboration, where two caves are linked, this constraint is softened since each cave can have a unique view, but everyone within a single cave must share the same one. But the advantage of being able to see each other in the context of the virtual scene is lost when collaborating across multiple caves. In remote collaboration one must resort to the same techniques used in HMD VR, such as the use of avatars or some other representation, in order to see one's remote collaborators within the virtual space.
Finally, there is one potential problem that is unique to same-location
collaboration in caves. In the everyday world, you and I may find ourselves
on opposite sides of an object of interest or discussion. But what happens
in a cave if the object of interest lies within the confines of the physical
walls of the cave? If we are facing each other in a cave with a virtual
object in between us, neither of us will be able to see the object as we
are each blocking the screen on which it is being projected for the other
person. Let us call this the "shadow effect."
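The geometry of the shadow effect can be sketched in a top-down (2D) view. The model below is an illustration only and rests on several assumptions of ours: the cave is a square centred on the origin, each person is approximated by a circle of a fixed radius, and the function names and default dimensions are hypothetical.

```python
import math

def wall_hit(eye, obj, half_size):
    """Point where the ray from eye through obj leaves a square cave.

    The cave spans [-half_size, half_size] in both x and y (top view),
    and eye is assumed to be inside it. The returned point is where the
    virtual object must be drawn on the wall for this viewer.
    """
    dx, dy = obj[0] - eye[0], obj[1] - eye[1]
    if dx == 0 and dy == 0:
        raise ValueError("eye and object must be at different positions")
    ts = []
    for origin, d in ((eye[0], dx), (eye[1], dy)):
        if d > 0:
            ts.append((half_size - origin) / d)
        elif d < 0:
            ts.append((-half_size - origin) / d)
    t = min(ts)  # the first wall reached along the ray
    return (eye[0] + t * dx, eye[1] + t * dy)

def distance_to_segment(p, a, b):
    """Shortest distance from point p to the line segment a-b."""
    abx, aby = b[0] - a[0], b[1] - a[1]
    denom = abx * abx + aby * aby
    if denom == 0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    t = ((p[0] - a[0]) * abx + (p[1] - a[1]) * aby) / denom
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * abx), p[1] - (a[1] + t * aby))

def is_shadowed(eye, obj, other, half_size=1.5, body_radius=0.25):
    """True if `other` stands on the viewer's line of sight to the wall."""
    hit = wall_hit(eye, obj, half_size)
    return distance_to_segment(other, eye, hit) < body_radius
```

With two people facing each other at (-1, 0) and (1, 0) and a virtual object at the origin, `is_shadowed` is true for both viewers, since each stands between the other and the wall on which the object is drawn. If the second person steps aside to (0, 1), the first viewer's line of sight is clear again.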
As with HMD VR, in Figure 5 we characterize cave VR by means of a simple schematic. Here we illustrate that the eyes and hands are loosely coupled and mobile, and that the display is anchored in a fixed position. Furthermore, it shows that the head is tracked and that the hands are visible, and are between the display and the eye.
Fitting into this characterization are a number of other systems, which
might therefore be considered "degenerate caves." One example
would be large format projection displays such as the ImmersaDesk
shown in Figure 6, developed at the Electronic Visualization Laboratory
at the University of Illinois at Chicago (Czernuszenko, Pape, Sandin, DeFanti,
Dawe and Brown, 1997). This is essentially a small 1-sided cave.
Another example would be what Ware, Arthur and Booth (1993) called Fish tank VR. These are typically CRT systems which incorporate head-tracking, and present a perspective view (often not stereo), based on the user's head position. Such systems can be thought of as very small format one-sided caves (tunnels?) with a consequently limited field of view and range of mobility of the user.
Actually, small format caves have been built, showing that you don't have to be able to walk around in a cave for the technology to be of value. The Cubby system developed at the Technical University of Delft in The Netherlands is one such example (Djajadiningrat, 1998; Djajadiningrat, Smets & Overbeeke, 1997).
Figure 7: The Cubby System: A Small 3-Sided Cave (Djajadiningrat, 1998; Djajadiningrat, Smets & Overbeeke, 1997).
Finally, flight and driving simulators, which involve a vehicle in a
space (often only partially) surrounded by rear projection screens, would
also fall into this category. The display is often not stereo, and it may
not be flat. And the user is typically not mobile, being confined to the
vehicle. However, the basic relationship among the view, hands and display
is consistent with the cave approach.
The third and least well known approach to VR that we will discuss was introduced by Fitzmaurice (1993) in his Chameleon system. This can be thought of as hand-held VR. In the Chameleon system, the image appeared on a small display held in the palm of the hand. In this case, what appeared on the screen was determined by tracking the position of the display, rather than the head of the user.
One way to think about the Chameleon approach is as a magnifying glass that looks onto a virtual scene, rather than the physical world. And while the display is small, and certainly does not give the wide angle view found with the cave approach, the scene is easily browsed by moving the lightweight display, as shown in the right hand image of Figure 8.
This movement of the display actually takes advantage of a subtle but powerful effect in human visual perception. With respect to visual perception, Newton was wrong about the equivalence of relative motion. That is, moving a scene on a fixed display is not the same as moving a display over a stationary scene. The reason is rooted in the persistence of images on the retina, formally known as the "Parks Effect," (Parks, 1965). Much like moving the cursor often leaves a visible trail on a screen, moving the Chameleon display across the field of vision, and updating the view with the motion, can leave an image of the larger scene on the retina. Hence, if the display can move, the effective size of the virtual display need not be the same as the physical size. (If you remain confused, think about the effect of drawing a pattern on a wall by quickly moving a laser pointer. Here, a whole pattern is displayed even though only one point is illuminated at any given time. The image is in your eye, not on the wall. Such is the human visual perceptual system, and Chameleon-like VR can take advantage of it.)
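The geometric side of this argument can be sketched as follows. The code only computes which part of a stationary virtual scene is shown as the tracked display moves, and the overall extent covered during a sweep. It is not a model of retinal persistence, and the function names, the 2D simplification and the units are assumptions made for illustration.

```python
def viewport(display_center, display_size):
    """Region of the stationary scene shown at one display position.

    display_center: tracked (x, y) position of the display centre.
    display_size:   physical (width, height) of the display.
    Returns (x_min, y_min, x_max, y_max) in the same units.
    """
    cx, cy = display_center
    w, h = display_size
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

def swept_extent(centers, display_size):
    """Bounding rectangle of everything shown during a sweep."""
    rects = [viewport(c, display_size) for c in centers]
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))
```

For example, a display 0.1 units wide swept through centres at x = 0, 0.2 and 0.4 presents a region 0.5 units wide, five times the physical width of the display.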
Interaction with this class of display tends to be based on devices such as buttons or (as seen in the next example) a touch screen coupled directly with the display. That is to say, the display device serves for both input and output. In no case, to our knowledge, has a stereo display been used with this class of system, although one can imagine achieving this using the same kind of shuttered glasses employed in cave systems.
Like cave systems, in Chameleon-like VR, one has an unobstructed view of people and objects in the physical world. However, unlike the cave but as with HMD VR, in collaborating with others in the same physical space, each user has their own view. And yet, it is easy to have a mechanism for sharing a view without disorienting the other viewers, since orientation is mainly determined by one's orientation in physical space. (Contrast this with switching views in either cave or HMD VR.)
On the other hand, Chameleon VR shares the same problem as both HMD and cave VR in establishing a sense of presence of others in collaboration involving different physical locations.
As with the other techniques, we can characterize Chameleon-like VR schematically.
Figure 9 illustrates the tight coupling of the hand(s) with the display,
as well as the tracking of the display, and the mobility (modulo any tethering)
of all three.
There have been other examples that have taken the Chameleon-like approach.
For our purposes, one of the most interesting was developed by Art+Com (1998)
to enable the public to view a virtual version of the new Daimler-Benz A-class
vehicle at the IAA motor show in Frankfurt, September of 1997. This is illustrated
in Figure 10.
In this example, the display was larger than in the original Chameleon system. Rather than hand-held, it was supported by a counter-balanced boom. While mechanically not unlike the Fakespace boom seen previously in Figure 3, conceptually this system is quite distinct. It very much falls into the Chameleon-class of VR by virtue of the relationship of the hands to the display, and the user's simultaneous visibility and awareness of the surrounding physical space.
In this example, the system was large enough to enable the car to be viewed
at 1:1 scale. The user could walk through and view the virtual car with
the help of a flat screen (LCD) attached to a swivel arm. What this example
demonstrates is how the technology for interacting with the virtual space
can be integrated seamlessly into the display. This is shown in Figure 11,
which illustrates how a touch screen on the display was used to select things
such as the colour of the vehicle or fabric of the upholstery.
Finally, like HMDs, Chameleon-like systems have the ability to support augmented reality. In his paper, for example, Fitzmaurice (1993) showed how location tracking not only told the device where it was physically, but also relative to other devices, or people. Brought close to a map, for example, it could give additional information about the region that it was close to. Or, brought beside a complex piece of machinery, by being aware of the fact, it could give valuable information about how to use or repair the device.
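The idea of a device that offers information about whatever it is brought close to can be sketched as a simple proximity lookup. This is a hypothetical illustration, not a description of how the Chameleon system was implemented: the function name, the dictionary of tagged objects and the 0.5 m range are all assumptions.

```python
import math

def nearby_info(device_pos, tagged_objects, max_range=0.5):
    """Return (name, info) for the closest tagged object within range.

    device_pos:     tracked (x, y, z) position of the hand-held display.
    tagged_objects: dict mapping a name to ((x, y, z), info_text).
    max_range:      assumed proximity threshold in metres.
    Returns None when nothing is close enough.
    """
    best = None
    best_dist = max_range
    for name, (pos, info) in tagged_objects.items():
        d = math.dist(device_pos, pos)
        if d <= best_dist:
            best, best_dist = (name, info), d
    return best
```

Given a map registered at one location and a machine at another, the tracked display would show the map's regional details when held near the map, the repair instructions when held beside the machine, and nothing extra when it is near neither.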
Some researchers, Rekimoto and Nagao (1995), for example, have augmented
Chameleon-like devices further and added video cameras in a manner similar
to those discussed in the section on HMDs. Using this approach, the computer
generated information can, likewise, be superimposed over a view of the
physical world, with the same benefits discussed with HMDs.
In the preceding we have surveyed three distinct approaches to VR. We
have attempted to describe each in terms of properties that might influence
their suitability for different types of applications. In particular, we
have emphasized properties that emerge in different forms of collaboration.
These are summarized in Table 1.
Table 1: Summary of the three approaches: HMD, Cave and Chameleon.
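The properties discussed in the preceding sections can also be encoded as a small data structure, which makes it easy to ask which approaches satisfy a given set of requirements. This is a sketch only: the field names are ours, the values record only what the text above states about each approach in its mainstream form (ignoring the AR variants), and the helper `candidates` is hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Approach:
    name: str
    tracked: str                       # what is tracked to determine the view
    display_mobile: bool               # does the display move with the user?
    sees_physical_world: bool          # can users see real people and objects?
    own_view_when_colocated: bool      # does each co-located user get their own view?
    remote_needs_representation: bool  # are avatars or similar needed remotely?

APPROACHES = [
    Approach("HMD", "head", True, False, True, True),
    Approach("Cave", "head", False, True, False, True),
    Approach("Chameleon", "display", True, True, True, True),
]

def candidates(**required):
    """Names of approaches whose properties match every requirement."""
    return [a.name for a in APPROACHES
            if all(getattr(a, key) == value for key, value in required.items())]
```

For instance, `candidates(sees_physical_world=True, own_view_when_colocated=True)` selects only the Chameleon-like approach, while `candidates(remote_needs_representation=True)` selects all three, reflecting that none of them solves the problem of presence in remote collaboration.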
Obviously, other factors will also affect what technology is adopted. Cost is always an issue. So is the question of the amount of space, and any specialized environments required. And even within each type, there is a broad range of variation in image quality, responsiveness, etc.
But in many cases, it may be that more global human factors are most important. By way of example, consider an automotive design studio that wants to use VR technology for design reviews. Cave technology can be, and has been, used to good effect. However, its quality has to be balanced against the fact that there typically isn't a cave in every studio. Rather, the cave is most commonly a shared resource in a different part of the building. It has to be booked, and data has to be transferred and set up. While this structure can support formal reviews, it does not lend itself to casual or spontaneous reviews by management, customers or designers. That is to say, social issues might be the determining factor in choosing something like a Chameleon VR system, even if the fidelity does not match that of the alternative approaches.
VR technologies are expensive and not well understood. In our opinion,
there is no "right approach" without a careful analysis of user,
task and context (physical and social). Hopefully, the concepts outlined
in this paper make some progress in paving the path to an understanding
of the issues that will support such decisions. In the meantime, the authors
welcome comments, suggestions and questions.
The research underlying this paper has been supported by Alias|Wavefront, Inc. and Silicon Graphics, Inc. This support is gratefully acknowledged. Thanks also to Thomas Baudel and Michael Mills for valuable suggestions and help.
ART+COM (1998). http://www.artcom.de/projects/vrf/welcome.en
Azuma, R. and Bishop, G. (1994). Improving static and dynamic registration in an optical see-through HMD. Proceedings of SIGGRAPH'94, 197-204.
Buxton, W. (1998). A Directory of Sources to Input Technologies. http://www.dgp.utoronto.ca/people/BillBuxton/InputSources.html
Cruz-Neira, C., Sandin, D.J., DeFanti, T.A., Kenyon, R.V., and Hart, J.C. (1992). The CAVE: Audio Visual Experience Automatic Virtual Environment, Communications of the ACM, 35(6), 65-72.
Czernuszenko, M., Pape, D., Sandin, D., DeFanti, T., Dawe, G. L., and Brown, M. D. (1997). The ImmersaDesk and Infinity Wall Projection-Based Virtual Reality Displays. Computer Graphics, 31(2), 46-49.
Djajadiningrat, J.P. (1998). Cubby: What You See is Where You Act, PhD Thesis, Technical University of Delft, The Netherlands. http://www.io.tudelft.nl/research/IDEATE/cubby/cubby.html
Djajadiningrat, J.P., Smets, G.J.F. & Overbeeke, C.J. (1997). Cubby: a multiscreen movement parallax display for direct manual manipulation, Displays 17, 191-197.
Fakespace, Inc., 241 Polaris Ave. Mountain View, CA 94043 USA. http://www.fakespace.com/
Feiner, S., MacIntyre, B. & Seligmann, D. (1993). Knowledge-Based Augmented Reality. Communications of the ACM, 36(7), 53-62.
Fitzmaurice, G.W. (1993). Situated Information Spaces and Spatially Aware Palmtop Computers. Communications of the ACM, 36(7), 38-49.
Krueger, Myron, W. (1983). Artificial Reality. Reading: Addison-Wesley.
Neale, D. (1998). Head-Mounted Displays: Product Reviews and Related Design Considerations, Hypermedia Technical Report HCIL-98-02, Human-Computer Interaction Laboratory, Department of Industrial and Systems Engineering, Virginia Tech, Blacksburg, VA 24061-0118. http://hci.ise.vt.edu/~hcil/htr/HCIL-98-02/HCIL-98-02.html
Parks, T.E. (1965). Post Retinal Visual Storage, American Journal of Psychology, 78, 145-147.
Rekimoto, J. and Nagao, K. (1995). The world through the computer: computer augmented interaction with real world environments. Proceedings of UIST'95, 29-36.
Rheingold, H. (1991). Virtual Reality. N.Y.: Summit.
State, A., Hirota, G., Chen, D.T., Garrett, W.F. and Livingston, M.A. (1996). Superior augmented reality registration by integrating landmark tracking and magnetic tracking. Proceedings of SIGGRAPH'96, 429-438.
Sutherland, I. (1965). The Ultimate Display. Proceedings of IFIP 65, Vol. 2, 506-508, 582-583.
Ware, C., Arthur, K. & Booth, K. (1993). Fish tank virtual reality, Proceedings of InterCHI '93, 37-42.
Yoo, T.S., Olano, T.M. (1993). Instant Hole (Windows onto Reality). University of North Carolina at Chapel Hill Technical Report TR-93-027. http://www.cs.unc.edu/Research/graphics/pubs.html
Zhai, S. (1998). User Performance in Relation to 3D Input Devices. To appear, Computer Graphics Quarterly, November 1998.
Zimmerman, T.G., Lanier, J., Blanchard, C., Bryson, S. & Harvill, Y. (1987). A Hand Gesture Interface Device, Proceedings of CHI+GI '87, 189-192.