Approaches to VR and Interactive Affordances: A Taxonomy

Buxton, W. & Fitzmaurice, G.W. (1998). HMD's, Caves & Chameleon: A Human-Centric Analysis of Interaction in Virtual Space, Computer Graphics, The SIGGRAPH Quarterly, 32(4), 64-68.

HMD's, Caves & Chameleon: A Human-Centric
Analysis of Interaction in Virtual Space

Bill Buxton & George W. Fitzmaurice
Alias|Wavefront Inc.,
Toronto, Ontario
{buxton, gf}@aw.sgi.com

Abstract

There are various approaches to implementing Virtual Reality (VR) systems. The head mounted display (HMD) and Cave approaches are two of the best known. In this paper, we discuss such approaches from the perspective of the types of interaction that they afford. Our analysis looks at interaction from three perspectives: solo interaction, collaborative interaction in the same physical space, and remote collaboration. From this analysis emerges a basic taxonomy that is intended to help systems designers make choices that better match their implementation with the needs of their application and users.

Introduction

Immersive Virtual Reality (VR) was first suggested – as were so many other things – by Ivan Sutherland (1965). Practical working systems have now been with us for over a decade and have been written about extensively (e.g., Rheingold, 1991). If one includes the early work of Krueger (1983), they go back even further. The best known approach to VR is that of the head mounted display (HMD) coupled with head tracking. With such systems, one typically is presented with a stereo binocular view of the virtual world, often with stereo audio. By virtue of tracking the viewing position (the head) and orientation in the physical world, the view and perspective of the virtual world are consistent with what one would experience in the physical world from the same actions.

In addition to tracking viewpoint, which is tied to what is displayed to the user, such systems also typically permit some means of input, such as a dataglove (Zimmerman, Lanier, Blanchard, Bryson & Harvill, 1987) or some other high degree of freedom input to support interaction with the displayed virtual world.

As the art has progressed, alternative technical approaches to VR have emerged. Of these, we distinguish among three:

Head-Mounted VR: systems as described briefly above, where one typically has a head-mounted wide-view stereo display coupled with head tracking, and some other means of input to support interaction.
Cave-based VR: where some or all of the walls of a room are rear-projection stereo displays. The user wears glasses to enable viewing the stereo images, and there is a head-tracking mechanism to control what is projected (i.e., the view) depending on where the viewer is located and looking, as well as some mechanism for interacting with what is seen.
Chameleon-type VR: which involves a hand held, or hand moved, display whose position and orientation are tracked in order to determine what appears on it. Furthermore, the display enables interacting with what appears on it.

Each of these types of VR system is discussed in more detail below. But the point of this paper is not to provide a history or enumeration of VR systems, per se.

VR, while expensive and still relatively new, is a powerful technology. It is being applied in contexts ranging from entertainment to automotive design. But if one is going to engage the technology, then what path to follow, and why? What are the relevant dimensions? What are the pros and cons of each approach?

Providing some vocabulary and a framework for answering such questions is what motivates this brief discussion paper. After introducing each of the three classes of VR system, we discuss them in terms of their ability to support three types of interaction:

Solo: where there is only one person interacting in the virtual space.
Same Place Collaboration: where there is more than one user interacting in the virtual space, but they are physically situated in the same location.
Different Place Collaboration: where there is more than one user interacting in the virtual space, but they are situated in different physical locations.

These are the key dimensions according to which we contrast the various approaches. It is obvious that other concerns such as cost, speed, fidelity, space requirements, etc. affect the choice of which technology to adopt. We will touch on some of these. But our overall objective is more modest: to shed some light on those dimensions that we feel we best understand.

Head-Mounted Display (HMD) VR

In HMD VR, the user wears a stereo display, much like a pair of glasses, that provides a view into the virtual world. The physical form of these "glasses" can range from something on the scale of a motorcycle helmet to a pair of sunglasses. Figure 1 illustrates one example of an HMD.

There is a great variety in display quality. The goal in the technology is to provide the widest field of view at the highest quality and with the least weight and at a reasonable cost. The reader is referred to Neale (1998) for a reasonably up-to-date survey of HMD technology.

Figure 1: Modern Inexpensive HMD: The General Reality CE-200W (Photo: General Reality Corp.)

There is a range of high degree of freedom (HDOF) input devices that can be used in interaction with such systems. An overall directory of sources to input devices can be found in Buxton (1998). Furthermore, a number of classes of HDOF technologies are discussed in the contribution of Shumin Zhai (1998) in this special issue. Because of the typical mobility of the user (compared to desktop systems), however, most HMD systems use what Zhai calls a flying mouse class of device, often in conjunction with a data-glove type controller. In some cases, each hand is instrumented in order to support bimanual interaction.

The issue with virtually all HMDs is that the eyes are covered by the display. Consequently, one sees the virtual world at the expense of the physical one. Users cannot directly see their hands or the devices that they are controlling. Similarly, they cannot directly see objects or other people who are in their immediate physical environment. Therefore, in order to function, some representation of such entities from the physical world must appear in the virtual one. In order to use my hands, I most likely must see a representation of them. Likewise, in order to avoid bumping into a table, I must see a representation of it, and to avoid bumping into you, I must see an avatar, or some other representation of you.

In collaborative work, a significant observation that emerges from this is that, visually, HMD VR treats those in the same physical space and those in remote physical spaces the same (some would say equally poorly, since visually there is no advantage to "being there" physically).

There is an important caveat to raise at this juncture. Some researchers have found a way around the problem of seeing the physical world (such as objects, their hands, tools or other people) while wearing HMDs. One approach is to mount one or more video cameras onto the HMD and feed the signals to the displays (see Yoo and Olano, 1993; Azuma and Bishop, 1994; and State et al., 1996). The cameras function as surrogate eyes providing a view into the physical world onto which is superimposed a computer generated view of the virtual world. The result is much like a head-up display, and this approach to VR falls into the general category of Augmented Reality (AR), since it enables the computer to augment our view of the physical world with additional information. See Feiner, MacIntyre and Seligmann (1993) for an example of Augmented Reality and its application.

One important application of this technology is in remote collaboration. As an example, take the case of a technician who needs guidance to repair a complex piece of equipment from an expert who is not physically there. Through the cameras mounted on the technician's HMD, the expert can remotely see what the technician is looking at. Conversely, using VR technology, the expert can point and indicate to the technician what to do. The guidance of the expert is superimposed on the technician's view of the equipment in the HMD, thereby enabling the repair to proceed.

Clearly the ability to support AR is an important attribute of HMD VR. However, since it is not in the mainstream of HMD VR, we will not discuss it further.

To compare the three VR approaches, we have defined a simple schematic to represent the relationship among the eyes, hands and display. Figure 2 shows this schematic for HMD VR systems. First, it shows that the eyes and display are tightly coupled physically, and that their position is tracked. In addition, it shows that the hands are on the "far" side of the display. Finally, it shows that all three are physically coupled, and mobile within physical space.

Figure 2: Schematic showing the relationship among the eyes, hands and display in HMD Style VR.

According to these criteria, and for the purposes of this paper, boom-mounted displays, such as illustrated in Figure 3, are a variation on HMDs, as opposed to a separate category (in contrast to the analysis of Cruz-Neira, Sandin, DeFanti, Kenyon and Hart, 1992).

Figure 3: Fakespace BOOM3C boom mounted display (Photo: Fakespace, Inc.)

CAVES

A significantly different approach to VR, called Cave VR, was introduced by Cruz-Neira, Sandin, DeFanti, Kenyon and Hart (1992). In this class of VR, the user functions within a room in which one or more of the surfaces (walls, floor, ceiling) is the display. An idealized representation of a cave is shown in Figure 4. This shows 4 sides of a 6-sided cave. In a cave, each of the displays is "tiled", in that together they provide a seamless omnidirectional view of the virtual scene. Furthermore, the displays are ideally stereo, and the operator views them through a set of lightweight transparent shutter glasses. The user's head position is tracked within the cave so that what is displayed preserves proper perspective, etc., in adapting to movements and changes in the direction of gaze. That is, perceptually, the user sees the virtual scene in a manner consistent with how it would appear if it were real. And, as anyone who has seen a stereo movie knows, the objects in the virtual scene do not just appear on the cave walls and beyond. They can appear to enter into the physical space of the cave itself, where the user can interact with them directly.

Figure 4: Schematic of an Idealized Cave VR System. Tiled rear projection stereo images appear on up to 6 faces of the room in which the operator works. In practice, most caves have 3-4 faces with projections. (Image from: Cruz-Neira, Sandin, DeFanti, Kenyon and Hart, 1992).

As with HMD VR, manual interaction within the cave is typically accomplished with a HDOF device such as a "flying mouse" (sometimes coupled with speech recognition), in order to enable the operator to remain mobile within the space.

One area where caves differ from HMD VR is that, since the glasses are transparent, one can see the physical as well as the virtual world. Consequently, if you and I are both in the space, we can see each other as well as the virtual world. However, the way that we can share the scene has some distinct differences from HMD VR. Remember that what is displayed is determined by head tracking. If we are both in the cave, we both are viewing the same displays, preventing us from each having our own "point of view." (While we can both look at different things and in different directions, we both do so as if from the perspective of the current location of the head tracker.) So the good news is, in the cave we really are presented with the same view. The bad news is, you have to see it from my location, or vice versa.
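The single-viewpoint constraint can be made concrete with a little geometry. The following is a minimal sketch, not taken from any cave implementation: it assumes one flat wall on the plane z = 0, ignores stereo, and uses a hypothetical helper `project_to_wall` to find where a virtual point must be drawn so that it lines up for a given eye position. The two viewers need the point drawn in different places, but the wall can show only the one computed for the tracked head.

```python
def project_to_wall(eye, point):
    """Return (x, y) on the wall plane z = 0 where `point` must be drawn
    so that it appears in the correct direction when seen from `eye`.

    `eye` and `point` are (x, y, z) tuples in metres; z is the distance
    from the wall. This is an illustrative assumption, not a cave API.
    """
    ex, ey, ez = eye
    px, py, pz = point
    if ez == pz:
        raise ValueError("eye and point are equidistant from the wall")
    # Parameter t at which the ray eye -> point crosses the plane z = 0.
    t = ez / (ez - pz)
    return (ex + t * (px - ex), ey + t * (py - ey))


virtual_point = (0.0, 1.5, 1.0)   # appears 1 m in front of the wall
tracked_eye = (-1.0, 1.7, 3.0)    # viewer wearing the head tracker
other_eye = (1.0, 1.7, 3.0)       # second viewer, not tracked

drawn_at = project_to_wall(tracked_eye, virtual_point)
needed_by_other = project_to_wall(other_eye, virtual_point)

print("drawn for tracked viewer:", drawn_at)
print("would be needed by other viewer:", needed_by_other)
```

In this configuration the two wall positions differ by about a metre, which is why the untracked viewer sees the scene "from my location."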

In remote collaboration, where two caves are linked, this constraint is softened since each cave can have a unique view, but everyone within a single cave must share the same one. But the advantage of being able to see each other in the context of the virtual scene is lost when collaborating across multiple caves. In remote collaboration one must resort to the same techniques used in HMD VR, such as the use of avatars or some other representation, in order to see one's remote collaborators within the virtual space.

Finally, there is one potential problem that is unique to same-location collaboration in caves. In the everyday world, you and I may find ourselves on opposite sides of an object of interest or discussion. But what happens in a cave if the object of interest lies within the confines of the physical walls of the cave? If we are facing each other in a cave with a virtual object in between us, neither of us will be able to see the object as we are each blocking the screen on which it is being projected for the other person. Let us call this the "shadow effect."
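A rough geometric sketch of the shadow effect follows. It is an illustration under simplifying assumptions that are not in the original analysis: a top-down 2D view, a single wall along the line z = 0, and the other person modelled as a circle of assumed radius 0.25 m. The names `wall_hit` and `is_shadowed` are hypothetical.

```python
import math


def wall_hit(eye, obj):
    """Top-down view: return the x coordinate on the wall (line z = 0)
    where the ray from `eye` through `obj` lands. Points are (x, z)."""
    ex, ez = eye
    ox, oz = obj
    if not 0.0 <= oz < ez:
        raise ValueError("object must lie between the eye and the wall")
    t = ez / (ez - oz)
    return ex + t * (ox - ex)


def is_shadowed(eye, obj, other, radius=0.25):
    """True if a person standing at `other` blocks the line of sight from
    `eye` to the spot on the wall where `obj` is actually drawn."""
    hx, hz = wall_hit(eye, obj), 0.0
    dx, dz = hx - eye[0], hz - eye[1]
    length_sq = dx * dx + dz * dz
    # Closest point on the segment eye -> wall spot to the other person.
    s = ((other[0] - eye[0]) * dx + (other[1] - eye[1]) * dz) / length_sq
    s = max(0.0, min(1.0, s))
    cx, cz = eye[0] + s * dx, eye[1] + s * dz
    return math.hypot(other[0] - cx, other[1] - cz) < radius


me = (0.0, 3.0)
virtual_object = (0.0, 2.0)

# Facing each other with the object in between: you block my view.
print(is_shadowed(me, virtual_object, other=(0.0, 1.0)))   # True
# Standing side by side: the wall spot stays visible.
print(is_shadowed(me, virtual_object, other=(1.0, 3.0)))   # False
```

By symmetry, the same check applied from the other person's position against the opposite wall would show their view being blocked as well.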

Figure 5: Schematic showing relationship among the eyes, hands and display in Cave Style VR.

As with HMD VR, in Figure 5 we characterize cave VR by means of a simple schematic. Here we illustrate that the eyes and hands are loosely coupled and mobile, and that the display is anchored in a fixed position. Furthermore, it shows that the head is tracked and that the hands are visible, and are between the display and the eye.

Fitting into this characterization are a number of other systems, which might therefore be considered "degenerate caves." One example would be large format projection displays such as the ImmersaDesk shown in Figure 6, developed at the Electronic Visualization Laboratory at the University of Illinois at Chicago (Czernuszenko, Pape, Sandin, DeFanti, Dawe and Brown, 1997). This is essentially a small 1-sided cave.

Figure 6: The ImmersaDesk VR System. A Large format rear-projection flat stereo display (Photo: Electronic Visualization Laboratory at the University of Illinois at Chicago)

Another example would be what Ware, Arthur and Booth (1993) called Fish tank VR. These are typically CRT systems which incorporate head-tracking, and present a perspective view (often not stereo), based on the user's head position. Such systems can be thought of as very small format one-sided caves (tunnels?) with a consequently limited field of view and range of mobility of the user.

Actually, small format caves have been built, showing that you don't have to be able to walk around in a cave for the technology to be of value. The Cubby system developed at the Technical University of Delft in The Netherlands is one such example (Djajadiningrat, 1998; Djajadiningrat, Smets & Overbeeke, 1997).

Figure 7: The Cubby System: A Small 3-Sided Cave (Djajadiningrat, 1998; Djajadiningrat, Smets & Overbeeke, 1997).

Finally, flight and driving simulators, which involve a vehicle in a space (often only partially) surrounded by rear projection screens, would also fall into this category. The display is often not stereo, and it may not be flat. And the user is typically not mobile, being confined to the vehicle. However, the basic relationship among the view, hands and display is consistent with the cave approach.

CHAMELEON-Style VR

The third and least well known approach to VR that we will discuss was introduced by Fitzmaurice (1993) in his Chameleon system. This can be thought of as hand-held VR. In the Chameleon system, the image appeared on a small display held in the palm of the hand. In this case, what appeared on the screen was determined by tracking the position of the display, rather than the head of the user.

One way to think about the Chameleon approach is as a magnifying glass that looks onto a virtual scene, rather than the physical world. And while the display is small, and certainly does not give the wide angle view found with the cave approach, the scene is easily browsed by moving the lightweight display, as shown in the right hand image of Figure 8.

This movement of the display actually takes advantage of a subtle but powerful effect in human visual perception. With respect to visual perception, Newton was wrong about the equivalence of relative motion. That is, moving a scene on a fixed display is not the same as moving a display over a stationary scene. The reason is rooted in the persistence of images on the retina, formally known as the "Parks Effect," (Parks, 1965). Much like moving the cursor often leaves a visible trail on a screen, moving the Chameleon display across the field of vision, and updating the view with the motion, can leave an image of the larger scene on the retina. Hence, if the display can move, the effective size of the virtual display need not be the same as the physical size. (If you remain confused, think about the effect of drawing a pattern on a wall by quickly moving a laser pointer. Here, a whole pattern is displayed even though only one point is illuminated at any given time. The image is in your eye, not on the wall. Such is the human visual perceptual system, and Chameleon-like VR can take advantage of it.)
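The geometric half of this argument can be sketched in a few lines. The example below is an illustration only: it assumes a flat 2D scene, a 1:1 mapping between physical display movement and scene coordinates, and hypothetical helper names (`visible_region`, `swept_extent`). It models which part of the scene the display covers as it moves, not the perceptual effect itself.

```python
def visible_region(display_pos, display_size):
    """Return (left, bottom, right, top) of the scene rectangle shown on a
    display centred at `display_pos` = (x, y). Units are metres, assuming
    physical movement maps 1:1 onto scene coordinates."""
    x, y = display_pos
    w, h = display_size
    return (x - w / 2, y - h / 2, x + w / 2, y + h / 2)


def swept_extent(positions, display_size):
    """Bounding box of everything shown while the display visits
    `positions` in turn."""
    regions = [visible_region(p, display_size) for p in positions]
    return (
        min(r[0] for r in regions),
        min(r[1] for r in regions),
        max(r[2] for r in regions),
        max(r[3] for r in regions),
    )


palm_display = (0.10, 0.08)                      # assumed 10 cm x 8 cm screen
sweep = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0)]     # hand moves 20 cm to the right

extent = swept_extent(sweep, palm_display)
print("scene width covered:", extent[2] - extent[0])
```

Here a 10 cm wide display covers roughly 30 cm of the scene over the sweep, which is the sense in which the effective display can be larger than the physical one.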

Figure 8: Chameleon Palm-held VR System. A monocular image is presented on a palm sized portable display. The display has position and orientation tracking so what is displayed is determined by the display position. (It is like a virtual magnifying glass). The display also incorporates some manual controls. (Photo from Fitzmaurice, 1993)

Interaction with this class of display tends to be based on devices such as buttons or (as seen in the next example) a touch screen coupled directly with the display. That is to say, the display device serves for both input and output. In no case, to our knowledge, has stereo display been used with this class of system, although one can imagine achieving this using the same kind of shuttered glasses employed in cave systems.

Like cave systems, in Chameleon-like VR, one has an unobstructed view of people and objects in the physical world. However, unlike the cave but as with HMD VR, in collaborating with others in the same physical space, each user has their own view. And yet, it is easy to have a mechanism for sharing a view without disorienting the other viewers, since orientation is mainly determined by one's orientation in physical space. (Contrast this with switching views in either cave or HMD VR.)

On the other hand, Chameleon VR shares the same problem as both HMD and cave VR in establishing a sense of presence of others in collaboration involving different physical locations.

As with the other techniques, we can characterize Chameleon-like VR schematically. Figure 9 illustrates the tight coupling of the hand(s) with the display, as well as the tracking of the display, and the mobility (modulo any tethering) of all three.

Figure 9: Schematic showing relationship among the eyes, hands and display in Chameleon Style VR.

There have been other examples that have taken the Chameleon-like approach. For our purposes, one of the most interesting was developed by Art+Com (1998) to enable the public to view a virtual version of the new Daimler-Benz A-class vehicle at the IAA motor show in Frankfurt, September of 1997. This is illustrated in Figure 10.

Figure 10: Art+Com Virtual Car Display. This system is essentially a larger format display version of Chameleon. A Counter-balanced boom constrains the display movement as well as supports its weight. (Photo: Art+Com)

In this example, the display was larger than in the original Chameleon system. Rather than hand-held, it was supported by a counter-balanced boom. While mechanically not unlike the Fakespace boom seen previously in Figure 3, conceptually this system is quite distinct. It very much falls into the Chameleon-class of VR by virtue of the relationship of the hands to the display, and the user's simultaneous visibility and awareness of the surrounding physical space.
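The classification used throughout this paper rests on the eyes-hands-display relationships in Figures 2, 5 and 9 rather than on mechanical construction. One way to make that explicit is the sketch below. The field names, their allowed values and the rule order are our own assumptions chosen for illustration; they are one reading of the schematics, not a formal definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Schematic:
    """Hypothetical encoding of the eyes/hands/display schematic."""
    display_coupled_to: str        # "eyes", "hands" or "room"
    tracked: str                   # "head" or "display"
    physical_world_visible: bool   # can the user see the surroundings?


def classify(s: Schematic) -> str:
    """Assign a system to one of the three classes discussed here."""
    if s.display_coupled_to == "eyes":
        return "HMD"
    if (s.display_coupled_to == "hands" and s.tracked == "display"
            and s.physical_world_visible):
        return "Chameleon"
    if s.display_coupled_to == "room" and s.tracked == "head":
        return "Cave"
    return "unclassified"


# A boom-mounted display held to the eyes behaves like an HMD.
fakespace_boom = Schematic("eyes", "head", physical_world_visible=False)
# A boom-supported display moved by hand, with surroundings visible.
artcom_car_display = Schematic("hands", "display", physical_world_visible=True)
# A fixed, head-tracked projection display.
immersadesk = Schematic("room", "head", physical_world_visible=True)

for name, system in [("Fakespace boom", fakespace_boom),
                     ("Art+Com display", artcom_car_display),
                     ("ImmersaDesk", immersadesk)]:
    print(name, "->", classify(system))
```

Both boom systems share similar mechanics, yet they fall into different classes because the display is coupled to different parts of the body.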

In this example, the system was sized so that the car could be viewed at 1:1 scale. The user could walk through and view the virtual car with the help of a flat screen (LCD) attached to a swivel arm. What this example demonstrates is how the technology for interacting with the virtual space can be integrated seamlessly into the display. This is shown in Figure 11, which illustrates how a touch screen on the display was used to select things such as the colour of the vehicle or fabric of the upholstery.

Figure 11: Art+Com VR Control: Note that the display in the previous photo is a touch screen that enables the operator to interact with the image. (Photo: Art+Com)

Finally, like HMDs, Chameleon-like systems have the ability to support augmented reality. In his paper, for example, Fitzmaurice (1993) showed how location tracking not only told the device where it was physically, but also relative to other devices, or people. Brought close to a map, for example, it could give additional information about the region that it was close to. Or, brought beside a complex piece of machinery, by being aware of the fact, it could give valuable information about how to use or repair the device.

Some researchers, Rekimoto and Nagao (1995), for example, have augmented Chameleon-like devices further and added video cameras in a manner similar to those discussed in the section on HMDs. Using this approach, the computer generated information can, likewise, be superimposed over a view of the physical world, with the same benefits discussed with HMDs.

SUMMARY & CONCLUSIONS

In the preceding we have surveyed three distinct approaches to VR. We have attempted to describe each in terms of properties that might influence their suitability for different types of applications. In particular, we have emphasized properties that emerge in different forms of collaboration. These are summarized in Table 1.

	Solo	Same-Place Collaboration	Different-Place Collaboration	Support AR?
HMD	see virtual space only; hands and tools by virtual representation only (but see support for AR column)	see from personal viewpoint; awkward shared viewpoint; only see others as avatar, for example (but see support for AR column)	see from personal viewpoint; awkward shared viewpoint; same place and different place collaborators treated the same, as avatars	yes, if HMD coupled with video camera(s) for example. Then local objects, hands & people visible.
Cave	see virtual and physical space; hands and tools visible	see from viewpoint of another (but possibly different view direction); see others in physical space; potential shadow effect blocking view of object of interest	only one viewpoint per site; only see remote participants as avatars, for example	no
Chameleon	see virtual and physical space; hands and tools visible	see from personal viewpoint; potential non-disruptive shared viewing; see others in physical space	see from personal viewpoint; only see remote participants as avatars, for example	yes, with or without video camera to augment display

Table 1: Properties of VR Systems for Various Numbers and Distribution of Users
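For designers who want to query these properties programmatically, the entries of Table 1 can be held in a small lookup structure. The sketch below is only one possible encoding: the dictionary keys and the helper `systems_where` are hypothetical names, and the cell text is abbreviated from the table.

```python
TABLE_1 = {
    "HMD": {
        "solo": ["see virtual space only",
                 "hands and tools by virtual representation only"],
        "same_place": ["see from personal viewpoint",
                       "awkward shared viewpoint",
                       "only see others as avatar"],
        "different_place": ["see from personal viewpoint",
                            "awkward shared viewpoint",
                            "collaborators treated the same, as avatars"],
        "supports_ar": True,    # if coupled with video camera(s)
    },
    "Cave": {
        "solo": ["see virtual and physical space", "hands and tools visible"],
        "same_place": ["see from viewpoint of another",
                       "see others in physical space",
                       "potential shadow effect"],
        "different_place": ["only one viewpoint per site",
                            "only see remote participants as avatars"],
        "supports_ar": False,
    },
    "Chameleon": {
        "solo": ["see virtual and physical space", "hands and tools visible"],
        "same_place": ["see from personal viewpoint",
                       "potential non-disruptive shared viewing",
                       "see others in physical space"],
        "different_place": ["see from personal viewpoint",
                            "only see remote participants as avatars"],
        "supports_ar": True,    # with or without video camera
    },
}


def systems_where(column, phrase):
    """Names of the systems whose `column` cell mentions `phrase`."""
    return [name for name, row in TABLE_1.items()
            if any(phrase in item for item in row[column])]


print(systems_where("same_place", "see others in physical space"))
print(systems_where("same_place", "personal viewpoint"))
print([name for name, row in TABLE_1.items() if row["supports_ar"]])
```

Such a lookup covers only the interaction properties discussed here; cost, space and fidelity would have to be added before it could inform a real selection.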

Obviously, other factors will also affect what technology is adopted. Cost is always an issue. So is the question of the amount of space, and any specialized environments required. And even within type, there is a broad range of variation, in image quality, responsiveness, etc.

But in many cases, it may be that more global human factors are most important. By way of example, consider an automotive design studio that wants to use VR technology for design reviews. Cave technology can be, and has been, used to good effect. However, the quality has to be balanced with the fact that there typically isn't a cave in every studio. Rather, the cave is most commonly a shared resource in a different part of the building. It has to be booked and data transferred and set up. While this structure can support formal reviews, it does not lend itself to casual or spontaneous reviews by management, customers or designers. That is to say, social issues might be the determining factor in choosing something like a Chameleon VR system, even if the fidelity does not match that of the alternative approaches.

VR technologies are expensive and not well understood. In our opinion, there is no "right approach" without a careful analysis of user, task and context (physical and social). Hopefully, the concepts outlined in this paper make some progress in paving the path to an understanding of the issues that will support such decisions. In the meantime, the authors welcome comments, suggestions and questions.

ACKNOWLEDGEMENTS

The research underlying this paper has been supported by Alias|Wavefront, Inc. and Silicon Graphics, Inc. This support is gratefully acknowledged. Thanks also to Thomas Baudel and Michael Mills for valuable suggestions and help.

REFERENCES

ART+COM (1998). http://www.artcom.de/projects/vrf/welcome.en

Azuma, R. and Bishop, G. (1994). Improving static and dynamic registration in an optical see-through HMD. Proceedings of SIGGRAPH'94, 197-204.

Buxton, W. (1998). A Directory of Sources to Input Technologies. http://www.dgp.utoronto.ca/people/BillBuxton/InputSources.html

Cruz-Neira, C., Sandin, D.J., DeFanti, T.A., Kenyon, R.V., and Hart, J.C. (1992). The CAVE: Audio Visual Experience Automatic Virtual Environment, Communications of the ACM, 35(6), 65-72.

Czernuszenko, M., Pape, D., Sandin, D., DeFanti, T., Dawe, G. L., and Brown, M. D. (1997). The ImmersaDesk and Infinity Wall Projection-Based Virtual Reality Displays. Computer Graphics, 31(2), 46-49.

Djajadiningrat, J.P. (1998). Cubby: What You See is Where You Act, PhD Thesis, Technical University of Delft, The Netherlands. http://www.io.tudelft.nl/research/IDEATE/cubby/cubby.html

Djajadiningrat, J.P., Smets, G.J.F. & Overbeeke, C.J. (1997). Cubby: a multiscreen movement parallax display for direct manual manipulation, Displays 17, 191-197.

Fakespace, Inc., 241 Polaris Ave. Mountain View, CA 94043 USA. http://www.fakespace.com/

Feiner, S., MacIntyre, B. & Seligmann, D. (1993). Knowledge-Based Augmented Reality. Communications of the ACM, 36(7), 53-62.

Fitzmaurice, G.W. (1993). Situated Information Spaces and Spatially Aware Palmtop Computers. Communications of the ACM, 36(7), 38-49.

Krueger, Myron, W. (1983). Artificial Reality. Reading: Addison-Wesley.

Neale, D. (1998). Head-Mounted Displays: Product Reviews and Related Design Considerations, Hypermedia Technical Report HCIL-98-02, Human-Computer Interaction Laboratory, Department of Industrial and Systems Engineering, Virginia Tech, Blacksburg, VA 24061-0118. http://hci.ise.vt.edu/~hcil/htr/HCIL-98-02/HCIL-98-02.html

Parks, T.E. (1965). Post Retinal Visual Storage, American Journal of Psychology, 78, 145-147.

Rekimoto, J. and Nagao, K. (1995). The world through the computer: computer augmented interaction with real world environments. Proceedings of UIST'95, 29-36.

Rheingold, H. (1991). Virtual Reality. N.Y.: Summit.

State, A., Hirota, G., Chen, D.T., Garrett, W.F. and Livingston, M.A. (1996). Superior augmented reality registration by integrating landmark tracking and magnetic tracking. Proceedings of SIGGRAPH'96, 429-438.

Sutherland, I. (1965). The Ultimate Display. Proceedings of IFIP 65, Vol. 2, 506-508, 582-583.

Ware, C., Arthur, K. & Booth, K. (1993). Fish tank virtual reality, Proceedings of InterCHI '93, 37-42.

Yoo, T.S., Olano, T.M. (1993). Instant Hole (Windows onto Reality). University of North Carolina at Chapel Hill Technical Report TR-93-027. http://www.cs.unc.edu/Research/graphics/pubs.html

Zhai, S. (1998). User Performance in Relation to 3D Input Devices. To appear, Computer Graphics Quarterly, November 1998.

Zimmerman, T.G., Lanier, J., Blanchard, C., Bryson, S. & Harvill, Y. (1987). A Hand Gesture Interface Device, Proceedings of CHI+GI '87, 189-192.
