Chapter 7

Touch, Gesture & Marking

Introduction

In this chapter, we investigate input to computer systems. In our view, this is one of the areas of interaction in which there is the greatest potential for improvement. Whereas a great deal of work has gone into graphical displays over the past ten years, little has changed on the input side. This is unfortunate, and the biggest loser is the end user.

Our discussion will focus on what we call haptic input. The term haptic comes from a Greek word having to do with touch. Hence, haptic input is input that involves physical contact between the computer and the user. This contact may be via the hands using a mouse, the feet using a pedal, or even the tongue using a special joystick.

There are non-haptic means of input, however. The main one, speech, is covered separately in the chapter which follows. We will not cover the other, eye-tracking, since it is more of a future technology than something that can be used in mainstream applications today. Those interested in this topic are referred to Jacob (1991) and to Sparrell and Thorisson (1993). A video demonstrating the latter system is provided in Thorisson, Koons, and Bolt (1992).

We can't discuss haptic input without at least mentioning output as well. Every haptic input device can also be considered to provide output, through the tactile or kinesthetic feedback that it gives to the user. In practice, the quality and appropriateness of this "feel" is often extremely important in determining a device's effectiveness and acceptance in a particular context. Some devices actually provide force feedback, as with some special joysticks. Others use the haptic channel solely for output. An example of this is a device producing output in Braille.

What we are not going to do in this chapter is provide a catalogue of input devices and how they work. First, this information is available elsewhere. Sherr (1988), for example, provides an excellent review of input technologies. A more condensed discussion can be found in Foley, van Dam, Feiner and Hughes (1990). Second, and more importantly, we feel that the focus of our discussion should be more on the human and use than on the mechanics of the technology. Hence, a large part of our discussion will deal with sample tasks, by which devices can be compared. In addition, we will look at taxonomies of input that help differentiate devices along dimensions meaningful to the designer.

Finally, the reader is directed to the list of videotape examples found after the bibliography. These are all published as part of the SIGGRAPH Video Review and are easily available. They make excellent supplements to the printed text for both teacher and student. They are highly recommended.

The Choice of Technology Makes a Difference

Each input device has its own strengths and weaknesses, just as each application has its own unique demands. With the wide range of input devices available, one of the problems that confronts the designer is to obtain a match between application and input technology. Part of the problem has to do with recognizing the relevant dimensions along which the application's demands should be characterized. The other is knowing how each technology being considered performs along those dimensions. These are topics addressed below and in the first reading by Buxton (1986).

One way to try to understand these issues is to experiment with a diverse set of representative tasks. Each task has its own idiosyncratic demands. Having enumerated such a set, we can determine which properties are relevant to a particular application, and then test the effectiveness of various technologies in performing them. This allows us to get a rapid match of technology to application. Furthermore, the set of representative tasks provides a reminder of what dimensions should be considered in the selection process.

Below, we describe a basic set of such generic tasks. These make an excellent implementation project for the student, and provide a good testbed for developing a strong understanding of the characteristics of several input devices.

Like any other list, this one is not complete. It was chosen to reflect the types of 2D tasks typically found in Graphical User Interfaces, such as those illustrated in the excellent collection of widgets compiled in the video by Myers (1990). (This video is highly recommended to anyone studying interaction.) Text entry and 3D input are not emphasized. We leave it as an exercise to the reader to expand on the list.

Pursuit Tracking:

In this test, a target (a fly) moves over the screen under computer control. The operator uses the control device to track the fly's motion. Feedback about the operator's performance is given by a tracking symbol in the form of a fly swatter. The idea is to see how many times the fly can be killed by positioning the swatter over the fly and pushing a button device.

Figure 1: Pursuit Tracking

A target (a fly) moves over the screen. The tracking symbol is a fly swatter. When the swatter is over the fly, the user pushes a button device to swat it.

The main statistic in this test is how many times the fly can be swatted in a given time interval. There are a number of parameters that should be variable, so that one can develop an understanding of their influence on task performance. One of these is the speed at which the target moves. Another is the control:display (C:D) ratio. The C:D ratio is the ratio between the distance the controller must be moved and the resulting distance moved by the tracker on the display. For example, if the C:D ratio were 2:1, two centimeters of motion by the controller would result in only one centimeter of motion by the tracker.

A high C:D ratio is useful for fine cursor positioning, and for users who do not have well developed motor coordination (such as the aged or physically disabled). A low C:D ratio permits one to traverse the display with relatively little motion of the controller. Notice that with devices that use the same surface for both control and display, such as touch-screens and light-pens, the C:D ratio is almost always confined to 1:1. On the one hand, this gives this class of device a directness not shared by most other technologies. On the other, it can severely restrict their usefulness in a number of applications.

With the appropriate technology, the C:D ratio may be changed. Sometimes this can be done by the user. In other cases, it may be changed automatically, depending on the task. The C:D ratio need not be linear. On the Macintosh computer, for example, the C:D ratio varies with the speed at which the controller (the mouse) is moved. In effect, the C:D ratio is governed by an "automatic transmission."
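To make this concrete, the following is a minimal sketch, in Python, of a velocity-dependent C:D ratio of the kind described above. The speed thresholds and gain values are invented for illustration; they are not the actual Macintosh algorithm.

```python
# Illustrative sketch of a velocity-dependent C:D ratio ("automatic transmission").
# The speed thresholds and ratios below are invented, not any vendor's values.

def display_delta(control_delta_cm: float, dt_s: float) -> float:
    """Map a controller movement over one time step to a tracker movement."""
    speed = abs(control_delta_cm) / dt_s        # controller speed, cm/s
    if speed < 2.0:
        cd_ratio = 4.0      # slow movement: high C:D ratio for fine positioning
    elif speed < 10.0:
        cd_ratio = 2.0
    else:
        cd_ratio = 0.5      # fast movement: low C:D ratio to traverse the display
    return control_delta_cm / cd_ratio          # tracker movement, cm

# A slow 1 cm movement (over 1 s) moves the tracker 0.25 cm;
# the same 1 cm moved quickly (over 0.05 s) moves it 2 cm.
print(display_delta(1.0, 1.0), display_delta(1.0, 0.05))
```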

One other parameter to consider in this test is the button push that is required to swat the fly. For example, can it be activated with the same hand that is providing the spatial control? This may be difficult if a touch-tablet or trackball is used. If the limb providing the spatial control cannot also generate the button event, is another limb free to do so in the application being tested for?

Target Acquisition:

In this task, the user must select each of a number of squares displayed on the screen, as illustrated in Fig. 2. A square is selected by positioning the tracking symbol over it and signaling the system with a button event. Squares should be selected largest to smallest, left-to-right, top-to-bottom.

The main statistic to consider is how long it takes to select the full set of targets. Along with this, examine how target size affects the speed of target acquisition. The smaller the target, the longer it will take to acquire, due to the fine motor control required. This relationship is generalized as Fitts' Law.

Figure 2: Target Acquisition

Select each square in turn, left-to-right, top-to-bottom. Notice how selection time is related to target size.

Essentially, Fitts' Law states that the movement time, MT, to acquire a target with a continuous linear controller is a function of the ratio of the distance moved to the target size. Stated more formally, the time MT to move the hand to a target of width W, which lies at distance (amplitude) A away, is:

MT = a + b log2(A/W + 1)

where a is a constant, and (according to Card, Moran & Newell, 1983, p. 241) b is approximately 100 msec/bit, with a typical range of 70 to 120 msec/bit.
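For readers implementing the target-acquisition test, the following short Python sketch computes predicted movement times from this formulation. The intercept a = 0.05 s is an arbitrary illustrative value; the slope b = 0.1 s/bit follows the Card, Moran & Newell figure cited above.

```python
import math

def fitts_mt(amplitude: float, width: float, a: float = 0.05, b: float = 0.1) -> float:
    """Predicted movement time (seconds): MT = a + b * log2(A/W + 1).
    a (intercept) is an illustrative value; b is roughly 0.1 s/bit."""
    index_of_difficulty = math.log2(amplitude / width + 1)   # in bits
    return a + b * index_of_difficulty

# Halving the target width at a fixed distance adds roughly one bit of
# difficulty, and hence roughly b seconds to the predicted movement time.
print(fitts_mt(amplitude=160, width=16))   # ID = log2(11), about 3.5 bits
print(fitts_mt(amplitude=160, width=8))    # ID = log2(21), about 4.4 bits
```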

The most current and thorough discussion of Fitts' Law can be found in MacKenzie (1992). The reading by MacKenzie (1992) summarizes many of the most important points for the practitioner, and presents some examples of how this law can be applied.

As in the pursuit-tracking task, issues such as C:D ratio and what button device is used will affect performance. There are a number of variations on this basic task. Each brings to light important aspects of the input technology:

* Homing Time: If an application involves frequently moving between a text entry device (usually a QWERTY keyboard) and a pointing device, it is important to know the effect of this movement on performance. This is the homing time discussed in the Keystroke Model (Card, Moran & Newell, 1980). To get a feeling for the effect of homing time, have the user push a key on the keyboard (the space-bar, for example) after each square is selected. Using the same tablet, for example, what is the difference in performing this task when using a puck versus a stylus?

* Number of Dimensions: Fitts' Law deals with movement in one dimension. There have been studies that have used it to make predictions and analyze performance in higher dimensional tasks. The two main questions that arise in this case have to do with (a) what is the effective target width in two or more dimensions, and (b) what is the effect of approach angle? The essence of the former can be grasped by considering selecting a word, which is graphically short in the vertical dimension but wide horizontally. Should the width or the height be used? The second has to do with whether we can move more effectively horizontally than vertically, for example. These are points discussed in the reading by MacKenzie, and explored more fully in MacKenzie and Buxton (1992).

* Dragging and Rubber-Band Lines: In this variation, an object is dragged (or a rubber-band line is stretched) from square to square. As shown by MacKenzie, Sellen, and Buxton (1991), dragging between targets is also a Fitts' Law task, and the act of maintaining the dragging state (holding down a mouse button, for example) can negatively affect the acquisition of the target. In particular, different devices are affected to different degrees by state maintenance. Hence, if dragging is an important part of the interaction style, devices should be evaluated in this mode as well.

* Left Hand vs Right Hand: It is very interesting to see how various devices perform in Fitts' Law tasks when performed using the left vs the right hand. This was investigated by Kabbash, MacKenzie and Buxton (1993). They found that the degradation in moving from the dominant to nondominant hand varied across devices. The lesson here is to be sure that devices are evaluated in the context in which they are to be used.

Free-Hand Inking:

Attempting to input a facsimile of your handwritten signature places yet another set of demands on the input technology. To get a feeling for the degree to which various devices lend themselves to this type of task, present the user with a screen ruled with lines of decreasing spacing. Have the user sign their name once in each space, as illustrated in Fig. 3. Use a simple subjective evaluation of the quality of their signature as a means of comparing devices.

Figure 3: Free-Hand Inking

A device's ability to capture a good facsimile of your signature is a good measure of its effectiveness for inking and drawing tasks. Comparisons can be made for differing sizes to better understand the effect.

The attributes of this hand-writing task are relevant to several other common tasks, such as those seen in drawing programs, marking interfaces, and systems that utilize character recognition.

Tracing and Digitizing:

In many applications, such as cartography, CAD, and the graphic arts, it is often important to be able to trace material previously drawn on paper, or digitize points from a map. There is a wide variation in how well various devices can perform this type of task. Relative devices such as mice and trackballs are almost useless in this regard. Even with absolute devices like tablets, for example, there is a wide variation in their ability to perform this class of task. In particular, the demands on resolution and linearity vary greatly across applications. In cartography and engineering, for example, the accuracy of the digitization is often critical and far beyond what is required in digitizing a sketch.

Constrained Linear Motion:

In some applications it is important to be able to move the tracker rapidly along a straight-line path. One example is using the scroll-bar mechanism of some systems. Another example might be where you want to use the motion of a mouse in one dimension, say Y, to control one parameter, without changing the value of another parameter being controlled by motion in the other dimension, X. Different devices vary in the ease with which this can be done. X/Y thumb-wheels, for example, would outperform a mouse in the above task if the motion were along the primary axes.

Figure 4: Constrained Linear Motion

How quickly can the ball be moved along the straight path without crossing the lines?

In the example task, illustrated in Fig. 4, the tracking symbol is a ball that is dragged along a linear path defined by two parallel lines. The object is to move along the path without crossing the parallel lines. How is speed affected by the input device used and the path width? Are similar results obtained if the path is vertical or diagonal?

Constrained Circular Motion:

A variation of the previous example is to see how well the user can specify a circular motion within a constrained region. This type of control is useful in manipulating 3D objects, for example, and is described in Evans, Tanner and Wein (1981).

Figure 5: Constrained Circular Motion

How fast can the tracking symbol be moved around the circular shaded path without crossing the borders?

As in the previous example, different results will be obtained from different devices, and results will also vary according to C:D ratio and the width of the path.

3D Input

All of the tasks discussed in the previous section involved 2D control. As the power of graphics displays increases, 3D interfaces are becoming increasingly important to the user interface designer. What used to fall mainly within the domain of computer graphics specialists is now of concern to the designer of general applications.

One consequence of the increasing prevalence of 3D applications is that designers need to consider what the 3D equivalents are of the representative 2D tasks discussed in the previous section. Another is that the designer must come to terms with what devices to use in supporting applications of higher dimensionality. At one extreme, does one need special devices such as the instrumented gloves seen in Virtual Reality systems (Zimmerman, Lanier, Blanchard, Bryson & Harvill, 1987; video: Zacharey, 1987)? Or can one support 3D interaction using the same, relatively non-intrusive devices found in conventional 2D interfaces?

There are a number of important studies showing that the latter case is often possible. One of the more compelling examples is given in Mackinlay, Card and Robertson (1990) and Mackinlay, Robertson and Card (1990). These papers describe a technique for navigating through a 3D system called the Information Visualizer, a system that uses novel graphical techniques for representing information. Among other things, different information is stored in different 3D "rooms," and this data is accessed by navigating from room to room using the techniques described. A video demonstration of this system is provided by Mackinlay, Card and Robertson (1990).

If one does want to control 3D worlds using 2D devices, perhaps the best paper to read is Chen, Mountford and Sellen (1988). This paper is important for two reasons. First, it introduces a novel technique, the virtual 3D trackball[1]. Second, it represents an excellent comparative evaluation of a number of techniques, many of them deriving from an earlier paper by Evans, Tanner and Wein (1981). The techniques discussed in this latter paper are demonstrated in the video by Evans, Tanner and Wein (1981).

If one is going to use devices with three or more degrees of freedom in interacting with 3D interfaces, there are a number of studies that can aid the designer. In this regard, the reader is referred to Ware and Jessome (1988), Ware and Osborne (1990), MacKenzie and Ware (1993) and Zhai, Buxton and Milgram (1994), for example. Each presents an experimental study evaluating different aspects of 3D interaction.

Finally, as an exercise, the reader is encouraged to compare two different approaches to a 3D drawing package. Both are similar in that each uses constraints and "snapping" techniques. However, they differ greatly in how input is handled. The first is Gargoyle 3D (Bier, 1990), shown in the video by Bier (1989). The second is 3-Draw, described in Sachs, Stoop and Roberts (1989), and demonstrated in the video by Sachs, Stoop and Roberts (1990).

Developing a Taxonomy of Input Devices

Introduction

The examples of the previous section highlight how different devices lend themselves to different tasks. In this section, we want to develop a categorization of input devices which is based on the properties that cause these differences. Our approach to this is through the tableau, shown in Fig. 6.

There is a hierarchy of criteria according to which devices are organized in this table. The tableau is limited, and only considers continuous, manually operated devices. Hence, the first two (implicit) organizational criteria are:

* continuous vs discrete?

* agent of control (hand, foot, voice, ...)?

The table is divided into a matrix whose primary partitioning into rows and columns delimits:

* what is being sensed (position, motion or pressure), and

* the number of dimensions being sensed (1, 2 or 3)

These primary partitions are delimited by solid lines. Hence, for example, both the rotary and sliding potentiometer fall into the box associated with one-dimensional position-sensitive devices (top left-hand corner). These primary rows and columns are subdivided by dotted lines into secondary regions. These group:

* devices that are operated using similar motor skills (subcolumns)

* devices that are operated by touch vs those that require a mechanical intermediary between the hand and the sensing mechanism (sub-rows)

Figure 6: Taxonomy of Input Devices.

Continuous manual input devices are categorized. The first order categorization is property sensed (rows) and number of dimensions (columns). Sub-rows distinguish between devices that have a mechanical intermediary (such as a stylus) between the hand and the sensing mechanism, and those which are touch sensitive. Sub-columns distinguish devices that use comparable motor control for their operation. (From Buxton, 1983.)

Grouping by motor action can be seen in examining the two-dimensional devices. Since they are in the same subcolumn, the tableau implies that tablets and mice utilize similar types of hand control and that this motor action is different from that used by joysticks and trackballs, which appear in a different sub-column.

The use of sub-rows to differentiate between devices that are touch activated and those that are not can be seen by comparing the light-pen and the touch screen. While each utilizes the same basic motor control, the light-pen requires the use of a stylus. Hence, the two appear in different sub-rows.

The table is useful for many purposes by virtue of the structure which it imposes on the domain of input devices. First, it helps in finding appropriate equivalencies. This is important in terms of dealing with some of the problems which arose in our discussion of device independence.

The table makes it easy to relate different devices in terms of metaphor. For example, a tablet is to a mouse what a joystick is to a trackball. Furthermore, if the taxonomy defined by the tableau can suggest new transducers, in a manner analogous to the periodic table of Mendeleev predicting new elements, then we can have more confidence in its underlying premises. We make this claim and cite the "torque sensing" one-dimensional pressure-sensitive transducer as an example. To our knowledge, no such device exists commercially. Nevertheless, it is a potentially useful device, an approximation of which has been demonstrated by Herot and Weinzapfel (1978).

Generality and Extensibility

Choosing the input technologies to be used with a workstation generally involves making a trade-off between two conflicting demands. On the one hand, each task has specialized needs that can be best addressed by a specialized technology. On the other hand, each workstation is generally used for a multitude of tasks. Supplying the optimum device for each task is generally out of the question. A trade-off must be made.

Devices must be chosen to give the best coverage of the demands of the range of tasks. An important criterion in comparing devices, therefore, is how broad their coverage is in this regard. Stated differently, how many squares in Fig. 6 can a particular device be used to fill? Graphics tablets are important in this regard, for example, since they can emulate many of the other transducers. This is demonstrated in detail by Evans, Tanner and Wein (1981). The tablet is what could be called an "extensible" device. This property of extensibility is, in our opinion, an important but seldom considered criterion in device selection.

Relative vs Absolute Controllers

One of the most important characteristics of input devices is whether they sense absolute or relative values. This has a very strong effect on the nature of the dialogues that the system can support with any degree of fluency. As we have seen, a mouse cannot be used to digitize map coordinates or trace a drawing, because it does not sense absolute position. Another example, taken from process control, is discussed in the reading by Buxton (1986): the nulling problem, which is introduced when absolute transducers are used in designs where one controller is used for different tasks at different times.
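The distinction, and the nulling problem itself, can be sketched in a few lines of Python. The classes and values below are invented purely for illustration.

```python
class RelativeDevice:
    """A mouse-like device: it reports changes (deltas), not positions."""
    def read_delta(self):
        return 0.05                  # stand-in for a sensed movement

class AbsoluteSlider:
    """A physical slider: it always reports its actual position."""
    def __init__(self, position):
        self.position = position
    def read(self):
        return self.position

volume, brightness = 0.3, 0.9
slider = AbsoluteSlider(position=0.3)    # assigned to volume; slider and value agree

# Reassign the slider to brightness (currently 0.9).  The slider still reads 0.3,
# so either brightness jumps, or the user must first "null" the mismatch by
# physically moving the slider to 0.9 before making any adjustment.
print("offset to null out:", brightness - slider.read())

# A relative device has no such problem: once reassigned, it simply applies
# deltas to whatever parameter it now controls.
mouse = RelativeDevice()
brightness += mouse.read_delta()
```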

What Our Taxonomy Doesn't Show

Perhaps the main weakness of the taxonomy presented above is that it only considers the continuous aspect of devices. As the sample tasks discussed earlier in this chapter pointed out, other factors, such as the integration of button devices with continuous controllers, have a strong impact on a device's performance. This is clear, for example, in the case of trying to "pick up" and drag an object with a mouse (where the button is integrated), compared to performing the same transaction with a trackball, where it is difficult to hold down the button (which is not integrated) with the same hand that is controlling the dragging motion.

An approach to capturing this aspect of devices is found in Buxton (1990a). Here, a 3-state model is developed. This model can be used to characterize both input devices and tasks. By providing a common vocabulary to describe both, a means of arriving at an appropriate match between the two is provided.
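The flavour of the model can be conveyed with a small Python sketch (this is our illustration, not Buxton's notation): enumerate the three states, record which states each device can occupy, and check that a device covers the states a task requires. The device-to-state assignments below are simplified.

```python
from enum import Enum

class State(Enum):
    OUT_OF_RANGE = 0   # device not sensed (e.g., finger lifted from a touch tablet)
    TRACKING = 1       # position sensed, tracker follows (e.g., mouse, button up)
    DRAGGING = 2       # position sensed while selecting (e.g., mouse, button down)

# Simplified, illustrative state sets for a few devices:
DEVICE_STATES = {
    "mouse":         {State.TRACKING, State.DRAGGING},
    "touch tablet":  {State.OUT_OF_RANGE, State.TRACKING},
    "stylus+tablet": {State.OUT_OF_RANGE, State.TRACKING, State.DRAGGING},
}

def supports(device: str, task_states: set) -> bool:
    """A device matches a task if it can occupy every state the task requires."""
    return task_states <= DEVICE_STATES[device]

drag_task = {State.TRACKING, State.DRAGGING}
print(supports("mouse", drag_task), supports("touch tablet", drag_task))  # True False
```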

The reader is also referred to Card, Mackinlay, and Robertson (1990) and Card, Mackinlay, and Robertson (1991), which present a taxonomy of input devices that extends the model developed above. Finally, Lipscomb and Pique (1993) present a brief but useful and insightful means of categorizing input devices that complements the work discussed above.

Chunking and Phrasing

A considerable part of what remains in this chapter has to do with alternative ways of articulating commands when interacting with the computer. The reading by Buxton (1986b) is intended to lay a theoretical foundation for doing so. The main thesis of this paper is that human-machine dialogues can benefit from appropriate phrasing, much in the same way as written and spoken language, and music.

Phrasing not only groups together things that are associated in meaning or purpose, it makes clear points of closure, that is, points at which one can be interrupted, or take a break. The structure that emerges from appropriate phrasing can accelerate the process whereby novice computer users "chunk" together concepts, thereby building cognitive skill.

The reading discusses the nature of skill acquisition and the use of phrasing in its acquisition. In so doing, it lays the foundation for how some of the literature on cognitive modeling can be extended to apply to the pragmatic and device levels of the interface. Finally, it prepares the reader for the sections which follow - those that deal with marking, gesture and two-handed input.

Marking

There is increasing interest in a style of interaction which has been called, for example, "paper-like", "pencentric", "pen-based," "character recognition," or "gesture driven." Many are not like paper, and many do not use a pen. What all have in common is that the user's input is in the form of a stream of x,y coordinates that could be called digital ink. Hence, we will refer to these as marking interfaces, applications and/or systems.

One of the main claims made for this class of system is that it is more natural and easy to use. In many ways, we agree. Our discussion of chunking and phrasing emphasizes the potential to be gained. However, it is important to realize that knowing how to use a pen is no more a ticket to mastering marking systems than knowing how to type is a ticket to mastering UNIX or MS-DOS. Most marking systems are extremely difficult to learn, are full of modes, and are very prone to error. This is true even of the best of the currently available systems, such as that described by Carr (1991) and Carr and Shafer (1991). This is all to say that careful design is just as important here as in any other style of interface.

Who Does the Recognition?

Recognition lies at the heart of such systems, be it recognition of block characters, cursive script, proof-reader's symbols, or other annotations. Computer recognition is far from perfect, even with training and care taken by the user. There is little likelihood of this changing in the near future. In many ways, computer recognition is a "black hole" that has diverted energy and attention away from many other aspects of marking-based systems - aspects that in the long run may be far more important than recognition.

Why do we say this? The root lies in the observation that one seldom hears anyone complain, "I wish this paper could understand what I've written on it." The point is, there are many benefits of electronic documents that do not require the computer to recognize their contents. One of the main benefits of marking-based systems is in computer-mediated human-human interaction, not just human-computer interaction, which has been getting most of the attention.

Hence, it is valuable to pay more attention to applications where recognition of the marks is done by a human rather than by a computer. This may be the case in notebook applications, where the "recognizer" is also the author, or in annotation systems, where the marked-up document is read by someone else. The markings might be recognized as they are written, as in the case of an electronic whiteboard (Elrod, et al., 1992), or at a later time, as in fax or email type applications.

The Freestyle system discussed in Case C is a good example of how computer recognition can be side-stepped. Here, email and voice mail are integrated with markings to provide a creative system for annotating electronic documents.

We are not trying to say that recognition cannot augment the benefits of such applications. Both Hardock, Kurtenbach and Buxton (1993) and the reading by Goldberg and Goodisman (1991) show how Freestyle-like systems can be augmented with recognition. But we want to emphasize that the application is not dependent on it.

What is Recognized?

If the computer does do mark recognition, most of the time this means character or script recognition. This is obviously an important topic and deserves serious attention. Good discussions of it can be found in Pittman (1991), Plamondon, Suen and Simner (1989) and Suen (1990). The reading by Goldberg and Goodisman (1991) gives a brief introduction to character recognition, and a detailed analysis of how to handle recognition errors.

While recognizing characters and script can be useful, it is typically expensive, both in terms of computational effort and in terms of the investment the user has to make in training the system. We want to emphasize that powerful applications can result when the recognition focuses on higher-level marks such as proof-reader type annotations. This is illustrated in the Tivoli electronic whiteboard application presented in the reading by Pedersen, McCall, Moran and Halasz (1993). Such marks are frequently easier to recognize and can be user independent.

User-definable graphical marks have also been demonstrated by some researchers. See, for example, Rubine (1991), and the paper and video by Wolf, Rhyne and Ellozy (1989).

Self Revelation and Marking Menus

It is interesting to recognize that a paper-like interface has the potential to be almost indistinguishable from UNIX or MS-DOS. Just consider the similarity between a blank piece of paper and a screen which is blank other than for a "%" or "A>" prompt.

Without some help, the user has no means of knowing what state the system is in, or what options are available. Hence, marking-based systems tend to assume a form-filling style, or carry over a stylistic similarity to classical GUIs. Often, such interface styles are not appropriate. Consider, for example, applications that must run on very small displays, such as pocket organizers. Here, the screen real estate is simply not available.

Recently, another design option has been developed, that of Marking Menus (Kurtenbach and Buxton, 1991a; Kurtenbach and Buxton, 1991b; Kurtenbach and Buxton, 1993; Kurtenbach and Buxton, 1994).

Marking menus are an extension of "pie menus" (Callahan, Hopkins, Weiser and Shneiderman, 1988). The novice user presses down on a stylus and waits for a short interval of time (approximately 1/2 second). A pie menu of the available commands then appears directly under the cursor (Figure 7a). The user may then select a command from the pie menu by keeping the stylus tip depressed and making a stroke through the desired sector or slice of the pie. The slice is highlighted and the selection is confirmed when the pressure on the stylus is released.

Figure 7. The transition from novice to expert reflected in two different ways of invoking commands. (From Kurtenbach & Buxton, 1991b).

The other option is to "mark ahead" by simply making the mark without waiting for the pie menu to pop up (Figure 7b). The first important point to note is that the physical movement involved in selecting a command from the menu is identical to the physical movement required to make the mark corresponding to that command. For example, a command that requires an up-and-to-the-right movement for selection from the pie menu requires an up-and-to-the-right mark in order to invoke it. The concept is similar to that of accelerator keys in many of today's applications. A user is reminded of the keystrokes associated with particular menu items every time a menu is displayed, since the names of the corresponding keys may appear next to the commands. The difference is that with marking menus, the user is not only reminded, but actually rehearses the physical movement involved in making the mark every time a selection from the menu is made. We believe that this further enhances the association between mark and command.

The second point to note is that supporting markings with pie menus in this way helps users make a smooth transition from novice to expert. Novices in effect perform menu selection. Users almost always wait for the pop-up menu and then select the desired sector when they first encounter a new menu. However, waiting for the menu takes time, and thus as users begin to memorize the layout (as they become expert), they begin to mark ahead to invoke the commands instead. From time to time, we have also observed an intermediate stage where users may begin to make a mark, and then wait for the menu to pop-up in order to verify their choice of action.
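The equivalence between selecting from the pie and marking ahead can be made concrete with a small sketch. The Python fragment below maps the net direction of a stroke onto one of eight sectors; the eight command names are invented, and real marking menus handle hierarchic menus and other details not shown here.

```python
import math

MENU = ["Open", "Close", "Cut", "Copy", "Paste", "Undo", "Save", "Print"]  # 8 slices

def select_from_stroke(dx: float, dy: float) -> str:
    """Map the net direction of a stroke onto one of eight 45-degree sectors."""
    angle = math.degrees(math.atan2(-dy, dx)) % 360   # screen y grows downward
    sector = int(((angle + 22.5) % 360) // 45)        # sector 0 is centred on "east"
    return MENU[sector]

# Whether the user waits for the menu and strokes through a slice, or simply
# marks ahead, the same up-and-to-the-right movement selects the same command:
print(select_from_stroke(dx=30, dy=-30))   # 45 degrees -> MENU[1] ("Close")
```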

Why not just use pie menus rather than marks? Marks are faster after one has memorized the layout of the menu. Even if a user did not have to pause to signal for the menu to be popped up, one would still have to wait for the menu to be displayed before making a selection. In many systems, displaying the menu can be annoyingly slow and visually disturbing. There is anecdotal evidence that expert users avoid these problems by "mousing ahead" in pie menu systems (Hopkins, 1991).

One disadvantage of this technique is that "reselection" is not possible using marks. What is meant by reselection is the ability to change the item being selected before actually executing an item. For example, one can pop-up the menu and highlight a series of items before releasing the mouse button while on the desired item. Typically this behavior is exhibited by novices who are unfamiliar with the menu items. However, once familiar with a menu, users rarely use reselection. Hence an expert using marks has little need for reselection.

Working Within the Idiom

In music, composers choose the instruments that best match their musical ideas, and then write in the appropriate idiom. Some of the problems with current marking interfaces are due to designers not adequately contemplating what the idiom of such systems is. Let us examine a few examples.

Let us assume that character and cursive script recognition as implemented were 100% accurate. Would the systems suddenly be successful? The answer may well be no. Consider that even cursive writing is much slower than three-fingered typing. Why then write? There are certainly a number of reasons, such as the ability to do free-form layout, include sketches, and to incorporate graphical aspects of text layout. For example, it is far faster to write the following than to type it on a keyboard:

2² ≠ 22

But virtually none of these benefits are captured by current recognizers, which force the user to write rectilinear text, more often than not in little bounding boxes.

If one merely needs to enter linear text and a keyboard is not available, it is important to recognize that it is faster to enter text with a stylus on a graphical keyboard than it is to print or to write cursively. More to the point, the text thus entered will be more accurate and legible than that which is written. (No more confusion between the letter O and the digit 0.)

A graphical keyboard may not be appropriate or desirable, however. On a palm-top computer, for example, one would not fit unless the "keys" were inordinately small. If one is to enter text using marks, yet is concerned with speed, then one alternative is to use shorthand. While recognizing traditional shorthand has been investigated (Leedham, Downton, Brooks & Newell, 1984), it is technically problematic. Furthermore, there are relatively few people who have the skill, and the skill is not easy to acquire. However, in some applications, the designer can invent an effective shorthand notation that does not have these disadvantages. The music shorthand illustrated in the Chunking and Phrasing reading by Buxton is one such example. A more generally applicable example is the Unistrokes shorthand developed by Goldberg and Richardson (1993), and demonstrated in the video by the same authors.

Figure 8: The Unistrokes Alphabet

The Unistrokes alphabet is shown in Figure 8. In keeping with the name, each character is represented by a single-stroke mark. While the alphabet must be learned, this only takes about an hour or so. The process is aided by the use of mnemonics, which are shown in Figure 9.

Figure 9: Unistroke Mnemonics

Unistrokes are interesting for at least two reasons beyond the potential for faster input. First, since each character is fully specified by a single stroke, there is no segmentation problem. That is, the system need not invest any energy trying to figure out which stroke belongs to which character. Hence, there is no need to print characters within the confines of bounding boxes. Characters can even be recognized when the Unistrokes are written one on top of the other. An example where this would be useful would be entering data into a wrist-watch computer by writing Unistrokes on the touch-sensitive watch face. This would be far superior to the micro keyboards that are currently seen on some wrist-watches.

Second, because of the lack of segmentation problems, one need not look at the "page", or display, when writing Unistrokes. Consequently, an effective means of "heads up" writing is provided. Unlike paper and pen technologies, one can visually attend to the whiteboard in a lecture, or a document which one is transcribing, for example, and still take notes that are legible. Here, one has one of the key benefits of touch-typing in a marking-based interface.
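A small sketch may help show why single-stroke characters side-step the segmentation problem. The toy direction-based "alphabet" below is invented for illustration and is not the actual Unistrokes alphabet; the point is only that each pen-down/pen-up pair is exactly one character, so no bounding boxes are needed.

```python
def classify_stroke(points):
    """Classify one stroke (a list of (x, y) points) by its net direction.
    This four-letter 'alphabet' is invented for illustration only."""
    (x0, y0), (x1, y1) = points[0], points[-1]
    dx, dy = x1 - x0, y1 - y0
    if abs(dx) >= abs(dy):
        return "a" if dx > 0 else "e"    # rightward / leftward strokes
    return "i" if dy > 0 else "t"        # downward / upward (screen y grows down)

def recognize(strokes):
    # One character per stroke: the recognizer never has to decide which stroke
    # belongs to which character, even when characters are written on top of one another.
    return "".join(classify_stroke(s) for s in strokes)

print(recognize([[(0, 0), (10, 0)], [(5, 5), (5, 20)]]))   # "ai"
```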

Another aspect of the idiom is that, unlike direct manipulation GUIs, marking-based systems can leave an explicit audit trail of the user's actions. This is seen in Hardock, Kurtenbach and Buxton (1993), for example. In their MATE system, users mark annotations on electronic documents and then return them to the author. The author sees two views of the document: one with the annotated original text, the other with the modified, or edited, text. This is illustrated in Figure 10.

Unlike Freestyle, the annotations can be invoked by the author to effect the indicated change. This is done by a point/select operation on the annotation. When invoked, the annotation remains visible, but "grayed out" so as to be distinguished from annotations that have not yet been acted upon. These grayed-out marks represent the audit trail of actions taken. The benefit comes when one wants to undo an action. This is effected simply by reselecting a grayed-out mark. How this differs from conventional direct manipulation systems is that it does not matter when the undone command was originally invoked. Since all past operations are visible (as grayed-out marks), one has arbitrary undo capability without concern for the order in which things were done. This is a significant difference from conventional GUIs.

Finally, one of the most important aspects of the marking idiom is the figure/ground relationship that exists between computer-printed and human-printed text. From even twenty paces, it is absolutely clear which marks were made by the computer and which by hand. It is precisely this property that makes marking so powerful in annotation-type applications.


Figure 10: MATE in "Incorporation Mode".

In incorporation mode, a user can view the annotated document, and select which annotations to incorporate. Annotations (seen in the left window) that have been "executed" appear as thin lines (e.g., "ed"). Annotations that have not been executed appear as thick lines. Annotations are colour coded according to who made them. Annotations that represent commands can be executed by selecting them with the stylus. Annotations that have been executed can be "undone" by selecting them. The current state of the document appears in the right-hand window. The user can navigate (scroll) independently or synchronously in each window. (From Hardock, Kurtenbach and Buxton, 1993.)

Situated Design

Again, let us assume that character and cursive script recognition were perfect. Once more, ask yourself, would that solve the problems of marking systems? The question is obviously rhetorical. No matter how good the recognition, it is still generally faster to find an address in your paper address book than in your "Personal Digital Assistant," or PDA. The same is true for checking appointments in a calendar, looking up a word in the dictionary, or starting to take notes in a meeting.

When we specify systems, the performance of the display, bus and disk drive is rigorously defined. What is peculiar is that the same is not true for accessing applications. As an exercise, take any of the transactions described above (and others for which PDAs are used) and time them with a stop-watch. The times obtained from the status quo paper-and-pen condition should set the minimum specification for the electronic version, or there had better be a very good reason why not. The point that is being missed by too many designers is that it is not enough to have the "best" calendar program on the market. It must also be there when I need it and where I need it. This is a requirement that is seldom met.

Design must start to pay more attention to the context in which tools are to be used, and the constraints that are imposed by those contexts. As our stop-watch example shows, these constraints can frequently be easily determined. It is time that these considerations become the norm rather than the exception.

Summary

In the right context, marking based interfaces have great potential. For this potential to be realized, we have to move away from the preoccupation with character and script recognition. While recognition is of real value, not enough attention is being paid to other aspects of this class of system.

The weaknesses of current systems are largely due to a lack of user-centred design. We have already pointed out examples where simple contextual considerations can help define performance specifications. Likewise, user testing - as discussed throughout this volume - is an important but too rare aspect of design. A notable exception is the work by Wolf and her colleagues (Wolf 1986, 1988, 1992; Wolf & Morrel-Samuels, 1987).

Perhaps the best way to keep on top of recent developments in mark-based computing is to subscribe to the bi-monthly newsletter Pen-Based Computing[2]. This is an inexpensive newsletter that contains information about business, technological and user interface issues. It is highly recommended.

Gestures

Many people use the term "gesture" to refer to the marking interfaces described in the previous section. While every mark requires a gesture in order to be articulated, it is worthwhile to recognize that it is the resulting mark, and not the gesture, that is used as input to the system.

There is a distinct class of system in which it is truly the gesture itself which is recognized. Typically, such systems leave no marks, and produce more dimensions of input than the x,y point stream of marking input.

One of the most common ways to capture manual gesture is by instrumenting the hand. The main technique used for this is a special glove equipped with a number of sensors that provide the system with information about hand position, orientation, and flex of the fingers. Such a device, the Dataglove, is described in Zimmerman, Lanier, Blanchard, Bryson and Harvill (1987), and illustrated in the video by Zacharey (1987). Such gloves are used mainly in virtual reality type applications. However, readers are directed to Fels and Hinton (1990) for a novel alternative. They describe a prototype system, Glove-Talk, which recognizes manual sign-language as input and produces continuous speech as output.

One of the pioneers in gesture-based input is Myron Krueger (1991). What is novel about his work is that it does not require any intrusive technology such as gloves to be worn. The input is acquired by the system via a video camera. By coupling the video signal with real-time image processing, the computer is able to "see" and recognize the manual gestures. This system is demonstrated in two videos: Krueger (1985 & 1988). They are highly recommended.

Perhaps the use of gesture is most powerful when combined with other media, especially voice. This is a topic that we explore in the chapter which follows. In the meantime, the reader is referred to two videos for examples of such usage: Schmandt, et al. (1987) and Thorisson, Koons and Bolt (1992).

CSCW

Collaborative work is an area which puts special demands on input. In same-place meetings, a means is required for rapid and fluid interaction. Such interaction may well be on a large electronic "whiteboard," such as the Liveboard described by Elrod, et al. (1992).

In synchronous meetings where people are at different sites, however, a whole new class of problem arises: that of remote awareness. Consider the Tivoli system discussed in the reading by Pedersen, McCall, Moran and Halasz (1993). If the people at the remote site want to point or indicate something, their only means is via the remote cursor, or telepointer. What this effectively does is reduce their gestural vocabulary to that of a fruit fly. Clearly, the effectiveness of Tivoli is different for same site vs remote meetings.

What is missing in the above example is some superimposition of the hands' gestures over the work surface. This is a problem that has been addressed in two very innovative papers by Tang and Minneman (1991a, 1991b). What they do is capture the image of the hands over the work surface and transmit it to the remote site, where it appears as a shadow. Hence, a far richer form of gestural interaction is possible. What remains is to effectively combine this approach with that of Krueger. In the meantime, however, it is worth noting that capturing and transmitting such "shadows" is valuable even if there is no recognition.

2 Handed Input

A good exercise for the student of manual input is to make a survey of how people use their hands in performing tasks in the everyday world. Consider, for example, painting, threading a needle, taking notes, opening a book and driving a car. Each is an example of people's ability to coordinate the action of their two hands. And each contrasts with how the two hands are used in interacting with computers.

Both hands are certainly used in typing, but this is a single task made up of discrete events. We also see two-handed usage in activating function and/or accelerator keys with one hand while pointing with a mouse, for example, with the other. In this case, the button pushes are discrete and the pointing continuous. The class of interaction that we rarely see in interacting with computers, and that is illustrated by the real-world examples given above, has both hands performing continuous tasks in concert. This we call asymmetric bimanual action. Our thesis in this section is that the quality of Direct Manipulation GUIs can be greatly improved through the incorporation of this class of input.

If we first want to learn more about human performance in this class of task, perhaps the best single source is Guiard (1987). Guiard claims that bimanual asymmetric actions in the everyday world can be characterized by three properties:

* The non-dominant hand determines the frame of action of the dominant hand. Two examples are holding a nail that is to be hammered, and holding a needle which is to be threaded.

* The sequence of action is non-dominant hand, then dominant. This is seen in the nail and the needle examples, as well as that of a painter moving his palette to a convenient location, then dipping his brush into the desired paint pot.

* The action of the non-dominant hand is coarse relative to the fine action of the dominant hand. This is seen in the example of the painter's palette. The positioning of the palette is not as demanding as the accuracy required in dipping the brush into the appropriate paint pot.

If we are to design our user interfaces so as to exploit existing everyday skills, then these three characteristics should lay the basis for how the two hands are used. An example of how this can be done is given in Buxton (1990b). This paper works through a MacPaint example, where one is "painting" and wants to continue to paint on a part of the "page" which is not visible in the window. It contrasts the complexity of scrolling the page and resuming painting using two techniques: the standard "hand" dragging tool, versus using a trackball in the non-dominant hand. The former takes eight steps, the latter, one. The former interrupts the flow of action (painting). In the latter, as in the physical world, one does not have to lay the "paint brush" down in order to reposition the paper.

The ideas underlying the previous example were tested formally in an earlier experiment by Buxton and Myers (1986). This study showed that people spontaneously used two hands in performing compound tasks when there was a good match between task and action. This study puts to bed the often-heard complaint about two-handed action, characterized by comments on the difficulty of "rubbing your stomach while tapping your head." Yes, there are cases where two-handed action is very difficult and requires a high degree of skill. But as the examples from Guiard show, there is a large repertoire of bimanual tasks in which users are already skilled. Buxton and Myers demonstrated that these skills could be applied in performing compound tasks, such as positioning and scaling, and navigation and selection.

More recently, Bier, Stone, Pier, Buxton and DeRose (1993) have developed a new two-handed paradigm of interaction which is consistent with Guiard. The paradigm is called the See-Through Interface, and it uses two new classes of widgets called Toolglass sheets and Magic Lenses. This can best be thought of as a 2 1/2 D interface. It functions on three planes:

* The Desktop: on which icons sit. This is consistent with conventional GUIs.

* The Cursor: that floats above the desktop and its icons. This is typically manipulated by the dominant hand, using a mouse, for example. This is consistent with conventional GUIs.

* The Magic Lens and Toolglass sheets: which lie between the cursor and the desktop. These sheets are much like the plastic protractors and rulers that one gets with drafting sets: you can see the tool and the markings on it, but also see what is on the "paper" below. Like such rulers and protractors, the Magic Lens and Toolglass sheets are repositioned using the non-dominant hand. This is done using a trackball or small touch tablet, for example.

The relationship among these three levels is illustrated in Figure 11. The example shows a Toolglass sheet with three click-through buttons on it. Each represents a different texture that might be assigned to the icon on the desktop. A particular texture is assigned by aligning the cursor and the desired Toolglass button over the icon (as represented by the vertical broken line in the figure) and clicking the mouse button.
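The essence of a click-through button can be sketched in a few lines. The Python fragment below is an illustration only (the Rect and Icon classes and the textures are invented); it simply tests the click first against the Toolglass sheet, positioned by the non-dominant hand, and then against the desktop icon beneath it.

```python
from dataclasses import dataclass

@dataclass
class Rect:
    x: float
    y: float
    w: float
    h: float
    def contains(self, px, py):
        return self.x <= px <= self.x + self.w and self.y <= py <= self.y + self.h

@dataclass
class Icon:
    bounds: Rect
    texture: str = "plain"

def click_through(cursor, sheet_buttons, icons):
    """cursor: (x, y); sheet_buttons: list of (Rect, texture); icons: list of Icon."""
    px, py = cursor
    for button_rect, texture in sheet_buttons:   # test the Toolglass sheet first...
        if button_rect.contains(px, py):
            for icon in icons:                   # ...then the desktop beneath it
                if icon.bounds.contains(px, py):
                    icon.texture = texture       # the click passes "through" the button
                    return icon
    return None

icon = Icon(Rect(40, 40, 20, 20))
sheet = [(Rect(30, 30, 40, 40), "striped")]      # sheet positioned by the other hand
click_through((50, 50), sheet, [icon])
print(icon.texture)                              # "striped"
```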

Magic Lenses are also manipulated using the nondominant hand. They are visualization widgets that can be thought of as analogous to magnifying glasses with a diverse range of optical properties. For example, they enable one to highlight vertices, provide X-ray views of scenes, or filter out all objects except those of a particular colour.

We have space to provide only the briefest introduction to these new two-handed widgets. Bier, Stone, Pier, Buxton and DeRose (1993) and Bier, Stone, Fishkin, Buxton and Baudel (1994) provide a much more expanded view of the overall design space. Kabbash, Buxton and Sellen (1994) present an experiment that illustrates the usability and utility of Toolglass widgets. The interested reader is referred to these papers for more information.


Figure 11: Click-Through Buttons

Three click-through buttons are shown on a sheet of Toolglass. Each button represents a different texture with which icons can be coloured. The button with the desired texture is aligned with the cursor over the icon. Clicking "through" the button over the icon causes it to acquire that button's texture.

We believe that asymmetric bimanual input will be supported by future GUIs. It is perhaps the best way to improve both the "Directness" and "Manipulation" capabilities of so-called Direct Manipulation interfaces. However, two hands are not always better than one. There are no limits on bad design, and this includes two-handed interfaces. Using two cursors, one for each hand, is something that should typically be avoided, as is shown by both Kabbash, Buxton and Sellen (1994) and Dillon, Edey and Tombaugh (1990). As with any interaction technique, achieving the potential benefits depends on appropriate human-centred design. Concerning the use of two hands, the Guiard paper is perhaps the best starting point for deriving some scientific basis for the constraints of such design.

Finally, there are people who do not have the use of one, much less two hands. Clearly, designs have to accommodate such special needs. Advocating 2-handed input for those able to exploit it does not imply ignoring those who cannot. This just adds one more thing that must be considered in the design.

Problems of Interfacing

Our current understanding is such that we are hard pressed to use the haptic channel to its full potential. We need more experience before this situation can be altered. However, obtaining this experience turns out to be rather difficult. If, for example, we want to gain some insights by comparing two devices, we will most likely find that they are incompatible physically, electronically, and/or logically. Hence, what should be a simple comparison turns into a logistical nightmare.

This is one area where things are starting to improve, largely due to the introduction of the Apple Desktop Bus (ADB). This is a standard bus for connecting input devices, introduced by Apple but also available on some other computers, such as those from Silicon Graphics. Because of its design, one can easily switch from one input device to another, thereby enabling one to study their respective properties.

The ease with which devices can be exchanged and interfaced should be one of the prime considerations when choosing a platform for studying HCI.

Transparent Access and the Physically Disabled

For most users, the problems of connecting different input devices to a system, as outlined in the previous section, are an annoyance. However, for users with physical disabilities, these problems can make the difference between their being able to use a computer or not. This, in turn, can have a major impact on their quality of life.

For most common input devices there exist special-purpose transducers that permit people with different physical disabilities to supply comparable signals. A mouse may be replaced by a tongue-activated joystick, or a button replaced by a blow-suck tube. It is reasonable to expect disabled persons to acquire such special-purpose devices. However, it is economically unreasonable and socially unacceptable to expect them to be dependent upon custom applications in order to interact with their systems.

What is required is transparent access to standard applications. That is, existing applications should be able to be used by simply plugging in the specialized replacement transducer. The difficulties in providing transparent access are exactly the same difficulties that we encountered in the preceding section where we wanted to replace one input device with another for comparative purposes. In recognizing that this is a problem "handicapping" all of us, perhaps the achievement of generalized transparent access will become a greater priority than it has up to now. It is a serious problem and needs to be addressed.

Device Independence and Virtual Devices

Recently there have been some efforts to overcome some of the problems that stand in the way of transparent access. In particular, the concept of device independent graphics has come into common practice.

Just as machine-independent compilers facilitated porting code from one computer to another, device-independent programming constructs have been developed for I/O. With input, the principal idea was to recognize that all devices more or less reduce to a small number of generic, or virtual, devices. For example, an application can be written in a device-independent way such that it need not know whether the source of text input is a keyboard or a speech-recognition system. All the application need know is that text is being input. Similarly, the application need not require any information about what specific device is providing location information (in pointing, for example). All that it needs to know is what the current location is.

This idea of device independence has been discussed by Foley and Wallace (1974), Wallace (1976), Newman (1968), and Rosenthal, Michener, Pfaff, Kessener and Sabin (1982). It was refined and integrated into the standardized graphics systems (GSPC, 1977; GSPC, 1979; ISO, 1983).

Within the GKS standard (ISO, 1983), the virtual devices are defined in terms of the values that they return to the application program. The virtual input devices in GKS are listed below; a brief sketch of the corresponding return values follows the list:

* locator: a pair of real values giving the coordinate of a point in world coordinates.

* stroke: a sequence of x/y coordinates in world coordinates.

* valuator: a single number of type real.

* pick: the name of a segment.

* string: a string of characters.

* choice: a non-negative integer defining one of a set of alternatives.
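As a rough illustration of these six classes of return values, here is a minimal sketch in C; the type declarations are our own and are not the actual GKS language binding:

    /* Illustrative only: the kinds of values the six GKS virtual input
       devices return to an application (not the real GKS binding). */

    typedef struct { double x, y; } Locator;   /* a point in world coordinates       */

    typedef struct {                            /* a sequence of world coordinates    */
        int     n;                              /* number of points in the stroke     */
        double *x, *y;
    } Stroke;

    typedef double Valuator;                    /* a single real value                */

    typedef int    Pick;                        /* the name (identifier) of a segment */

    typedef char  *String;                      /* a string of characters             */

    typedef int    Choice;                      /* a non-negative integer selecting
                                                   one of a set of alternatives       */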

For the designer of user interfaces, the main advantage of device-independent graphics has been that one can experiment with different devices without normally having to modify the application's code. All that needs to be changed (from the software perspective) is the actual device driver. The application does not care what driver is being used for a particular device, because the standard is defined in terms of the calling protocol and the number and type of parameters returned.

Device-independent graphics has, therefore, had an important impact on our ability to rapidly prototype user interfaces. This subject is discussed in more detail in Chapter 4, Development Tools, and is largely motivated by the iterative design methodologies discussed in Chapter 3, Design and Evaluation.

While device independence has been a real benefit, it has also led to some problems. The reason is that some practitioners have confused technical interchangeability with functional interchangeability. Just because I can substitute a trackball for a mouse does not mean that the resulting user interface will still be satisfactory. As we have seen, devices have idiosyncratic properties that make them well suited for some tasks and not for others. Further discussion of issues relating to device-independent graphics can be found in Baecker (1980).

Conclusions

In general, input has been neglected, especially in comparison to output. Consequently, it is an aspect of user interface design where there is a great deal of room for innovation.

One of the problems which has slowed development in this area is the difficulty of integrating the computer, the application and the input devices. Such logistical problems are diminishing. Perhaps more significant in the long term is that many people think about input at the device level, essentially as a means of obtaining improved time-motion efficiency. That is, they see it as relating mainly to the motor/sensory system. As the reading on Chunking and Phrasing hopefully makes clear, this is one of the biggest mistakes that one can make. We believe strongly that effectively structuring the pragmatics of input can have a significant impact on the cognitive level of the interface. It is towards exploiting that potential that this chapter has been directed.

Readings

Buxton, W. (1986). There's More to Interaction than Meets the Eye: Some Issues in Manual Input. In Norman, D. A. and Draper, S. W. (Eds.), User Centered System Design: New Perspectives on Human-Computer Interaction, Hillsdale, N.J.: Lawrence Erlbaum Associates, 319 - 337.

MacKenzie, I.S. (1992). Movement time prediction in human-computer interfaces. Proceedings of Graphics Interface '92, 140-150.

Buxton, W. (1986b). Chunking and phrasing and the design of human-computer dialogues, in H.-J. Kugler (Ed.) Information Processing '86, Proceedings of the IFIP 10th World Computer Congress, Amsterdam: North Holland Publishers, 475-480.

Goldberg, D. & Goodisman, A. (1991). Stylus user interfaces for manipulating text. Proceedings of the Fourth ACM SIGGRAPH Symposium on User Interface Technology (UIST'91), 127 - 135.

Pedersen, E., McCall, K., Moran, T. & Halasz, F. (1993). Tivoli: an electronic whiteboard for informal workgroup meetings. Proceedings of InterCHI '93, 391-398.

References/Bibliography

Apte, A., Vo, V. & Kimura, T. (1993). Recognizing Multistroke Geometric Shapes: An Experimental Evaluation. Proceedings of UIST'93, 121-128.

Baecker, R. (1980). Towards an Effective Characterization of Graphical Interaction, in R.A. Guedj, P. ten Hagen, F.R. Hopgood, H. Tucker and D.A. Duce (Eds.), Methodology of Interaction, Amsterdam: North Holland Publishing, 127 - 148.

Bier, E. (1990). Snap-dragging in three dimensions. Computer Graphics, 24(2), (Special issue on 1990 Symposium on Interactive 3D Graphics), 249-262.

Bier, E., Stone, M., Fishkin, K., Buxton, W., & Baudel, T. (1994). A taxonomy of see-through tools. To appear in Proceedings of CHI '94, Boston, April 24-28.

Bier, E. A., Stone, M., Pier, K., Buxton, W. & DeRose, T. (1993). Toolglass and magic lenses: the see-through interface. Proceedings of SIGGRAPH '93, 73-80.

Bolt, R. A. (1984). The Human Interface: Where People and Computers Meet. London: Lifetime Learning Publications.

Briggs, R., Dennis, A., Beck, B. & Nunamaker, J. (1993). Whither the pen-based interface. Journal of Management Information Systems 9(3), 71-90.

Brooks, F.P. Jr., Ouh-Young, M., Batter, J. & Kilpatrick, P.J. (1990). Project GROPE - haptic displays for scientific visualization. Computer Graphics 24(3), Proceedings of SIGGRAPH '90, 177-185.

Buxton, W. (1982). An Informal Study of Selection-Positioning Tasks. Proceedings of Graphics Interface '82, 323 - 328.

Buxton, W. (1983). Lexical and Pragmatic Considerations of Input Structures. Computer Graphics, 17(1), 31 - 37.

Buxton, W. (1986a) There's More to Interaction than Meets the Eye: Some Issues in Manual Input. In Norman, D. A. and Draper, S. W. (Eds.), User Centered System Design: New Perspectives on Human-Computer Interaction, Hillsdale, N.J.: Lawrence Erlbaum Associates, 319 - 337.

Buxton, W. (1986b). Chunking and phrasing and the design of human-computer dialogues, in H.-J. Kugler (Ed.) Information Processing '86, Proceedings of the IFIP 10th World Computer Congress, Amsterdam: North Holland Publishers, 475-480.

Buxton, W. (1990a). A three state model of graphical input. In D. Diaper et al. (Eds), Human-Computer Interaction - INTERACT '90, Elsevier Science Publishers B.V. (North-Holland), 449-456.

Buxton, W. (1990b). The natural language of interaction: A perspective on non-verbal dialogues. In Laurel, B. (Ed.). The Art of Human-Computer Interface Design, Reading, MA: Addison-Wesley. 405-416.

Buxton, W. Hill, R. & Rowley, P. (1985). Issues and Techniques in Touch-Sensitive Tablet Input, Computer Graphics, 19(3), 215 - 224.

Buxton, W. & Myers, B. (1986). A Study in Two-Handed Input. Proceedings of CHI '86, 321 - 326.

Buxton, W., Fiume, E., Hill, R., Lee, A. & Woo, C. (1983). Continuous Hand-Gesture Driven Input. Proceedings of Graphics Interface '83, 191-195.

Callahan, J., Hopkins, D., Weiser, M. & Shneiderman, B. (1988). An empirical comparison of pie vs. linear menus. Proceedings of CHI '88, 95-100.

Card, S., English, W.K. & Burr, B.J. (1978). Evaluation of Mouse, Rate-Controlled Isometric Joystick, Step Keys and Text Keys for Text Selection on a CRT. Ergonomics, 21(8), 601-613.

Card, S., Mackinlay, J. D. & Robertson, G. G. (1990). The design space of input devices. Proceedings of CHI '90, 117-124.

Card, S., Mackinlay, J. D. & Robertson, G. G. (1991). A Morphological analysis of the design space of input devices. ACM Transactions on Information Systems, 9(2), 99-122.

Card, S., Moran, T. & Newell, A. (1980). The Keystroke Level Model for User Performance Time with Interactive Systems, Communications of the ACM, 23(7), 396 - 410.

Card, S., Moran, T. & Newell, A. (1983). The Psychology of Human-Computer Interaction, Hillsdale, N.J.: Lawrence Erlbaum Associates.

Carr, R.M. (1991). The point of the pen. Byte, 16(2), 211-221.

Carr, R.M. & Shafer, D. (1991). The Power of PenPoint. Reading, MA: Addison-Wesley.

Chapanis, A. & Kinkade, R. (1972). Design of Controls, in H. Van Cott & R. Kinkade (Eds.), Human Engineering Guide to Equipment Design, Revised Edition, Washington: U.S. Govt. Printing Office, 345 - 379.

Chen, M., Mountford, J. & Sellen, A. (1988). A study in interactive 3-D rotation using 2-D control devices. Computer Graphics 22(4), 121-129.

Conrad, R. & Longman, D.J.A. (1965). Standard Typewriter Versus Chord Keyboard: An Experimental Comparison, Ergonomics, 8, 77-88.

Darragh, J., Witten, I. & James, M. (1990). The Reactive Keyboard: A Predictive Typing Aid. IEEE Computer, November, 23(11), 41-49.

Devoe, D.B. (1967). Alternatives to Handprinting in the Manual Entry of Data, IEEE Transactions on Human Factors in Electronics, 8(1), 21 - 32.

Dillon, R., Edey, J. & Tombaugh, J. (1990). Measuring the true cost of command selection: techniques and results. Proceedings of CHI '90, ACM Conference on Human Factors in Software, 19-25.

Earl, W.K. & Goff, J.D. (1965). Comparison of Two Data Entry Methods, Perceptual and Motor Skills, 20, 369 - 384.

Elrod, S., Bruce, R., Goldberg, D., Halasz, F., Janssen, W., Lee, D., McCall, K., Pedersen, E., Pier, K., Tang, J. & Welch, B. (1992). Liveboard: a large interactive display supporting group meetings, presentations and remote collaborations. Proceedings of CHI '92, 599-607.

Engelbart, D. & English, W. (1968). A Research Center for Augmenting Human Intellect. Proceedings of the Fall Joint Computer Conference, 395 - 410.

English, W.K., Engelbart, D.C. & Berman, M.L. (1967). Display Selection Techniques for Text Manipulation. IEEE Transactions on Human Factors in Electronics, 8(1), 5 - 15.

Evans, K., Tanner, P. & Wein, M. (1981). Tablet-Based Valuators That Provide One, Two, or Three Degrees of Freedom. Computer Graphics, 15(3), 91 - 97.

Fels, S. & Hinton, G. (1990). Building adaptive interfaces with neural networks: the glove-talk pilot study. In D. Diaper et al. (Eds), Human-Computer Interaction - INTERACT '90, Elsevier Science Publishers B.V. (North-Holland), 683-688.

Fitts, P. & Peterson, J. (1964). Information Capacity of Discrete Motor Responses. Journal of Experimental Psychology, 67, 103 - 112.

Foley, J.D. & Wallace, V.L. (1974). The Art of Graphic Man-Machine Conversation. Proceedings of IEEE, 62(4), 462 - 470.

Foley, J.D., Wallace, V.L. & Chan, P. (1984). The Human Factors of Computer Graphics Interaction Techniques. IEEE Computer Graphics and Applications, 4(11), 13 - 48.

Foley, J. & van Dam, A. (1982). Fundamentals of Interactive Computer Graphics, Reading, MA: Addison-Wesley.

Foley, J., van Dam, A., Feiner, S. & Hughes, J. (1990). Computer Graphics Principles and Practice. Reading, MA: Addison-Wesley.

Francik, E. & Akagi, K. (1989). Designing a computer pencil and tablet for handwriting. Proceedings of the Human Factors Society 33rd Annual Meeting, 445-449.

Friedman, Z., Kirschenbaum, A. & Melnik, A. (1984). The Helpwrite Experiment: A Human Factors Application for the Disabled. Unpublished Manuscript, Technion - Israel Institute of Technology, Haifa 321000.

Gaver, W. (1991). Technology affordances, Proceedings of CHI '91, ACM Conference on Human Factors in Software, 79-84.

Gelderen, T. van, Jameson, A. & Duwaer, A. (1993). Text Correction in Pen-Based Computers: An Empirical Comparison of Methods. INTERCHI'93 Short Papers, 87-88.

Goldberg, D. & Goodisman, A. (1991). Stylus user interfaces for manipulating text. Proceedings of the Fourth ACM SIGGRAPH Symposium on User Interface Technology (UIST'91), 127 - 135.

Goldberg, D. & Richardson, C. (1993). Touch-typing with a stylus, Proceedings of InterCHI'93, 80-87.

Goodwin, N.C. (1975). Cursor Positioning on an Electronic Display Using Lightpen, Lightgun, or Keyboard for Three Basic Tasks. Human Factors, 17(3), 289 - 295.

Gopher, D. & Koenig, W. (1983). Hands Coordination in Data Entry with a Two-Hand Chord Typewriter. Technical Report CPL 83-3, Cognitive Psychology Laboratory, Dept. of Psychology, University of Illinois, Champaign, ILL 61820.

Green, R. (1985). The Drawing Prism: A Versatile Graphic Input Device, Computer Graphics, 19(3), 103 - 110.

GSPC (1977). Status Report of the Graphics Standards Planning Committee, Computer Graphics, 11(3).

GSPC (1979). Status Report of the Graphics Standards Planning Committee, Computer Graphics, 13(3).

Guedj, R.A., ten Hagen, P., Hopgood, F.R., Tucker, H. and Duce, D.A. (Eds.), (1980), Methodology of Interaction, Amsterdam: North Holland Publishing.

Guiard, Y. (1987). Asymmetric division of labor in human skilled bimanual action: the kinematic chain as a model. Journal of Motor Behavior, 19(4), 486-517.

Haller, R., Mutschler, H. & Voss, M. (1984). Comparison of Input Devices for Correction of Typing Errors in Office Systems, Proceedings of Interact'84, Vol II, 218 - 223.

Hardock, G. (1991). Design issues for line-driven text editing / annotation systems. Proceedings of Graphics Interface '91, 77-84.

Hardock, G., Kurtenbach, G. & Buxton, W. (1993). A marking based interface for collaborative writing. Proceedings of UIST'93, 259-266.

Herot, C. & Weinzapfel, G. (1978). One-Point Touch Input of Vector Information from Computer Displays. Computer Graphics, 12(3), 210-216.

Hopkins, D. (1991) The design and implementation of pie menus. Dr. Dobb's Journal, December, 1991, 16-26

ISO (1983). Information Processing-Graphical Kernel System (GKS) Functional Description, International Standards Organization, ISO/DP 7942.

Iwata, H. (1990). Artificial reality with force-feedback: development of desktop virtual space with compact master manipulator, Computer Graphics 24(3), Proceedings of SIGGRAPH '90, 165-170.

Jacob, R. (1991). The use of eye movements in human-computer interaction techniques: What you look at is what you get. ACM Transactions on Information Systems, 9(3), 152-169.

Johnstone, E. (1985). The Rolky: A Poly-Touch Controller for Electronic Music, in B. Truax (Ed.), Proceedings of the International Computer Music Conference, Vancouver, 291 - 295.

Kabbash, P., Buxton, W. & Sellen, A. (1994). Two-handed input in a compound task. To appear in Proceedings of CHI '94, Boston, April 24-28.

Kabbash, P., MacKenzie, I.S. & Buxton, W. (1993). Human performance using computer input devices in the preferred and non-preferred hands. Proceedings of InterCHI '93, 474-481.

Klemmer, E.T. (1971). Keyboard Entry, Applied Ergonomics, 2(1), 2 - 6.

Knowlton, K. (1975). Virtual Pushbuttons as a Means of Person-Machine Interaction. Proceedings of the IEEE Conference on Computer Graphics, Pattern Recognition, and Data Structures, 350 - 351.

Knowlton, K. (1977a). Computer Displays Optically Superimposed on Input Devices. The Bell System Technical Journal, 56(3), 367 - 383.

Knowlton, K. (1977b). Prototype for a Flexible Telephone Operator's Console Using Computer Graphics. 16 mm film, Bell Labs, Murray Hill, NJ.

Koons, D., Sparrell, C. & Thorosson, K. (1993). Integrating Simultaneous Input from Speech, Gaze and Hand Gestures. In M. Maybury (Ed.), Intelligent Multimedia Interfaces, Menlo Park, CA.: AAAI Press / MIT Press, 257-276.

Krueger, M.W. (1991). Artificial Reality II. Reading, MA: Addison-Wesley.

Kroemer, K.H. (1972). Human Engineering the Keyboard, Human Factors, 14(1), 51 - 63.

Kuklinski (1985). A Case for Digitizer Tablets, Computer Graphics World, May 1985, 45 - 52.

Kurtenbach, G. & Buxton, W. (1991a). GEdit: a testbed for editing by contiguous gesture. SIGCHI Bulletin, 23(2), 22 - 26.

Kurtenbach, G. & Buxton, W. (1991b). Integrating mark-up and direct manipulation techniques. Proceedings of the Fourth ACM SIGGRAPH Symposium on User Interface Technology (UIST'91),137 - 144.

Kurtenbach, G. & Buxton, W. (1993). The limits of expert performance using hierarchic marking menus. Proceedings of InterCHI '93, 482-487.

Kurtenbach, G. & Buxton, W. (1994). User learning and performance with marking menus. To appear in Proceedings of CHI '94, Boston, April 24-28.

Lee, S.K., Buxton, W. & Smith, K.C. (1985). A Multi-Touch Three Dimensional Touch-Sensitive Tablet, Proceedings of CHI'85, 21 - 27.

Leedham, C., Downton, A., Brooks, C. & Newell, A. (1984). On-Line Acquisition of Pitman's Handwritten Shorthand as a Means of Rapid Data Entry. Proceedings of Interact '84, Vol. 2, 86 - 91.

Levine, S.R. & Ehrlich, S.F. (1991). The Freestyle system: a design perspective. In A. Klinger (Ed.). Human-Machine Interactive Systems. New York: Plenum Press, 3-21.

Lipscomb, J. & Pique, M. (1993). Analog input device physical characteristics. SIGCHI Bulletin, 25(3), 40-45.

MacKenzie, I.S. (1992). Fitts' law as a research and design tool in human-computer interaction. Human Computer Interaction, 7(1), 91-139.

MacKenzie, I.S. (1992). Movement time prediction in human-computer interfaces. Proceedings of Graphics Interface '92, 140-150.

MacKenzie, I.S. & Buxton, W. (1992). Extending Fitts' law to two-dimensional tasks. Proceedings of CHI '92, 219-226.

MacKenzie, I.S., Sellen, A. & Buxton, W. (1991). A comparison of input devices in elemental pointing and dragging tasks. Proceedings of CHI '91, ACM Conference on Human Factors in Software, 161-166.

MacKenzie, I.S. & Ware, C. (1993). Lag as a determinant of human performance in interactive systems. Proceedings of InterCHI '93, 488-493.

Mackinlay, J. D., Card, S. & Robertson, G. G. (1990). Rapid controlled movement through a virtual 3D workspace. Computer Graphics 24(3), Proceedings of SIGGRAPH '90, 171-176.

Mackinlay, J. D., Robertson, G. G. & Card, S. (1990). Rapid controlled movement through virtual 3D workspaces. Proceedings of CHI '91, ACM Conference on Human Factors in Software, 455-456.

Matias, E., MacKenzie, I.S. & Buxton, W. (1993). Half-QWERTY: A one-handed keyboard facilitating skill transfer from QWERTY. Proceedings of InterCHI '93, 88-94.

McGrath, R. (1985), PC Focus: TurboPuck, a Precision Pointing Device, Computer Graphics World, August 1985, 45 - 48.

Minneman, S.L. & Bly, S.A. (1991). Managing à trois: a study of a multi-user drawing tool in distributed design work. Proceedings of CHI '91, ACM Conference on Human Factors in Software, 217- 224.

Minsky, M. (1985). Manipulating Simulated Objects with Real-World Gestures Using a Force and Position Sensitive Screen. Computer Graphics, 18(3), 195 - 203.

Montgomery, E. (1982). Bringing Manual Input into the 20th Century. IEEE Computer, 15(3), 11 - 18. See also follow-up letters in the May, June and October 1982 issues.

Murakami, K. & Taguchi, H. (1991). Gesture recognition using recurrent neural networks. Proceedings of CHI '91, 237-242.

Newman, W.M. (1968). A Graphical Technique for Numerical Input, Computing Journal, 11, 63 - 64.

Norman, D.A. & Fisher (1982). Why Alphabetic Keyboards are not Easy to Use: Keyboard Layout Doesn't Much Matter, Human Factors 24(5), 509 - 519.

Noyes, J. (1983). The QWERTY Keyboard: a Review, International Journal of Man-Machine Studies 18, 265 - 281.

Ohno, K., Fukaya, K. & Nievergelt, J. (1985). A Five-Key Mouse with Built-In Dialogue Control. SIGCHI Bulletin, 17(1), 29 - 34.

Owen, Sid (1978). QWERTY is Obsolete. Interface Age, January 1978, 56 - 59.

Pedersen, E., McCall, K., Moran, T. & Halasz, F. (1993). Tivoli: an electronic whiteboard for informal workgroup meetings. Proceedings of InterCHI '93, 391-398.

Pickering, J.A. (1986). Touch-sensitive screens: the technologies and their applications, International Journal of Man-Machine Studies, 25(3), 249-269.

Pittman, J.A. (1991). Recognizing handwritten text. Proceedings of CHI '91, ACM Conference on Human Factors in Software, 271-275.

Plamondon, R., Suen, C.Y. & Simner, M.L. (Eds.)(1989). Computer recognition and human production of handwriting. Teaneck, N.J.: World Scientific Publishing Co. Inc.

Pulfer, K. (1971). Man-Machine Interaction in Creative Applications. International Journal of Man-Machine Studies, 3, 1-11.

Ressler, S. (1982). An Object Editor for a Real Time Animation Processor, Proceedings of Graphics Interface '82, Toronto, 221-223.

Ritchie, G.J. & Turner, J.A. (1975). Input Devices for Interactive Graphics. International Journal of Man-Machine Studies, 7, 639 - 660.

Roberts, M. & Rahbari, H. (1986). A multi-purpose system for alpha-numeric input to computers via a reduced keyboard. International Journal of Man-Machine Studies, 24, 659-667.

Rochester, N., Bequaert, F. & Sharp, E. (1978). The Chord Keyboard, IEEE Computer, 11(12), 57 - 63.

Rosenthal, D.S., Michener, J.C., Pfaff, G., Kessener, R., & Sabin, M. (1982). Detailed Semantics of Graphical Input Devices, Computer Graphics, 16(3), 33-43.

Rubine, D. H. (1991). Specifying gestures by example. Computer Graphics 25(4), Proceedings of SIGGRAPH '91, 329 - 337.

Sachs, E., Stoop, D. & Roberts, A. (1989). 3-Draw: a three dimensional computer aided design tool. Proceedings of the 1989 IEEE International Conference on Systems, Man and Cybernetics, Cambridge, MA, 1194-1196.

Schmandt, C. (1983). Spatial Input/Display Correspondence in a Stereoscopic Computer Graphic Work Station, Computer Graphics 17(3), 253 - 261.

Seibel, R. (1962). A Feasibility Demonstration of the RapidType Data Entry Station. Research Report No. RC 845, Yorktown Heights, N.Y.: IBM Thomas J. Watson Research Center.

Seibel, R. (1972). Data Entry Devices and Procedures, in H. Van Cott & R. Kinkade (Eds.), Human Engineering Guide to Equipment Design, Revised Edition, Washington: U.S. Govt. Printing Office, 311 - 344.

Sellen, A., Kurtenbach, G. & Buxton, W. (1992). The prevention of mode errors through sensory feedback. Human Computer Interaction, 7(2), 141-164.

Sherr, S. (Ed.)(1988). Input Devices. Boston: Academic Press.

Shoemake, K. (1992). ARCBALL: a user interface for specifying three-dimensional orientation using a mouse. Proceedings of Graphics Interface '92, 151-156.

Smyth, M. & Wing, A. (Eds.)(1984). The Psychology of Human Movement, London: Academic Press.

Suen, C. (Ed.) (1990). Frontiers in handwriting recognition: Proceedings of the International Workshop on Frontiers in Handwriting Recognition. Montreal: Centre for Pattern Recognition and Machine Intelligence, Concordia University, Montreal, Quebec, Canada H3G 1M8.

Tang, J.C. & Minneman, S.L. (1991a). VideoDraw: A video interface for collaborative drawing. ACM Transactions on Information Systems, 9(3), 170-184.

Tang, J.C. & Minneman, S.L. (1991b). Videowhiteboard: video shadows to support remote collaboration. Proceedings of CHI '91, ACM Conference on Human Factors in Software, 315-322.

Wallace, V.L. (1976). The Semantics of Graphical Input Devices, Proceedings of the Siggraph/Sigplan Symposium on Graphical Languages, 61 - 65.

Ward, J.R. & Philips, M.J. (1987). Digitizer technology: performance characteristics and the effect on the user interface. IEEE Computer Graphics and Applications, April Issue, 7(4), 31-44.

Ware, C. & Jessome, D. (1988). Using the bat: a six-dimensional mouse for object placement. IEEE Computer Graphics and Applications 8(6), 65-70.

Ware, C. & Osborne, S. (1990). Exploration and virtual camera control in virtual three dimensional environments. Computer Graphics 24(2), 175-183.

Welford, A. (1976). Skilled Performance: Perceptual and Motor Skills, Glenview, IL: Scott, Foresman & Co.

Wolf, C.G. (1986). Can People Use Gesture Commands? ACM SIGCHI Bulletin, 18(2), 73 - 74.

Wolf, C.G. (1988). A comparative study of gestural and keyboard interfaces. Proceedings of the 32nd Annual Meeting of the Human Factors Society, 273-277.

Wolf, C.G. (1992). A comparative study of gestural, keyboard and mouse interfaces. Behaviour & Information Technology, 11(1), 13-23.

Wolf, C.G. & Morrel-Samuels, P. (1987). The use of hand-drawn gestures for text-editing, International Journal of Man-Machine Studies, 27, 91 - 102.

Wolf, C., Rhyne, J. & Ellozy, H. (1989). The Paper-Like Interface. In G. Salvendy & M.J. Smith (Eds.), Designing and Using Human-Computer Interfaces and Knowledge-Based Systems, Amsterdam: Elsevier Science Publishers B.V., 494-501.

Wolf, C., Rhyne, J., Zorman, L. & Ossher, H. (1991). WE-MET (window environment-meeting enhancement tools), Proceedings of CHI '91, ACM Conference on Human Factors in Software, 441-442.

Zhai, S., Buxton, W. & Milgram, P. (in press). The "Silk Cursor": Investigating transparency for 3D target acquisition. To appear in Proceedings of CHI '94, Boston, April 24-28.

Zimmerman, T.G., Lanier, J., Blanchard, C., Bryson, S. & Harvill, Y. (1987). A Hand Gesture Interface Device, Proceedings of CHI+GI '87, 189 - 192.

Video Examples

Bier, E. (1989). Gargoyle 3D: Snap-Dragging in 3D. SIGGRAPH Video Review 47, New York: ACM.

Bier, E. & Pier, K. (1987). Snap-Dragging and the Gargoyle Illustration System. SIGGRAPH Video Review 33, New York: ACM.

Brooks, F. et al. (1981). The Grip-75 Man-Machine Interface. SIGGRAPH Video Review 4, New York: ACM.

Buxton, W. (1984). Selection-Positioning Task Study. SIGGRAPH Video Review 12, New York: ACM.

Buxton, W. & Baecker, R. (1987). Research Human Interaction. SIGGRAPH Video Review 26, New York: ACM.

Evans, K., Tanner, P. & Wein, M. (1981). Graphics Interaction at NRC. SIGGRAPH Video Review 4, New York: ACM.

Francik, E. (1989). Wang Freestyle. SIGGRAPH Video Review 45, New York: ACM.

Goldberg, D. & Richardson, C. (1993). Touch typing with a stylus. SIGGRAPH Video Review 88, New York: ACM.

Krueger, M. (1985). Videoplace Sampler, SIGGRAPH Video Review 20, New York: ACM.

Krueger, M. (1988). Videoplace '88, SIGGRAPH Video Review 40, New York: ACM.

Mackinlay, J. D., Card, S. & Robertson, G. G. (1990). Rapid controlled movement through a virtual 3D workspace. SIGGRAPH Video Review 65, New York: ACM.

Myers, B. (Ed.)(1990). All the Widgets. SIGGRAPH Video Review 57, New York: ACM.

Plaisant, C. & Shneiderman, B. (1990). Scheduling Home Control Devices. SIGGRAPH Video Review 63, New York: ACM.

Roberts, S. (1989). 16,000 Miles on a Bicycle. SIGGRAPH Video Review 47, New York: ACM.

Rutledge, J. & Selker, T. (1990). In-Keyboard Analog Pointing Stick: A Case for the Pointing Stick. SIGGRAPH Video Review 55, New York: ACM.

Sachs, E., Stoop, D. & Roberts, A. (1990). 3-Draw: A tool for the conceptual design of 3D shape, SIGGRAPH Video Review 55, New York: ACM.

Schmandt, C. et al. (1984). Put That There. SIGGRAPH Video Review 13, New York: ACM.

Shneiderman, B. (1991). Touch screens now offer compelling uses. IEEE Software, 8(2), 93-107.

Sutherland, I. (1984). Sketchpad, SIGGRAPH Video Review 13, New York: ACM.

Theil, D. (1990). The Cue Ball as Part of a Gestural Interface. SIGGRAPH Video Review 63, New York: ACM.

Thorisson, K., Koons, D. & Bolt, R. (1992). Multi-Modal Natural Dialogue. SIGGRAPH Video Review 76, New York: ACM.

Ward, G. (1985). Software Control at the Stroke of a Pen. SIGGRAPH Video Review 18, New York: ACM.

Wolf, C., Rhyne, J. & Ellozy, H. (1989). The Paper-Like Interface. SIGGRAPH Video Review 47, New York: ACM.

Zacharey, G. (1987). The Dataglove. SIGGRAPH Video Review 27, New York: ACM.


[1] See also Shoemake, 1992, for an implementation of a variation of this technique.

[2] Pen-Based Computing: the Journal of Stylus Systems, Stylus Publishing, P.O. Box 876, Sandpoint, Idaho 83864-0876. Tel: (208) 265-5286. Email: nickbarab@bix.com.