CHAPTER 2 Background

This chapter presents a description of the status quo in videoconferencing and media space conferencing in order to establish the AVSA concept. We then describe the general development and evaluation strategies used to develop the system.

2.1 The AVSA Concept

The idea for the AVSA was first introduced by Professor William Buxton. His experiences at Xerox-EuroPARC with the Integrated Interactive Intermedia Facility (IIIF)[Buxton et al 1990] and at the University of Toronto with the Ontario Telepresence Project (OTP)[Resnick 1992] uncovered a serious deficiency in the quality of interaction experienced by electronic visitors to a media space[Bly et al 1993], namely the limited access to and control of media spaces. The following sections describe the situation in detail.

2.1.1 Traditional Videoconferencing

In traditional videoconferencing a person at conference room A communicates with a person at conference room B in real time through the public switched telephone network (PSTN) (Figure 1). The connection between the two points is made just like a regular telephone call: a person at conference room A pre-arranges a meeting at a certain time with a person at conference room B and then dials B's number (or vice versa).

Figure 1: The traditional videoconferencing setup.

What makes this connection special is the media-rich nature of the communication, i.e. high-quality audio and video (A/V). Both parties are equipped with what will henceforth be referred to as a node. A node (Figure 2) consists of a microphone, speaker, camera and monitor.

Figure 2: This picture shows the camera, monitor, microphone, and speaker of a node.

The A/V obtained at each conference room must be transmitted to the other in real time. To accomplish the transfer of this high-bandwidth analog data, each conference site is also equipped with a coder/decoder (CODEC). The CODEC quickly converts the analog A/V data into digital data and transmits it to the other site through a high-bandwidth digital PSTN subscriber line. This node/CODEC setup of the traditional videoconferencing room is often called an orphan CODEC.

2.1.2 Media Spaces

Functionally, a media space is an extension of the traditional videoconferencing system in a local setting. When a person at node A wants to connect to a person at node B, the person at node A schedules the meeting and then calls node B. The main difference lies in the architecture (Figure 3) and, consequently, the way the connections are actually made.

Figure 3: A media space allows videoconferencing through a GUI controlled LAN and local A/V network.

In traditional videoconferencing the A/V information is routed through the PSTN. In the media space environment A/V information is routed through a local A/V network called the hub. At the University of Toronto this hub is a hardware/software system called the IIIF.

In traditional videoconferencing the actual routing is also performed by the PSTN. The destination is specified by dialing the number of another CODEC. The PSTN then completes the connection between the two CODECs. Dialing a CODEC is the same as dialing a room because each CODEC is associated with a specific room. In the media space environment the routing is handled by a server connected to the local area network (LAN).

To complete a connection the following events must occur. First, the caller accesses a graphical user interface (GUI), called the telepresence (TP) application, on a computer associated with a node. This computer is connected to the LAN. Then, using the TP application, the caller selects a node (which is essentially a room) to connect to. The TP application sends this request to the server which in turn sends the appropriate commands to the IIIF specifying how the A/V signals should be routed. The IIIF then takes over and makes the appropriate A/V connections.
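The request path just described (TP application, then server, then IIIF) can be sketched in a few lines of Python. This is only an illustrative model under our own assumptions: the class names `Hub`, `Server` and `TPApplication`, their methods, and the node names are hypothetical and are not taken from the actual IIIF or TP software.

```python
from dataclasses import dataclass, field


@dataclass
class Hub:
    """Hypothetical stand-in for the local A/V network (the IIIF)."""
    routes: set = field(default_factory=set)

    def route(self, node_a: str, node_b: str) -> None:
        # A/V flows in both directions, so the pair is stored unordered.
        self.routes.add(frozenset((node_a, node_b)))

    def connected(self, node_a: str, node_b: str) -> bool:
        return frozenset((node_a, node_b)) in self.routes


@dataclass
class Server:
    """Hypothetical LAN server that turns TP requests into hub commands."""
    hub: Hub
    known_nodes: set

    def handle_request(self, caller: str, callee: str) -> bool:
        if caller not in self.known_nodes or callee not in self.known_nodes:
            return False  # unknown node: no routing command is sent
        self.hub.route(caller, callee)
        return True


@dataclass
class TPApplication:
    """Hypothetical model of the GUI on the computer associated with a node."""
    node: str
    server: Server

    def select_node(self, target: str) -> bool:
        # The GUI only forwards the request; the server and hub do the work.
        return self.server.handle_request(self.node, target)


hub = Hub()
server = Server(hub, {"node_a", "node_b"})
tp = TPApplication("node_a", server)
print(tp.select_node("node_b"))           # was the request accepted?
print(hub.connected("node_b", "node_a"))  # is A/V now routed between them?
```

The point of the sketch is the division of labour: the caller never touches the hub directly, so whoever can reach the server controls the routing.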

The first advantage of the media space is that it gives a person a choice as to where or from where to meet with any one of a number of people. A second advantage is that the nature of the local A/V network yields high-quality A/V connections that can be executed quickly when compared to traditional videoconferencing.

The disadvantage is that a media space only allows communication between people who are connected to the local A/V network; no provision is made to contact people outside the local system. To address this deficiency we add a CODEC to the media space and update the architecture (Figure 4). The procedure for making a connection within the media space remains the same. However, to make a connection to a traditional videoconferencing site two things must occur. First, the local node must connect to the local CODEC. Second, the local CODEC must connect to the remote CODEC. Both events are facilitated through the TP application available to the local node. Through the TP application the user tells the server, over the LAN, to make an A/V connection between the local node and the local CODEC. The user then uses an extension of the TP application to have the server tell the local CODEC what number to dial. The local CODEC then dials the number of the remote CODEC. Communication can commence once someone at the remote CODEC answers the call.
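As a rough sketch of the two-step outgoing call, the following Python models a CODEC as a small state machine. The `Codec` class, the `call_outside` helper, the dictionary standing in for the PSTN, and the phone numbers are all hypothetical; real CODEC signalling is considerably more involved.

```python
from enum import Enum


class CallState(Enum):
    IDLE = "idle"
    RINGING = "ringing"
    ACTIVE = "active"


class Codec:
    """Hypothetical CODEC reachable through a dial-up number."""

    def __init__(self, number: str):
        self.number = number
        self.state = CallState.IDLE
        self.peer = None
        self.incoming = False  # True only on the side being called

    def dial(self, number: str, network: dict) -> bool:
        remote = network.get(number)
        if remote is None or remote is self or remote.state is not CallState.IDLE:
            return False  # unknown number, self-call, or busy line
        self.peer, remote.peer = remote, self
        remote.incoming = True
        self.state = remote.state = CallState.RINGING
        return True

    def answer(self) -> None:
        # Communication can commence only once the called side answers.
        if self.incoming and self.state is CallState.RINGING:
            self.state = self.peer.state = CallState.ACTIVE


def call_outside(av_links: set, node: str, local: Codec,
                 number: str, network: dict) -> bool:
    av_links.add((node, local.number))  # step 1: local node -> local CODEC
    return local.dial(number, network)  # step 2: local CODEC -> remote CODEC


local_codec = Codec("555-0100")
remote_codec = Codec("555-0199")
pstn = {c.number: c for c in (local_codec, remote_codec)}
links = set()
call_outside(links, "node_a", local_codec, "555-0199", pstn)
remote_codec.answer()
print(local_codec.state)
```

Note that only the called side can move the call from ringing to active, which mirrors the requirement that someone at the remote CODEC answer the call.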

Figure 4: Media space connecting to a traditional videoconference room.

Although long-distance communication is enabled with this setup, we still find the system deficient in that there is no provision for a member of one media space to communicate with a member of another media space. It is possible for a member of one media space to call the other media space CODEC to CODEC, but unless the servers of the two media spaces negotiate the connection of the nodes, the call is useless.

In order to make a person-to-person connection through two media spaces the following three connections must be made. The local node must connect to the local CODEC. The local CODEC must call the remote CODEC. The remote CODEC must connect to the remote node. To address this problem IIIF-2-IIIF communication was developed (Figure 5). Signalling is handled by the local server. When the local node requests a connection to a remote node, the local server negotiates the appropriate connections with the remote server over a wide area network (WAN) or internet.
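The three connections can be illustrated with a small Python sketch. It is a simplification under stated assumptions: `MediaSpaceServer` and its methods are hypothetical names, the WAN negotiation is reduced to a direct method call on the peer server, and the CODEC-to-CODEC call is assumed always to succeed.

```python
class MediaSpaceServer:
    """Hypothetical server controlling one media space's hub and CODEC."""

    def __init__(self, name: str, nodes):
        self.name = name
        self.nodes = set(nodes)
        self.av_links = []  # (node, codec) links made inside this media space

    def link_node_to_codec(self, node: str) -> bool:
        if node not in self.nodes:
            return False
        self.av_links.append((node, f"{self.name}-codec"))
        return True

    def connect_to_remote(self, local_node: str, peer, remote_node: str) -> list:
        """Return the list of connections that were completed, in order."""
        completed = []
        if not self.link_node_to_codec(local_node):
            return completed
        completed.append("local node -> local CODEC")
        # CODEC-to-CODEC call over the PSTN; assumed to succeed in this sketch.
        completed.append("local CODEC -> remote CODEC")
        # Request to the peer server over the WAN (a plain method call here).
        if peer is not None and peer.link_node_to_codec(remote_node):
            completed.append("remote CODEC -> remote node")
        return completed


space_a = MediaSpaceServer("space_a", ["node_1"])
space_b = MediaSpaceServer("space_b", ["node_2"])
print(space_a.connect_to_remote("node_1", space_b, "node_2"))
print(space_a.connect_to_remote("node_1", None, "node_2"))
```

Passing `None` as the peer models a call with no server-to-server negotiation: the first two connections are made, but the call never reaches a node on the remote side.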

Figure 5: Architecture for IIIF-2-IIIF communication.

Unfortunately, there are many more traditional conference rooms than media spaces. Therefore, it is essential that communication between the media space and the traditional videoconference room be provided. We have already looked at making a connection from a media space to the traditional conference room. Now let us look at making a connection from the traditional conference room to a media space (Figure 6). In order to make a person-to-person connection from a traditional videoconference room, two intermediate connections must be made. First, the local CODEC must call the remote CODEC. Second, the remote CODEC must connect to the remote node.

Figure 6: Traditional videoconference room connecting to a media space.

The first connection can easily be made. However, a traditional videoconferencing room has no control over the server of a remote media space. Therefore, the connection of the remote CODEC to a remote node cannot be made. It is precisely this issue that the AVSA is designed to address.

2.1.3 The Deficiencies

The two deficiencies that stem from the inability of a person at a traditional videoconferencing room to control a remote media space's server are the inability to:

make a connection to a person
control media space resources

The following sections discuss these deficiencies further.

Making a connection

To visit the media space without the AVSA, a visitor at an orphan CODEC must go through the following sequence of events:

pre-arrange a time with a member of the media space
call the media space's CODEC
wait to be received by a member of the media space (i.e. wait for the member to connect to the local CODEC through the TP application)
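One way to see the problem with this sequence is to record which party is able to trigger each step. The following Python sketch does so with a small transition table; the state names, the actor labels and the `advance` function are our own hypothetical modelling choices, not part of any existing system.

```python
from enum import Enum, auto


class VisitState(Enum):
    NOT_ARRANGED = auto()
    ARRANGED = auto()
    WAITING = auto()
    RECEIVED = auto()


# (current state, acting party) -> next state.
# Pre-arranging involves both parties but is initiated by the visitor here.
TRANSITIONS = {
    (VisitState.NOT_ARRANGED, "visitor"): VisitState.ARRANGED,  # pre-arrange a time
    (VisitState.ARRANGED, "visitor"): VisitState.WAITING,       # call the CODEC
    (VisitState.WAITING, "member"): VisitState.RECEIVED,        # member connects via TP
}


def advance(state: VisitState, actor: str) -> VisitState:
    """Return the next state, or the same state if this actor cannot act."""
    return TRANSITIONS.get((state, actor), state)


state = VisitState.NOT_ARRANGED
state = advance(state, "visitor")  # arranged
state = advance(state, "visitor")  # waiting at the media space's CODEC
state = advance(state, "visitor")  # no effect: the visitor cannot proceed alone
print(state)
state = advance(state, "member")   # only a member can complete the visit
print(state)
```

In this model no transition out of the waiting state belongs to the visitor, which is the lack of control discussed in the remainder of this section.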

Imagine the analogous situation in telephony (Figure 7). Party A wishes to place a call to party B. A dials B's number. A is connected to a central switch and then must wait for B to dial A's number before the two individuals can talk. This situation is absurd and would be completely unacceptable. However, if it were not for automatic switching systems it would be the norm. Unfortunately, it is the norm for communication from an orphan CODEC to a media space.

Figure 7: The analogous telephony situation without automatic switching.

The root cause of the problems is that the visitor has no control over the media space to which they are connected[Gujar et al 1995]. The following is a list of problems/consequences associated with this type of setup:

The most important meetings often cannot be planned in advance. They are often ones which involve important business or strategic decisions. Forcing a pre-arrangement step, which may involve time-consuming phenomena such as "phone-tag" or "e-mail-tag", seriously hampers the effectiveness of the system.
Once connected to the media space the visitor is stuck waiting to be received. The visitor has no way of knowing when or even if they will be received. It is possible that the person with whom the meeting was arranged was held up and will be late. It is even possible that the person forgot about the meeting!

Controlling media space resources

Lack of access to a media space is not the only problem associated with visiting a media space. One of the goals of media spaces is to allow people to attend/participate in meetings without being physically present, but at a level that would enable communication as though they were physically present. To achieve this it is important for us to ask the question "For effective communication, is it sufficient to provide an audio/video connection to a media space or should the visitor have some degree of control over the media space?" Research[Gaver 1992][Gaver et al 1995][Gujar et al 1995] and experience tell us that the latter is more sensible.

Imagine a person electronically joining a meeting that is in progress. However, the camera from which the visitor is supposed to be viewing the conference is blocked by a physical attendee. How can the visitor correct the situation? In the current setup the only way is to ask a physical attendee to switch the visitor to a different camera or to ask the physical attendee to move to a different seat within the room. Both solutions disturb the flow of the meeting and can be irritating to the physical attendees.

There are many other scenarios in which the visitor must ask a physically present person to alter the states of devices for them. Aside from disturbing meetings, this process also detracts from the visitor's experience of the visit. The visitor feels more like a passive observer (as though they were watching television) than an active participant. Thus, giving control to the visitor will greatly enhance their experience.

The final deficiency with the current setup is that there is no way to explore the media space and obtain information without actually talking to a member of the media space. From the visitor's point of view it is important to be able to access information at any time (as with television technology) and, equally important, to be able to specify what information to access at any time. From the point of view of the members of the media space this issue is important in that one should be able to provide an information bank from which visitors can access information. Once again, we can provide visitors with access to and control of resources that will enable this type of information exchange.

2.2 Development Strategy

Having defined the problems, the next task is to decide how to develop the system we want in a timely manner, within our budget and without sacrificing the usability of the eventual interface. We decided on a three-stage, iterative, technology-driven[Danis et al 1995] process.

The first stage is to outline the basic design of the system. First, we consider the different technologies, their limitations and their affordances. Then, by acquiring input from individuals familiar with media spaces and from individuals not familiar with them, we assess the needs of the system and use the technology considerations to produce the basic design of the system.

The next stage is to build a prototype of the system as quickly as possible. With this prototype we:

will be able to informally evaluate the system and, if all goes well, have proved the concept.
will be able to obtain user input and further information regarding the interface. This information will be incorporated into the next iteration to improve the system.

The final stage is to build a self-contained system based on the knowledge gained in the first two stages. This system will be transportable to other sites with similar media spaces for further research and/or evaluation.

2.3 Evaluation Strategy

Our goal is to create a usable system in a timely manner. Therefore, no formal evaluation of the AVSA will be done. All evaluation is done in an informal setting on two levels.

The first level of evaluation is on an ongoing basis. This means that as features are implemented they are evaluated. This level of evaluation is performed by the development team and members of the Input Research Group (IRG). The process is to observe or use the interface, look at specific features, and suggest ways to improve interaction.

The second level of evaluation will take place at the end of the second development stage. This level of evaluation is based on the comments of people from within the DGP lab and people from outside the university. These people will have varying levels of knowledge of computer science and videoconferencing. The process will be to:

introduce people to the interface
explain the purpose and its abilities
ask the person to complete some tasks, such as visiting a room, playing a particular demo, or contacting a media space member
observe how they perform the task
upon completion of the task note their comments on how to improve the interface.

Some people outside the university, who cannot come to the lab to try the system, will be given a written description of the interface and a videotape of a person using the interface, and will be asked to evaluate the system. They will detail their general impressions of the system and forward these points to us.
