CHAPTER 5 Stage 3: Self-Contained System

The second stage in the AVSA development allowed us to produce a usable interface that was readily accepted by the various people who reviewed it. The goal of the third stage was to integrate user input from the previous stage into the system and, more importantly, to wrap the AVSA up into a stable and self-sufficient package.

The AVSA of stage two relied heavily on the state of the reactive room subsystem. If the reactive room malfunctioned, the AVSA did not work. In addition, the communication between the PC and UNIX environments made the AVSA unstable in some cases. This communication placed heavy requirements on an environment not meant for such tasks. Finally, since the PC did not support multiprocessing, it was not possible to set up a monitoring system (i.e. a daemon) for the AVSA to accept requests from the media space. This chapter describes the third stage in the development of the AVSA: developing an integrated, self-contained system on the UNIX platform.

5.1 Ingredients

The basic parts required for this stage are the same as those required in stage two and are described in Section 4.1. However, since we wanted a more stable and self-sufficient system, the actual ingredients acquired differed in that they all operate in the UNIX environment.

5.2 Recipe

The recipe was slightly different from stage two's. Fortunately, even though the ingredients were different, we were still able to port some components, such as the translator, since their role did not change. The recipe for this stage is as follows:

choose, gather, install the ingredients.
connect the new architecture to the media space through A/V links and Ethernet.
investigate and use established TCP/IP utilities to communicate control information via the communication protocol established in stage 2.
port the translator to the new system.
integrate the ASR.
port the interface, integrating suggestions from users during the previous stage.

5.3 Preparation

Once again we will use the recipe to build the new self-contained system. The following sections describe our efforts.

5.3.1 Ingredients

Since one of the aspects we wanted to improve in this stage was the communication, we decided that we should implement the AVSA in some multiprocessing environment. We also wanted to make it more compatible with the environment in which the media space was already constructed. This would help create a more stable communication system. Our choices were thus limited to the UNIX platform. Since we had convenient access to Silicon Graphics Incorporated (SGI) hardware and software we decided that development should be done in the SGI environment with the IRIX operating system. As a result, we acquired an Indy workstation on which the AVSA would reside.

We also needed video overlay technology for the Indy that would allow the AVSA to perform the video overlaying which we required (described in Section 3.2.1). For this we acquired an Indy Video card and its associated video library.

For the speech recognition we acquired an ASR developed by SGI. Indys come with a standard CD-quality sound processing card with an external input, so we did not have to purchase any extra hardware or do any installations to process the audio.

Indys also come with their own standard Ethernet card. Hence, no special hardware or software installation for network connectivity was required.

To build the interface we decided to use the tcl/tk development environment. We would use the standard C compiler to combine all of the different technologies. By using these standard software development environments we were assuring ourselves that if, in the future, the system was to be ported to another platform, the interface would not have to be redeveloped from scratch.

5.3.2 The development team

The development team was downsized at this point. We did not require as much input into the interface during development. Instead, we needed an efficient team that could develop the system quickly. The team consisted of:

William Buxton - his role stayed the same as that in the second stage.
Anuj Gujar - his role was to acquire all the ingredients and build the AVSA on the SGI platform.
Akil Nasser - an undergraduate student with the computer science department. His role was to assist in porting the interface and in working with the new overlay technology.
David Audrain - an undergraduate student and exceptional C programmer from France. His role was to help develop a driver for the laserdisc player for integration into the video on demand subsystem.

5.3.3 Linking to the Media space

We used the same IIIF wires used in the first prototype to connect the Indy to the media space. We also used the same Ethernet drops. Since the Indy runs on the UNIX platform, no special software was required to enable file access as in the first prototype. This proved to be an important point as it made the system much more stable, reliable and flexible.

5.3.4 Sending requests to the media space

For the prototype system we had to build communication utilities to allow the AVSA to communicate with the various media space systems, because there were no established utilities for the PC platform. Since the new system resides on the UNIX platform, and the core of the media space is developed on the UNIX platform, we can use the established UNIX communication utilities instead.

By using the established communication utilities we can separate the systems so that the AVSA is no longer dependent on the reactive room subsystem (Figure 20). The former situation was acceptable for the prototyping stage, but the new setup is more logical. In this new configuration, the AVSA communicates control information to the IIIF system directly instead of through the reactive room subsystem as in the last stage.

Figure 20: The AVSA now uses the Ethernet to communicate with the IIIF directly instead of through the reactive room.

5.3.5 Two-way communication

The Indy has much more processing power than the PC and the UNIX platform has many more processing capabilities than Windows. This should simplify the communication and make it much more stable.

In the last stage, the AVSA could send requests to the media space and receive responses based on these requests. To enable this on the Indy, the translator of the second stage had to be ported to the Indy platform.

In addition, there were some cases in which we wanted the media space to send requests to the AVSA without the AVSA having asked for them first. For this reason we decided to convert the AVSA communication layer into a daemon. In this way, after some simple modifications, other media space systems would be able to communicate with the AVSA at will.
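To make the idea of a request-accepting daemon concrete, the following is a minimal sketch in Python rather than the C used for the AVSA. The line-oriented message format, the request verbs NOTIFY and STATUS, and all function names are assumptions made for illustration; the actual communication protocol established in stage 2 is not reproduced here.

```python
import socketserver
import threading

# Hypothetical request verbs; the real stage 2 protocol is not shown here.
KNOWN_VERBS = {"NOTIFY", "STATUS"}


def handle_request(line):
    """Turn one request line into a reply line (assumed message format)."""
    parts = line.strip().split(maxsplit=1)
    if not parts:
        return "ERROR empty request"
    verb = parts[0].upper()
    if verb not in KNOWN_VERBS:
        return "ERROR unknown request " + verb
    return "OK " + verb


class AvsaRequestHandler(socketserver.StreamRequestHandler):
    """Read one line from a media space system and answer it."""

    def handle(self):
        line = self.rfile.readline().decode("ascii", "replace")
        reply = handle_request(line) + "\n"
        self.wfile.write(reply.encode("ascii", "replace"))


def start_daemon(host="127.0.0.1", port=0):
    """Listen in a background thread; port 0 lets the OS pick a free port."""
    server = socketserver.ThreadingTCPServer((host, port), AvsaRequestHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Calling start_daemon() returns the server object; its server_address attribute gives the port that other systems would connect to, and shutdown() stops it. The point of the sketch is only that the listener runs alongside the interface, so requests can arrive at any time.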

5.3.6 The ASR

The process of integrating the ASR would be the same as the process in stage two because there was no API. However, since we were using a Silicon Graphics Incorporated (SGI) ASR package, and we had close ties with SGI, we were planning on working closely with the developers of the product to develop an API and tune the ASR to our needs. Unfortunately, the project under which the ASR was being developed at SGI ran into problems and was consequently terminated.

We did search for other ASR options: vendors were contacted and the options were evaluated. We decided not to purchase a commercial ASR because, when we took into consideration:

the cost of the technology we required (which averaged approximately $4000 U.S.),
the time it would take to integrate the software into the AVSA, and
the impact on this research,

we felt that it was not sufficiently beneficial.

5.3.7 The interface

One major task of this stage was to port the interface from the PC to the Indy. We first salvaged as much of the logic as possible from the Windows version. This meant that the translator and the logic behind the visual prompts were preserved. The actual visual prompts had to be reimplemented in tcl/tk. A new video overlay system also had to be integrated using the new overlay hardware and software.

5.4 Enhancements

At the end of stage two, the system was evaluated by several people (as described in Section 4.5). From their comments we decided to make various enhancements to the AVSA. The enhancements can be divided into two categories:

improving the interface
adding functionality

The following sections present a detailed discussion of these enhancements.

5.4.1 Improving the interface

In the evaluation of the AVSA at the end of the second stage, the evaluators felt that several aspects of the interface could be modified to improve its look and feel. The following sections detail these enhancements.

Translucent background

In the last stage we removed the background box to make the options list less intrusive. Although our evaluators agreed it was less intrusive than the previous iteration, we observed that when few options were available, people using the system did not always realize that options were showing. For example, when a visitor connected to a room with no services, the only option was to disconnect from the room and go to the last menu. In this case the interface was not sufficiently visible. In essence, the interface was blending into the media space so much that the visitor would not know their options.

The video overlay technology we were using gave us control over the opacity of images. We used this to our advantage and solved the problem by presenting the list of options in a translucent box (Figure 21). This way there was always a more visible hint as to what was part of the interface and what was part of the media space view.

Figure 21: The translucent background helps identify the option list.
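The effect of a translucent box can be illustrated with the standard per-pixel blending formula, in which each output color is a weighted mix of the overlay and the underlying video. This is a sketch of the general technique only; how the Indy Video hardware and its library actually perform the blend is not described here, and the function name and the 0 to 1 opacity scale are assumptions.

```python
def blend_pixel(overlay_rgb, video_rgb, opacity):
    """Mix one overlay pixel over one video pixel.

    An opacity of 0.0 leaves the video untouched; 1.0 shows only the overlay.
    """
    if not 0.0 <= opacity <= 1.0:
        raise ValueError("opacity must be between 0 and 1")
    return tuple(
        round(opacity * o + (1.0 - opacity) * v)
        for o, v in zip(overlay_rgb, video_rgb)
    )
```

With an intermediate opacity the box darkens or tints the video behind the option list without hiding it, which is what makes the list stand out while the media space view remains visible.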

Commands and captions

Even though quotes helped users distinguish between the words to be uttered and the captions, users still felt that an additional feature should be added to make the distinction clearer. As a result, we decided that the words that are to be uttered will, in addition to the quotes, be yellow with a black outline and the captions will be white with a black outline. In this way the function of the quotes is to signify what is the caption and what is to be uttered, while the color difference serves to make the distinction between the two clearer.

Location of options list

In the last stage the list of options was located on the bottom right hand corner of the visitor's monitor. This location presented a problem which will now be discussed.

The area occupied by the list was variable. The height depended on the number of options that were available, while the width depended on the string length of the longest caption. It was this variable nature which caused problems for users. Users wanted to be able to scan the list quickly, choose an option and find the associated word to utter. Because the list constantly shifted horizontally and vertically, users had to search for the first caption and for the beginning of each caption. This seemed to slow users down and to irritate them when they were going through several levels of the menu hierarchy.

From this analysis we concluded that the top left corner of the screen might be the best location for the list of options. This way the users would always know where to start scanning the list. This however was also found to be unacceptable to users. They found it too intrusive because they also scanned the view of the media space from left to right, top to bottom.

It was quite obvious from users' comments that the best solution was to place the list in the bottom left corner of the screen. This eliminated the horizontal shifting. Vertical shifting still occurred, but based on our own experience with the interface and on the comments of the people we asked to evaluate the system after exposing them to the four possibilities, it was found to be less troublesome.
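The geometry behind this choice can be sketched as follows. The function name, the pixel margin and the screen dimensions used with it are assumptions for illustration; the coordinate system has its origin at the top left of the screen with y growing downward.

```python
def list_origin(corner, screen_w, screen_h, box_w, box_h, margin=10):
    """Return the (x, y) of the list box's top-left corner.

    corner is a pair such as ("left", "bottom"); y grows downward.
    """
    horizontal, vertical = corner
    x = margin if horizontal == "left" else screen_w - margin - box_w
    y = margin if vertical == "top" else screen_h - margin - box_h
    return x, y
```

Anchored at the bottom left, the x coordinate where every caption begins is independent of the box width, so only the vertical position moves as the number of options changes. Anchored at the bottom right, both coordinates move whenever the longest caption or the number of options changes.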

Feedback

All of the people asked to review and evaluate the system liked the idea of using a banner to provide feedback. However, they also felt it would be more effective to indicate which option was chosen by somehow highlighting the option in the list instead of presenting a message like "Responding to: 1. Call people" in the banner. As a result, we implemented a flash mechanism. When the user picked an option, the color of the inner part of the outlined text would change periodically, alternating between the color of the outline and the original color (Figure 22).

Figure 22: We accomplished flashing by periodically switching between the outlined text, shown on the left, and solid text, shown on the right.

This feedback is more direct. It also reduces the number of types of feedback that are presented through the banner, thus reducing the cognitive load on the user. In short, the feedback was now presented in a much more effective way.
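The alternation behind the flash mechanism can be sketched as a short generator. The function name and the number of color changes are assumptions; the timing between changes, which the interface toolkit would schedule, is left out.

```python
def flash_fill_colors(original, outline, changes):
    """Yield the fill color of the chosen option after each periodic change.

    Odd-numbered changes fill the text with the outline color (solid text);
    even-numbered changes restore the original fill (outlined text).
    """
    for step in range(1, changes + 1):
        yield outline if step % 2 == 1 else original
```

For a command drawn in yellow with a black outline, an even number of changes leaves the text in its original outlined form once the flashing stops.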

Speeding up navigation

When a visitor visits the reactive room, the AVSA queries each daemon for available services. Depending on the traffic over the LAN and the activity of the reactive room daemons, this querying process sometimes took a long time. This left users confused, wondering if the system had, for example, crashed. To rectify this situation we implemented a caching mechanism. With this mechanism, if the AVSA detects a change in the reactive room, it reloads all of the services that belong to that room into data structures. If, on a visit to the reactive room, no change is detected, the AVSA does not query each daemon for its services. It simply accesses the data stored in memory, thus eliminating the confusing delay.
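A minimal sketch of such a cache is given below, in Python rather than the C used for the AVSA. How the AVSA detects a change in the room is not specified above, so the sketch assumes a cheap, hypothetical room_version check whose value differs whenever the room has changed; the class and parameter names are likewise illustrative.

```python
class ServiceCache:
    """Remember each room's services and re-query only after a change."""

    def __init__(self, query_services, room_version):
        self._query_services = query_services  # slow: asks every daemon in the room
        self._room_version = room_version      # cheap: assumed change check
        self._cache = {}                       # room -> (version, services)

    def services_for(self, room):
        version = self._room_version(room)
        cached = self._cache.get(room)
        if cached is not None and cached[0] == version:
            return cached[1]                   # no change detected: use memory
        services = self._query_services(room)  # change detected: reload everything
        self._cache[room] = (version, services)
        return services
```

The first visit to a room always pays for the full query. Later visits pay only for the change check, unless the room has changed in the meantime.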

This caching made the system run much more smoothly and thus proved to be extremely valuable. There were other situations in which the system seemed to get stuck, but there was nothing that could be done about them. The system appeared "stuck" in these cases because the media space was slow in responding to requests. The only way to truly fix the problem is to speed up communication within the media space, which is not in the scope of this thesis. Unfortunately, there was no way to work around the problem either, because the answer to the request was crucial.

Improving navigation

Some of the users commented that they sometimes felt "lost" when they navigated through the hierarchy. They felt that once they had moved around the media space they no longer had a sense of where they were or how to get to where they wanted to go.

To resolve this problem we decided to implement a map as a navigational aid (Figure 23). The map summarizes the menu hierarchy through generalized captions so that a visitor can quickly see where they can go and how to get there. It also indicates the current position through a more specific caption and a different color. To help the visitor further, the current location is displayed as the first line of the option list (Figure 23). To ask the AVSA to display the map, the visitor must say the word "map". This word acts as a toggle switch. Consequently, when "map" is uttered again the map disappears.

Figure 23: The map helps users navigate through the menu hierarchy. The first line of the option list helps the visitor pinpoint their location on the map.

Directing speech

In the first prototype, we implemented a simple protocol by which the AVSA knew who the visitor was talking to. Unfortunately, we discovered that the same ambiguity was being experienced by the members of the media space: when a visitor issued commands to the AVSA, they also disturbed the meeting. This was not much of a problem in a one-on-one meeting or even a three-person meeting, but with four or more people the meeting was more like a presentation, and it was disturbing to members of the local media space to have to listen to a person issuing what seemed like irrelevant comments.

There were three possible solutions. All involved providing a signal by which the media space knows to mute the audio going to the meeting room, but not the audio going to the AVSA. The first was to provide a mute button of some sort at the orphan CODEC site. The problem with this solution is that it does not conform to our LCD quality. The second solution was to require the user to use some hand gesture. Unfortunately, this solution is neither self-explanatory nor easy to use. The visitor would have to learn a special gesture and make sure that they do not use it during the course of the meeting. It may also be distracting to the people in the meeting to see someone performing what seem to be irrelevant gestures. The third solution was to provide the visitor with the command "mute". When this command is issued, the audio to the meeting room, but not to the AVSA, is disconnected. When "mute" is said a second time the audio is reconnected. The problem with this solution is that the members of a meeting will still hear "mute" at least once. However, we decided to go with this final option and evaluate it further before considering one of the others.

The help and introduction screens

There were two elements present in most interfaces that were still missing from the AVSA interface. The first was a help screen (Figure 24). The interface is quite self-explanatory. However, we had added two new commands (mute and map) whose existence was not indicated anywhere. We could have displayed them on the visitor's screen all the time, but this would have been intrusive, especially since we would have had to explain their function and usage. We also needed a way for the visitor to know about the "show menu" and "hide menu" commands. As a result, a help screen was added to the interface. To make the help screen visible or invisible the visitor must say "help". Since all the other commands were toggle commands, we also decided to make "show menu" a toggle command so that "hide menu" is no longer needed.

Figure 24: The help screen explains how to use the AVSA.
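The toggle behavior shared by the "map", "mute", "help" and "show menu" commands can be summarized in a short sketch. The class and method names are hypothetical, and the starting state (menu visible, everything else off) is an assumption rather than something stated above.

```python
class OverlayState:
    """Track the on/off state behind each toggle command."""

    TOGGLES = ("map", "mute", "help", "show menu")

    def __init__(self):
        # Assumed starting state: menu visible, everything else off.
        self.active = {word: False for word in self.TOGGLES}
        self.active["show menu"] = True

    def hear(self, utterance):
        """Flip the matching toggle; return its new state, or None if not a toggle."""
        word = utterance.strip().lower()
        if word not in self.active:
            return None
        self.active[word] = not self.active[word]
        return self.active[word]

    def meeting_room_hears_visitor(self):
        # While muted, audio still reaches the AVSA but not the meeting room.
        return not self.active["mute"]
```

Because every command is its own inverse, the visitor only has to remember one word per feature, which is why "hide menu" could be dropped.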

The next addition to the interface was the introduction screen (Figure 25). Originally when the visitor connected to the media space they saw a list of options overlaid on top of an appropriate sign. This was not very inviting. Instead, we redirected the video seen by the visitor to be that of a live view of the University of Toronto. We then overlaid a greeting on top of this image that explained to the visitor where they had connected to, how to get help and the options that were available.

Figure 25: The introduction screen.

In summary, this section explained the many new interface features we added and changed, based on user feedback, to make the interface more effective. The next section describes the features added to the system to make it more useful.

In this situation the CODEC visitor would be able to complete a connection to the room but they would not be able to access the room services. This stage addressed this problem so that as long as there was a node available in the room, the visitor would be able to access any of its services.

Video on demand

So far, the only advantage we have taken of television technology in our effort to converge technologies is that we exploit the richness of the medium through which it is transmitted. However, as mentioned in Chapter 1, the concept of the television as a method of distributing information at any time is also very valuable.

For this reason we decided to implement a video-on-demand (VOD) service. With this service visitors can access pre-recorded material at any time. VOD also enables members of the media space to allow visitors to access whatever material they wish them to access.

Not only can visitors access the information, but they can pick what they want to see from a list of demos (Figure 26).

Figure 26: The user has access to various demos provided by members of the media space.

The AVSA also gives them such control as play, stop, pause, fast forward, fast forward speed, etc. (Figure 27).

Figure 27: The user can control playback of a demo (play, stop, pause, fast forward, etc.).

5.5 Summary

During this stage we moved the AVSA from the PC/Windows platform to the SGI/UNIX platform to make the system more stable and expandable.

We used comments provided by users at the end of the second stage to improve the overall look, feel and effectiveness of the interface by improving and speeding up navigation, improving feedback, making the interface more visible, introducing the system, etc.

New functionality in the form of video-on-demand, access to other rooms with services, and access to multiple visitors within a room was added to improve the quality of the visit to the media space.
