CHAPTER 6 Future Considerations and Final Words

This chapter presents many issues that need to be addressed by future efforts to develop the AVSA. We also summarize the work and present its major contributions.

6.1 Future Considerations

We would like to see the AVSA deployed into the field at some point in the future. The development and testing of the AVSA system has raised a number of questions which could not be fully addressed within the scope of this research. However, they are issues that will have to be dealt with in order to make the system deployable. We therefore present these issues and possible solutions in the following sections.

6.1.1 Privacy

Reviewers of the system pointed out that they would not like people to have the ability to "roam around the media space, visiting offices indiscriminately". They felt this was a violation of their privacy[Buxton 1994][Gaver et al 1992][Mantei et al 1991]. Let us look at the current situation through an example.

Imagine the case where you are being visited and someone is trying to contact you through the AVSA. Currently, if your node is busy with a visit from another electronic visitor, the new visitor is not allowed to connect to you. This is fine if the node is busy. In this case, a "node is busy" reply is sent from the media space to the AVSA which then provides a message in the banner to the visitor saying "The person/room you have called is busy. Please try later."

However, what if your node is not busy? The TP application allows one to specify whether or not electronic visitors can connect to one's room. It uses a door[Buxton 1995][Moore 1994] metaphor whereby an open door means that electronic visitors are allowed to enter the room and a closed door signifies that the room is not open to electronic visitors. There are four possible scenarios that could occur with an AVSA-enabled media space.

The first is that your door is closed on purpose. In this situation the media space sends a busy signal to the AVSA which relays this message on to the visitor. Hence, the visitor can conclude that you are busy and he/she will try again later. This situation is acceptable as it enables privacy and the visitor is satisfied in knowing that you are busy.

However, what if your door is closed by mistake? Once again the media space sends a busy message to the AVSA. The visitor erroneously concludes that you are busy, when in fact, for example, you have simply forgotten to open the door after closing it earlier when you did not want to be disturbed. This situation is inappropriate. Privacy is valuable, but so is the ability to communicate on demand. The effectiveness of the system is hampered by situations like this because both you and your visitor are missing the opportunity to communicate valuable information.

The third scenario is that your door is open on purpose. If you are willing to accept a visit from anyone, or if the person whose visit you are waiting for connects to you, then there is no issue of privacy. However, if someone other than the person you are waiting for calls first, you must decide whether this visitor's visit is more important. It is entirely possible that it is not, but while you are deciding, the visitor you are waiting for may call and be turned away by the system.

The fourth scenario is that your door is open by mistake. This situation is very dangerous as it could lead to leaking of important documents, embarrassing situations, meeting disruptions, etc. Privacy in this scenario seems to be the major concern of reviewers of the current AVSA.

Taking all of these problems into consideration, it seems that the best solution would be to borrow the proven protocol used in telephony. In short, when a call is made to you through the AVSA, the AVSA instructs the media space to send a special signal to your node that notifies you that someone wants to meet you. If you are in your office you can accept the call. If not, the ringing will continue for a pre-specified time. Once the ringing has timed out, the media space will send the busy signal to the AVSA.
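The ring-then-timeout rule described above can be sketched as a small decision function. This is only an illustration, not part of the AVSA: the names `CallOutcome` and `handle_incoming_visit` and the 20-second default are assumptions, since the text says only that the ringing time is pre-specified.

```python
from enum import Enum


class CallOutcome(Enum):
    ACCEPTED = "accepted"
    BUSY = "busy"


def handle_incoming_visit(node_busy, answer_after, ring_timeout=20.0):
    """Decide the outcome of a visit request using the telephony-style rule.

    node_busy    -- True if the node already has an electronic visitor
    answer_after -- seconds until the occupant accepts, or None if nobody answers
    ring_timeout -- pre-specified ringing time in seconds (the value is an assumption)
    """
    if node_busy:
        # Existing behaviour: the media space replies "node is busy".
        return CallOutcome.BUSY
    if answer_after is not None and answer_after <= ring_timeout:
        # The occupant accepted the call while the node was still ringing.
        return CallOutcome.ACCEPTED
    # The ringing timed out, so the busy signal is sent to the AVSA.
    return CallOutcome.BUSY
```

For example, `handle_incoming_visit(False, 5.0)` yields `CallOutcome.ACCEPTED`, while an unanswered call, `handle_incoming_visit(False, None)`, yields `CallOutcome.BUSY`. From the visitor's side, an unanswered ring and a closed door look the same, which preserves privacy.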

A further advancement on this system would be to implement the equivalent of a "call-display" feature. For example, a view of the visitor could be routed to you so you know who is trying to visit you. Some visitors may take exception to this, as people are more comfortable when they are able to see what can see them. Therefore, an alternative solution is to prompt the visitor to provide some audio, such as saying their name. This audio would be sent to you in the form of, for example, "ring - incoming visit from 'Bill Buxton' - will you accept?". This way you would have the information necessary to decide whether or not to receive the visit. At this point the system could even be designed to wait for a verbal "yes" response from you to accept the call.

This solves all of the problems mentioned earlier, but it introduces one that is not in the current system: at present you do not have to make any special effort to receive calls. There is no way to eliminate this inconvenience entirely while preserving privacy. However, we can propose a method of specifying situations that will allow members of the media space to reduce this inconvenience.

6.1.2 Different levels of access

It is clear from the discussion on privacy that we need some way of specifying different levels of access. It makes sense to base this specification on social proximity, because closer people have different rights than people who are further away. It also makes sense to base the accessibility on the situation. For example, a meeting may have a different accessibility rule than visiting a personal node. Unfortunately, the only person who can truly define the level of closeness and can evaluate the situation, in our context, is the person of the media space who is being visited. This is essentially the situation in the caller-id system which was previously discussed. The person being visited evaluates their current situation and makes a decision as to the importance of a call from a visitor based on social proximity.

One way to address this situation is to use a human receptionist to filter calls as they come in. The receptionist will take the appropriate action depending on the situation and social proximity. By and large this non-technology solution will work, but it is interesting to note that the receptionist will be basing the decision on a model they have learned through experience.

Our goal with respect to this issue is to narrow the possibilities down as much as possible and construct a simple model, similar to one that a human receptionist may use, upon which the AVSA can determine whether or not the visitor should go through the caller-id system. The criteria which we will use in this accessibility model are obviously social proximity and situation.

There are two main social groupings of people who can be detected by the AVSA: the CODEC visitors and the TP application visitors. In the current system there is no way to distinguish one CODEC visitor from another. On the other hand, internal media space users can be distinguished from each other much more easily based on their node id (we are assuming, perhaps dangerously, that only the person logged on to the node will be using that node). Hence, internal users can be further divided into social groupings. It should be noted, however, that as speech technology develops it will be possible to distinguish one CODEC caller from another using speaker verification and/or speaker identification technology. This technology will also help enforce our assumption that, for internal media space users, the person logged on to a node will be the person using the node.

Imagine a graduate student specifying level of access based on his/her view of social groupings. A graduate student may require five groupings. One for CODEC visitors, one for students, one for people he/she is supervising, one for peers and one for his/her supervisor. The graduate student must place these groupings into levels that determine the accessibility of the graduate student's node. These levels could be:

CODEC visitors may be assigned to level 2, which means they must go through the caller ID system before they can complete a connection. On the other hand the graduate student could assign her/his supervisor the highest level, 4, meaning they value the opportunity of the meeting so much that the supervisor is always welcome. Level 3 could be assigned to people being supervised by the graduate student and so on. This type of grouping system could be made available to the owner of the node as properties which can be set so that in certain situations the node owner could define who should be allowed what level of access.
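As a rough sketch of how such properties might look for a personal node, the groupings can be stored as a mapping from group to level. The group names and the function `connects_directly` are hypothetical. The text defines only level 2 (caller ID required) and level 4 (always welcome), so treating every level below 4 as "caller ID required", and unknown visitors as level 2, are assumptions made here for illustration.

```python
CALLER_ID_LEVEL = 2        # from the text: must go through the caller ID system
ALWAYS_WELCOME_LEVEL = 4   # from the text: the visitor is always welcome

# Hypothetical property settings for one graduate student's node.
group_levels = {
    "codec_visitor": 2,
    "supervisee": 3,
    "supervisor": 4,
}


def connects_directly(group, levels=group_levels):
    """Return True if a visitor from this group may skip the caller ID step."""
    # Assumption: groups the owner has not classified are treated like
    # CODEC visitors and must go through caller ID.
    level = levels.get(group, CALLER_ID_LEVEL)
    # Assumption: only the highest level bypasses caller ID.
    return level >= ALWAYS_WELCOME_LEVEL
```

With these settings, `connects_directly("supervisor")` is true, while a CODEC visitor is routed through the caller ID step.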

The first example dealt with assigning access levels based on social proximity relative to the owner of a personal node. However, what if the node is a meeting node? For this type of node it does not make sense to assign social groupings. The room is meant to be a shared space where anyone can meet and exchange ideas. Level of access will therefore be determined by the situation at hand. Once again there are some distinct situations that can be identified. These situations could be used as a description of the level of access to the room. For example the description could be:

a highly confidential meeting - meaning only physical attendees are allowed.
an internal meeting - meaning must use caller ID.
an open meeting - meaning allowed to glance and enter at will but must be introduced upon entry.
a demonstration or nothing going on - meaning enter at will, no introduction required.

Once again, these situational specifications can be provided as properties of the room to be set when a meeting is to take place.
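The four situations listed above map naturally onto a room property with one entry rule per value. The following is a minimal sketch under that reading; the names `Situation`, `ENTRY_RULE` and the rule strings are illustrative assumptions, not identifiers from the AVSA.

```python
from enum import Enum


class Situation(Enum):
    CONFIDENTIAL = "a highly confidential meeting"
    INTERNAL = "an internal meeting"
    OPEN = "an open meeting"
    DEMO_OR_IDLE = "a demonstration or nothing going on"


# One entry rule per situation, following the list in the text.
ENTRY_RULE = {
    Situation.CONFIDENTIAL: "refuse",                # only physical attendees
    Situation.INTERNAL: "caller_id",                 # must use caller ID
    Situation.OPEN: "enter_with_introduction",       # glance/enter at will, introduced on entry
    Situation.DEMO_OR_IDLE: "enter_freely",          # no introduction required
}


def entry_rule(situation):
    """Look up how an electronic visitor may enter a room in this situation."""
    return ENTRY_RULE[situation]
```

Setting the room's situation before a meeting would then be enough for the media space to answer any visit request, for instance `entry_rule(Situation.INTERNAL)` gives `"caller_id"`.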

In summary, the settings of the two criteria could be made available to the nodes of a media space so that intelligent decisions can be made about the level of access people should have. The AVSA could even be developed further so that it allows the visitor to make intelligent decisions on how to enter the media space. For example, suppose the visitor is looking for someone and decides to visit nodes asking for the person. The visitor would have reservations about entering meeting rooms and possibly disturbing a meeting. By being aware of what is going on in the meeting room, the AVSA can inform the visitor of the situation by displaying a message like "This room is being used for an internal meeting. Do you wish to enter? Yes or No". The visitor can then make an intelligent decision as to the importance of their need compared to the situation in the meeting room.

6.1.3 Video Mail

In dealing with the privacy issue we have uncovered a functional deficiency of the system which should be addressed. In all situations where the visitor attempts to connect to a member of the media space and is turned away, they must try to call again later. There is no way for the visitor to leave the member of the media space a message saying who called, how they can be reached and when they can be reached. Nor is there a way for an orphan CODEC user to simply leave a message for someone at a media space.

To address these problems a video mail feature should be integrated into the system. It would be invoked in two ways.

When the visitor tries to call a node that is busy. In this case the video mail acts as an answering machine.
From the main menu if the visitor just wants to leave a message for a member of the media space. In this case the video mail acts as an e-mail system.

6.1.4 Directing speech

The question of how to direct speech to the AVSA and how to direct speech to the media space needs to be addressed further. Thus far the AVSA knows if a visitor is talking to it based on whether or not the menu has been activated through the "menu" command. If the visitor has connected to a member of the media space they have the option of muting the audio going to the media space first by uttering "mute" so that they do not disrupt meetings.

Unfortunately, muting the audio in this way is still too intrusive to large presentation-like meetings in the media space. A non-auditory signal is required. The two other possible solutions we presented earlier were to allow the visitor to use a gesture to mute the audio or to give the visitor access to a mute button of some sort. The former solution forces the visitor to learn a gesture that may or may not make sense or even be appropriate (depending on cultural differences). It also requires that the visitor be aware of his/her position relative to the camera. Finally, it may disturb other meeting attendees.

The solution seems to be to provide a mute button without adding any extra equipment at the orphan CODEC. The best way to do this is to write a simple audio monitor that monitors the level of audio being received from the CODEC. When the visitor wants to talk to the AVSA without disturbing the meeting, the visitor can first turn the audio output of their microphone off. The audio monitor will sense this and the AVSA will mute the audio going to the meeting room. The AVSA can then display a message to the visitor through a banner saying that the audio has been muted. Any audio from now on will be directed toward the AVSA. The visitor can then turn the audio output of their microphone back on and talk to the AVSA. When the visitor wishes to have their audio routed to the media space again they can issue the "mute" utterance to toggle it back on. The solution is quite natural and non-intrusive, and it does not require extra equipment. The only drawback is that it requires the visitor to have access to the audio level (i.e. volume control) coming out of their microphone.
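A minimal sketch of such an audio monitor follows. It assumes audio arrives as frames of samples normalized to the range -1.0 to 1.0 and that a switched-off microphone produces near-zero levels for several consecutive frames. The class name `MuteMonitor`, the threshold and the frame count are illustrative assumptions and would need tuning against a real CODEC.

```python
import math


def rms(samples):
    """Root-mean-square level of one frame of samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


class MuteMonitor:
    """Mutes audio to the media space when the incoming level drops to silence."""

    def __init__(self, silence_threshold=0.001, frames_required=3):
        self.silence_threshold = silence_threshold  # assumed value
        self.frames_required = frames_required      # assumed value
        self.muted = False
        self._quiet_frames = 0

    def feed(self, frame):
        """Process one frame; return True if audio to the media space is muted."""
        if rms(frame) < self.silence_threshold:
            self._quiet_frames += 1
            if self._quiet_frames >= self.frames_required:
                self.muted = True   # microphone was switched off: mute the room feed
        else:
            self._quiet_frames = 0  # sound returned; the mute state is kept
        return self.muted

    def toggle_by_voice(self):
        """Called when the recognizer hears "mute"; flips the routing."""
        self.muted = not self.muted
        self._quiet_frames = 0
        return self.muted
```

Note that once muted, the monitor stays muted when sound returns, so the visitor can turn the microphone back on and speak to the AVSA; only the "mute" utterance routes audio to the media space again.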

6.1.5 Controlling Volume

In almost every meeting attended by an electronic visitor, the electronic visitor has had to ask a member of the local media space to adjust the volume being sent to the visitor. There is no reason why electronic visitors should not have the ability to adjust the volume themselves. The implementation would require some centrally located hardware and software control that communicates with the AVSA using the established protocol. Volume control would then be added as a service of the room through a room control system.

6.1.6 Speech Recognition

Up until now we have designed the AVSA for discrete-utterance, small-vocabulary, speaker-independent speech recognition. Because of the dynamic nature of the media space we feel that the discrete-utterance menu-guided approach we have taken is a very effective one. However, there are some situations in which performance and naturalness could be improved by using a continuous-utterance recognizer with word-spotting.

For example, when the user wants to see the menu it would be much better if the visitor could say any one of "Attendant, please show me the menu" or "Attendant, show me my options" or some other variation of the same command. By not restricting the user to a specific command it would allow users to converse with the system in a more natural manner. An added benefit is that erroneous recognition, due to similar sounding words, would be greatly reduced.
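The idea of word-spotting can be illustrated on text: the command is identified by a keyword anywhere in the utterance rather than by an exact phrase. The sketch below assumes a continuous-utterance recognizer has already produced a transcript; the table `COMMAND_KEYWORDS` and the function `spot_command` are hypothetical, and a real word-spotter would operate on the audio itself.

```python
import re

# Hypothetical mapping from commands to the keywords that trigger them.
COMMAND_KEYWORDS = {
    "show_menu": {"menu", "options"},
    "mute": {"mute"},
}


def spot_command(utterance, keywords=COMMAND_KEYWORDS):
    """Return the first command whose keyword appears in the utterance, else None."""
    words = set(re.findall(r"[a-z']+", utterance.lower()))
    for command, triggers in keywords.items():
        if words & triggers:
            return command
    return None
```

Both "Attendant, please show me the menu" and "Attendant, show me my options" resolve to the same `show_menu` command, while an utterance with no keyword is ignored.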

6.1.7 Increasing Access

In the past, access to a media space has been limited to members of other media spaces. Now, with the AVSA, people with orphan CODECs can access a media space as well as its services. While we have significantly increased the ability to communicate it should be noted that many people do not have the expensive equipment required to take advantage of this technology. However, by developing the system as a set of independent layers we have made it possible to extend accessibility to people who want to access the media space through some other means. Two possibilities are:

telephones
public access kiosks

Telephones

We know that video-enhanced communication improves the quality of the communication[Tang et al 1992] and thus telephone based communication will not be as effective in the context of videoconferencing. However, we cannot ignore the fact that telephones are one of the most ubiquitous communication technologies available to us today, more so than traditional videoconferencing sites and much more so than media spaces. For this reason, it is extremely important that, once deployed, the AVSA be able to accommodate telephones.

The layered architecture of the AVSA does in fact make it possible for accessibility to be extended to telephones. Instead of using the video channel to present the options to the visitor, the options would be presented using the audio channel. This would be facilitated by adding a text-to-speech synthesizer to the AVSA. The AVSA would simply pipe parsed text from the media space into the text-to-speech synthesizer. The resulting audio would be sent as options to the visitor. Input would be conducted through the audio channel as before or using the standardized touch-tone keypads along with DTMF signals.
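As a sketch of what the telephone interface layer might do, the menu text can be turned into a spoken prompt and a keypad digit mapped back to an option. The functions and the example option labels are hypothetical; the sketch assumes the DTMF tones have already been decoded to a digit character and that the resulting prompt string is handed to whatever text-to-speech synthesizer is used.

```python
def spoken_menu(options):
    """Build the prompt text to pipe into a text-to-speech synthesizer."""
    parts = [
        f"To {label}, press {digit}."
        for digit, label in enumerate(options, start=1)
    ]
    return " ".join(parts)


def select_by_dtmf(options, digit):
    """Map a decoded DTMF digit ("1" to "9") to the chosen option, or None."""
    if len(digit) != 1 or digit not in "123456789":
        return None  # "0", "*", "#" and anything else are not menu choices here
    index = int(digit) - 1
    return options[index] if index < len(options) else None


# Hypothetical menu; the real options would come from the media space.
menu = ["visit a person", "leave a message"]
```

Here `spoken_menu(menu)` produces "To visit a person, press 1. To leave a message, press 2." and `select_by_dtmf(menu, "2")` returns "leave a message". Only the interface layer changes; the options themselves still come from the media space as before.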

Public access kiosks

There are currently no public access kiosks that connect to a media space. It would not have made sense to provide them in the past because without the ability to navigate through the media space and control the media space it would not have been useful or cost-effective. The AVSA solves this deficiency and it now makes sense for kiosks to be made available to further increase the ability to communicate.

As mentioned before, the AVSA was developed in layers. So once again the interface layer can easily be changed to accommodate any other type of interface. For example, the kiosk may be in a noisy environment and work better with a touch screen interface.

6.2 Summary and Conclusions

We identified the lack of external access to a media space as a serious deficiency in media space communication. We proposed a solution to the problem in the form of a system called the AVSA. The AVSA enables access and control of a media space from a traditional videoconference room.

In a larger context, we explored the concept of using various technologies to facilitate communication efficiently and effectively. We note, however, that all these technologies have properties that detract from their purpose. We also note that, to combat these deficiencies, the different technologies are being developed so that they mimic properties of the others. However, because of this approach, they are limited by the limits of the technology within which they operate.

We proposed to explore "convergence" of technologies from a different perspective: one that captures the advantageous properties of each to reduce the disadvantageous properties of the others without centering the development around any one technology. For our exploration, we chose to augment a media space, the Ontario Telepresence Project, with a system called the AVSA.

One key issue was that we had no control over the videoconferencing room. We had to work with the equipment that was already there: a camera, microphone, monitor and speaker. The resulting interface enabled control to be exercised through speech input in response to speech prompts.

The second key issue was that, if the work was to have any influence as an example on efforts to converge technologies, we had to develop a useful system in a timely manner. As a result, we developed the AVSA through a thoughtful, three-stage, iterative process involving informal input from users, developers and reviewers of varying backgrounds.

Issues of privacy, access, added functionality, etc., which will require attention before the system can be deployed into the field for public use, have also been discussed along with suggested solutions.

From the comments of people who have evaluated the system we concluded that we had significantly increased media space access and contributed to the discourse on networked interactive information appliances, thus achieving our objectives.

6.3 Contributions

Functionally, the AVSA is significant in two ways. First, it increases media space accessibility by:

enabling people from traditional videoconferencing rooms to access the media space.
providing an alternative, and potentially easier, means by which members of other media spaces can access a media space.
providing an easily modifiable system, through its layered architecture, that enables other devices, like the telephone, to become the technology through which interaction can take place.

Second, it provides an interface through which people can control media space resources that were previously not controllable through any interface, making the visit more effective.

The AVSA is also significant in the following ways:

it provides an example of an alternative option to the conventional perspectives on how to converge technologies.
because of the constraints rooted in our problem, we were forced to build an interface with unusual modalities. This work serves as an example of how to work with similar constraints to build an effective interface.

Anuj Gujar's Home Page