email: giachino@gdp.utoronto.ca, giachino@mailer.cefriel.it
CEFRIEL - Via Emanueli, 15 - 20126
Milano - Italy
Work performed while visiting the Telepresence Project at the University of Toronto
Given this architecture, it is interesting to investigate the automatic extraction of information from the environment through the images collected by Portholes. This information can be used for issuing notifications of events concerning people's presence and availability. In this way the Portholes architecture becomes a means for performing activity sensing on the environment, thereby providing a bridge between passive awareness and active awareness.
To provide active awareness, the activity we are generally interested in monitoring is human activity, so that we can issue notifications concerning relevant actions by people, such as leaving their workspace, logging out from the iiif server, or changing the state of their office door. Among these actions, a person's presence in front of his or her camera can be detected by suitably comparing sequential images grabbed from that camera. Other actions can be detected more easily by querying the iiif server.
In our media space, one reason for using the Portholes images for sensing human activity, instead of other environment sensors, is that the cameras, the wiring and the iiif server are already in place. Another reason for using camera images is that we can expect, in the coming years, a growing number of personal computers equipped with cameras, speakers and microphones.
In order to detect human activity using the Portholes images, I wrote a software module, called CmpFrames, that, given two consecutive frames of the same scene, guesses whether or not there is human activity. CmpFrames has been designed to provide a guess very quickly, but it provides only a 'guess'. For the sake of execution speed, it assumes that the scene has some specific properties, such as a stable background. If these properties are not met, the guess can be wrong, and the sensed activity can be of another nature (a screen saver, for instance).
CmpFrames is the building block upon which I created a personal notification tool, called Monitor, that provides notifications of people's presence and availability. Monitor was written mainly for testing the CmpFrames module, but it turned out to be an interesting personal notification tool in its own right.
At the time of this writing, the CmpFrames module has not yet been integrated into Portholes. The images are gathered by asking the iiif server to connect people's cameras to a frame-grabber. Users' privacy is guaranteed by the access control performed by the iiif server.
The following two sections describe the concept of active awareness and the CmpFrames algorithm. If you are mainly interested in testing and using the Monitor program, you can skip these two sections and go to the section The Monitor Program, which explains the purposes of the program and how to use it.
In terms of communication involving human beings and computers, Portholes can be considered a system that allows human-to-human (HH) communication in the background. The following figure shows a conceptual frame (borrowed from Professor Bill Buxton) for describing the possible kinds of communication found in systems that involve both human beings and computers (possibly connected by networks). In this frame the Portholes system falls into the HH background communication category, since it allows human-to-human communication without explicit user actions.
Figure 1: The Communication Arena
In addition to investigating the individual areas, it is also interesting to explore the possible migrations from one area to another. The CmpFrames software module provides the path for one possible migration, shown in the next figure. By means of the CmpFrames module, Portholes detects on behalf of a client that a user is now available (human-computer (HC) communication in the background), alerts the client through the GUI, asking whether he or she wants to call a video meeting (HC communication in the foreground), and finally a video meeting is called (HH communication in the foreground).
Figure 2: A Possible Path from Background to Foreground HH Communication
According to this scheme, whilst the Portholes system provides passive awareness, that is, awareness gained in the background, the CmpFrames software module and the Monitor program constitute the means for achieving active awareness, that is, awareness gained in the foreground. But the question is: once we enhance Portholes to provide active awareness, how should we regard it? As a more general awareness support tool, or as a mix of somewhat unrelated facilities? Moreover, people may be reluctant to have their images periodically distributed to other users (Portholes), but less reluctant when the images are used for active awareness purposes that do not imply their distribution and public accessibility (CmpFrames and Monitor).
The cameras we use provide color video sequences, but for our purposes we consider only gray-level images (256 gray levels). This choice is partly due to the fact that Portholes handles gray-level images in order to reduce the network overhead. These images are quite small (240x160), but convey enough information for human activity detection.
The rate at which the images are provided can range from one every 30 seconds to one every 5 minutes. Portholes generally provides images at the rate of one every 5 minutes. The rate has important implications for the design of the activity detection algorithm, as described later.
For this approach to be effective, we assume that when a person is not in the view of the camera, sequential images are very similar to one another, that is, the background does not change. This is obviously a strong limitation, since, as already mentioned, the view of the camera is likely to contain moving objects other than the person we want to monitor. Moreover, even with a very stable background, hardware limitations are likely to produce consecutive images that are not equal pixel for pixel.
The hardware limitations produce two different effects on the images. The first is the presence of spikes, that is, isolated pixels whose gray level is very different from that of their neighbors. The second is that, even with the same scene, pixels in the same positions can have different gray levels in two consecutive images.
The algorithm we propose filters the images using a convolution mask that smooths the discrepancies. This process reduces the incidence of spikes and cleans the image, producing more uniformity between images of the same scene.
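The smoothing step can be illustrated with a short Python sketch. The report does not say which convolution mask is used, so the 3x3 mean mask below is an assumption chosen only because it is the simplest smoothing mask; the function name smooth and the list-of-rows frame representation are likewise hypothetical and are not taken from the fcmp sources.

```python
def smooth(frame):
    """Smooth a gray-level frame with a 3x3 mean mask (assumed mask).

    `frame` is a list of rows, each a list of gray levels (0..255).
    Border pixels are averaged over the neighbours that exist, so the
    result has the same size as the input.
    """
    height = len(frame)
    width = len(frame[0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            total = 0
            count = 0
            # Visit the pixel itself and its (up to) eight neighbours.
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width:
                        total += frame[ny][nx]
                        count += 1
            row.append(total // count)
        result.append(row)
    return result
```

With this mask, an isolated spike of 190 on a uniform background of 100 is reduced to 110, which brings it much closer to its neighborhood before the frames are compared.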
After this smoothing process is completed, the algorithm begins to compare the images. When comparing pixels, the algorithm uses a proximity level, or error threshold, to decide whether two pixels are to be considered equal. This error threshold helps to reduce the impact of the hardware limitations.
The error threshold is a parameter of the algorithm that can be tuned in order to accommodate the hardware limitations. If the hardware is of good quality, the error threshold can be lowered. The algorithm requires an error threshold expressed as a percentage. It then calculates the absolute error threshold by: 1) extracting the maximum gray level from each of the two images, 2) choosing the lower of the two, 3) applying the percentage to this gray level. In this way we use the worst conditions for calculating the absolute error threshold, and we recalculate it for every pair of images.
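The three steps above can be written down directly. The following Python sketch is only an illustration of the calculation as described in the text; the function name and the list-of-rows frame representation are hypothetical, not the names used in the fcmp sources.

```python
def absolute_error_threshold(frame_a, frame_b, error_percent):
    """Turn a percentage error level into an absolute gray-level threshold.

    Each frame is a list of rows of gray levels (0..255).
    """
    # 1) Maximum gray level of each of the two images.
    max_a = max(max(row) for row in frame_a)
    max_b = max(max(row) for row in frame_b)
    # 2) Keep the lower maximum (the worst conditions).
    lower_max = min(max_a, max_b)
    # 3) Apply the percentage to that gray level.
    return lower_max * error_percent / 100.0
```

For example, if the brightest pixels of the two frames are 200 and 250 and the error level is 3%, the absolute threshold is 3% of 200, that is, 6 gray levels.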
The distance of the action from the camera is also important. Since we use a percentage of changed pixels for detecting action, the smaller the action appears in the images (because the person is far away), the lower the percentage of changed pixels. In this case the quiet threshold should be lowered accordingly.
Finally, the brightness of the scene affects the amount of noise introduced by the camera. As the scene becomes darker, the noise introduced by the camera increases, and a higher error threshold might be required.
As this description shows, the thresholds are very important for the algorithm to work properly under different conditions. Although the best solution would be adaptive thresholds, good performance can already be obtained by using personal profiles that store the best empirical thresholds for every person we want to monitor. This idea is described further in the Future Work section.
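To make the role of the two thresholds concrete, here is a minimal Python sketch of the comparison step. It assumes that a pixel counts as changed when the absolute difference of its gray levels exceeds the absolute error threshold, and that activity is reported when the percentage of changed pixels exceeds the quiet threshold. Both strict comparisons, and the function names, are assumptions made for illustration; the fcmp sources may differ.

```python
def changed_percentage(frame_a, frame_b, abs_threshold):
    """Percentage of pixels whose gray levels differ by more than abs_threshold.

    Both frames are lists of rows of gray levels and must have the same size.
    """
    total = 0
    changed = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for a, b in zip(row_a, row_b):
            total += 1
            if abs(a - b) > abs_threshold:
                changed += 1
    return 100.0 * changed / total


def activity_detected(frame_a, frame_b, abs_threshold, quiet_percent):
    """Guess that there is activity when enough pixels have changed."""
    return changed_percentage(frame_a, frame_b, abs_threshold) > quiet_percent
```

In this sketch a distant person, who changes only a small share of the pixels, is detected only if quiet_percent is lowered, and a noisy dark scene produces false activity unless abs_threshold is raised.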
A good procedure for setting the thresholds is the following. The first threshold to set is the error threshold. In order to accommodate the hardware limitations, grab several images of a very stable scene, and tune the error level so that the percentage of changed pixels detected by the CmpFrames algorithm is lower than 2%. Then grab images with a person in the scene who tries to move very little, and set the quiet threshold just under the average percentage of changed pixels that the algorithm detects. This procedure is explained in more detail in the section that describes the Monitor program.
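The tuning procedure can also be sketched in Python, starting from measurements that have already been collected. Everything here is hypothetical scaffolding: the function names, the choice of the lowest error level that satisfies the 2% rule, and the 1-point margin used to stay "just under" the average are assumptions, not part of the Monitor package.

```python
def pick_error_level(stable_changes_by_level, max_changed=2.0):
    """Pick the lowest error level that keeps a stable scene quiet.

    `stable_changes_by_level` maps each candidate error level (in percent)
    to the changed-pixel percentages measured on frame pairs of a very
    stable scene. Returns None if no candidate stays below `max_changed`.
    """
    for level in sorted(stable_changes_by_level):
        if max(stable_changes_by_level[level]) < max_changed:
            return level
    return None


def pick_quiet_level(still_person_changes, margin=1.0):
    """Set the quiet level just under the average change of a still person.

    `margin` (in percentage points) is an assumed safety margin.
    """
    average = sum(still_person_changes) / len(still_person_changes)
    return max(average - margin, 0.0)
```

For instance, if an error level of 2% still yields up to 2.5% changed pixels on a stable scene while 3% yields at most 1.5%, the sketch selects 3%; if a nearly motionless person produces 12%, 14% and 13% changed pixels, the quiet level is set to 12%.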
Usage: fcmp frame1 frame2 [-e error_level][-q quiet_level][-d diffFile][-s][-v]
Exit status:
This program can be compiled with an ANSI C compiler, such as gcc.
Monitor can also work silently, providing only return codes that other shell scripts and commands can use for further processing. This mode can be used for building other personal tools on top of the Monitor program. The package includes two examples of such tools: Areyouthere detects whether or not the person is there, whilst Whenarrives returns only when the person has returned or activity is detected. These programs are very simple shell scripts that exploit the silent mode provided by Monitor. The main difference between using these programs and using the Monitor program directly is that Areyouthere and Whenarrives exit once they can provide their notification, whilst the Monitor program can be run for continuous monitoring.
Originally, one of the main purposes of the Monitor program was to allow easy testing of the Fcmp software module before integrating it into the Portholes system, but it turned out to be a useful standalone tool, completely independent of the Portholes system.
Monitor relies on the iiif server for connecting users' cameras to a centralized frame grabber. By means of this interaction with the iiif server, the program is able to detect specific conditions related to the state of the person's Telepresence Stack. This enlarges the set of notifications that the program can issue.
Monitor can provide two main kinds of notifications. A status notification is the recognition of a state (and not of a change of state). Status notifications are issued when the program has no memory of a previous state. For instance, when the program is started to monitor John, and John is in his office, the program reports a status of activity detected. Clearly, this is not an event. As soon as a change of state in the scene is detected, an event notification is issued: John has arrived, or John has closed his door, or John has logged out.
The following is a comprehensive list of all the categories of notifications that Monitor can issue.
EVENT)   A change of state has been detected (person has arrived
         or has left, has closed or opened his or her door,
         has logged in or out from the iiif server)
STATUS)  A status is reported only when the available information
         is not enough to detect any change of state (for
         instance, at the beginning of an execution the first two
         frames are enough to say whether or not there is activity,
         but not to detect a change of state, since we do not
         have background information yet)
DEBUG)   Debug warnings, issued only if option -d is indicated
         on the command line
ERROR)   An error condition occurred. The program aborts
Once a frame has been grabbed, Monitor cleans it using the PbmPlus tool pnmconvol. Then, if a previous frame is available, Monitor calls the Fcmp program described in the previous section. Fcmp performs the comparison between the two frames and detects if there is activity. Monitor uses this information and compares it with the previous result, if any, in order to guess what happened on the scene.
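The way a new comparison result is combined with the previous one can be sketched as follows. This Python fragment covers only the camera-based notifications (the door and login notifications come from the iiif server), and it assumes that nothing is reported when the state does not change. The function name and the use of None for "no previous result" are illustrative choices, not taken from the Monitor script.

```python
def classify(previous_activity, current_activity):
    """Decide which notification, if any, follows from two comparison results.

    `previous_activity` is None when no earlier result is available,
    otherwise True or False; `current_activity` is True or False.
    Returns a (category, message) pair, or None when nothing changed.
    """
    if previous_activity is None:
        # No memory of a previous state: only a status can be reported.
        if current_activity:
            return ("STATUS", "Activity detected")
        return ("STATUS", "No activity detected")
    if current_activity and not previous_activity:
        return ("EVENT", "User has arrived")
    if previous_activity and not current_activity:
        return ("EVENT", "User has left")
    return None
```

The messages reuse the wording of the return codes listed in the usage text of the program.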
Because of the way the Fcmp program detects the absence of activity (that is, there is no activity when two sequential frames are very similar to each other), the Monitor program is one frame late in detecting that the person has left: the first frame without the person still differs from the previous frame, in which the person was present, so the absence shows up only when two consecutive empty frames are compared. This is not a big problem, however, since the most used feature should be the notification of people's arrival.
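The one-frame delay can be reproduced with a toy simulation. In the Python sketch below each frame is a single row of 20 gray levels; the absolute threshold of 6 gray levels is an arbitrary assumption, the 11% quiet level is the default mentioned in the usage text, and all names are hypothetical.

```python
def differs(frame_a, frame_b, abs_threshold=6, quiet_percent=11.0):
    """True when the share of changed pixels exceeds the quiet level."""
    changed = sum(1 for a, b in zip(frame_a, frame_b)
                  if abs(a - b) > abs_threshold)
    return 100.0 * changed / len(frame_a) > quiet_percent


def activity_timeline(frames):
    """Compare each frame with the one before it."""
    return [differs(prev, curr) for prev, curr in zip(frames, frames[1:])]


EMPTY = [100] * 20                              # background only
PERSON_A = [100] * 10 + [30] * 10               # person on the right
PERSON_B = [100] * 5 + [30] * 10 + [100] * 5    # person has moved

# The person is present in the first two frames and gone in the last two.
frames = [PERSON_A, PERSON_B, EMPTY, EMPTY]
timeline = activity_timeline(frames)
print(timeline)  # [True, True, False]
```

The second True comes from comparing PERSON_B with the first empty frame: the person has already left, but the two frames still differ. Only the comparison of the two empty frames yields False, one frame after the departure.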
README         A readme file
doc            The directory containing documentation files, like
  monitor.mcw    A MacWord version of this report
  monitor.rtf    An RTF version of this report
  monitor.txt    An ASCII version of this report
  monitor.ps     A PostScript version of this document
fcmp           Compares two frames and detects activity
               (to be manually compiled on a different architecture)
grab           Grabs a frame from the local frame-grabber
               (to be manually compiled on a different architecture)
monitor        The Monitor tool
areyouthere    The Areyouthere tool
whenarrives    The Whenarrives tool
pnmconvol      The PbmPlus program for applying convolution
               filtering to frames
pnmcat         The PbmPlus program for merging different frames into one
pnmscale       The PbmPlus program for scaling frames
fcmp_src       The directory containing the sources for the fcmp program
grab_src       The directory containing the sources for the grab program
To install the tool, you need a workstation that is binary compatible with SunOS 4.1.3 1. Once you have untarred the package, you might want to edit the header of the shell script 'monitor' and change the default configuration. If you already have an iiif client program, all you have to do is compile the fcmp program, if your workstation is different from the one mentioned. The sources and the Makefile of the fcmp program are stored in the fcmp_src directory. If you do not have an iiif client program, you will have to obtain one. As for the grab program that controls the frame-grabber, the binary version (grab) is provided together with the source code (grab.c) and Makefile.
To summarize:
For a quick run, type:
monitor <myself> 60
This monitors you every 60 seconds (substitute <myself> with your login name). If you want to monitor an iiif node instead of an iiif person, type the following:
monitor <node_name> 60 -n
The syntax of the program is the following:
Usage: monitor <name> <delay> [-n] [-e error_level] [-q quiet_level]
       [-d ] [ [ [-x [address]] [-c][-m] ] | -M < STATUS | EVENT > ]
  name         Name of the iiif user or of the iiif node
               (see option -n) to be monitored
  delay        Delay in secs between frame grabbings
  -n           <name> indicates a 'node name', and not a 'person name'
  error_level  Percentage of error in comparing pixels (3% per default)
  quiet_level  Percentage of "quiet level" (11% per default)
  -d           Print debug information. The log files are not erased
  -x           Display last frame on X display [address]
  -c           If -x set, display also the previous and the
               changed-pixels frames
               These frames are displayed below the last frame
               On the changed-pixels frame the black pixels are those
               that are considered changed (according to the error
               level)
  -m           If -x set, the frames are displayed magnified
  -M           Returns only STATUS or EVENT code, no output is provided
               Useful in shell scripts
               STATUS - Provides status return codes:
                  0  No activity detected
                  2  Activity detected
                  3  The door is closed
                  4  The door is locked
                  5  User not logged in
               EVENT - Provides event return codes:
                  6  User has left
                  7  User has arrived
                  8  User has opened his or her door
                  9  User has logged in
                 10  User has closed his or her door
                 11  User has locked his or her door
                 12  User has logged out
monitor returns 1 if something goes wrong
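A tool built on top of the silent mode needs to map these return codes back to their meaning. The following Python sketch shows one way to do it; the code values and messages are copied from the usage text above, while the function name and the table layout are hypothetical.

```python
# Return codes of the silent mode (-M), as listed in the usage text.
STATUS_CODES = {
    0: "No activity detected",
    2: "Activity detected",
    3: "The door is closed",
    4: "The door is locked",
    5: "User not logged in",
}
EVENT_CODES = {
    6: "User has left",
    7: "User has arrived",
    8: "User has opened his or her door",
    9: "User has logged in",
    10: "User has closed his or her door",
    11: "User has locked his or her door",
    12: "User has logged out",
}


def describe_return_code(code):
    """Translate a Monitor return code into a (category, message) pair."""
    if code == 1:
        return ("ERROR", "Something went wrong")
    if code in STATUS_CODES:
        return ("STATUS", STATUS_CODES[code])
    if code in EVENT_CODES:
        return ("EVENT", EVENT_CODES[code])
    raise ValueError("unknown Monitor return code: %d" % code)
```

A caller would presumably run the monitor script with -M STATUS or -M EVENT, read its exit status, and pass that value to describe_return_code; how the script is launched depends on the local installation and is not shown here.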
I tested the program in different offices, under different light conditions and at different grabbing rates. The guesses were correct most of the time. I noticed a decrease in performance in very dark environments. Moreover, if the environment to be monitored is completely dark (for instance, the light is off and there are no windows), the program can provide very poor guesses.
The Monitor program is in use at the Ottawa Engineering Group. Their environment is very different, and according to their comments the program seems to work quite well. They have also integrated the Fcmp program into Portholes, and the initial results are positive.
The Monitor package can be used by virtually anybody. The current limitation is that it must be run on the iiif.dgp workstation, since that is the only workstation that currently has a frame grabber connected to the iiif server, and not everybody here has an account on iiif.dgp. The next section proposes a solution to this problem.
An interesting enhancement would be a central repository of the threshold profiles for all the offices of the media space, so that the users of the service do not have to think about the thresholds. These profiles should be mapped to the nodes of the media space, and not to the users.
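As a rough illustration of such a repository, the Python sketch below keeps one profile per node and falls back to the documented defaults (3% error level, 11% quiet level) when a node has no profile or only a partial one. The repository format, the function name and the node name used in the example are all hypothetical, since no such repository exists yet.

```python
# Defaults taken from the usage text of the Monitor program.
DEFAULT_PROFILE = {"error_level": 3.0, "quiet_level": 11.0}


def thresholds_for_node(profiles, node_name):
    """Return the thresholds for a node, filling gaps with the defaults.

    `profiles` maps node names to dictionaries that may override
    "error_level" and/or "quiet_level".
    """
    profile = dict(DEFAULT_PROFILE)
    profile.update(profiles.get(node_name, {}))
    return profile
```

For example, with profiles = {"office-a": {"quiet_level": 8.0}}, the node "office-a" gets a quiet level of 8% and the default error level of 3%, while any other node gets both defaults.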
A future version of the product could address the privacy of the monitored person by not showing his or her images on the display. People might accept the tool more readily if they know that the images are grabbed only for detecting their presence.
The current limitation of having to run the Monitor program on the iiif.dgp workstation can easily be relaxed by creating a small server on that machine that provides grabbed frames on demand. With such a server, the Monitor program could be run on any workstation on the network.
Finally, the Monitor program should be translated into C for the sake of execution speed and to have fewer programs to manage. The PbmPlus facilities are available in a library that can be linked to C programs.