Using Salient Lenses for Adaptive Image Display

Using Salient Lenses for Adaptive Image Display
A Project for Topics in Mobile and Pervasive Computing

Second Project Report - November 18, 2008:

Implementation
Seam Carving
A Practical Application
Future Directions

It was decided after some research of the alternatives, that an implementation written in C++, as a Win32 application using GDI+/DirectDraw libraries would be ideal.

C++ - C++ code runs fast, and is familiar. I had concerns that the overhead of using C# and the .NET framework (while perhaps being easier to write, and less of a headache in some ways) would not perform as quickly. The main priority is performance, as my approach and solution may involve relatively costly algorithms on bitmaps.
Win32 - For those who have experience doing Win32 development on a typical Windows OS (like Windows 2000 or XP), programming an application for Windows Mobile will seem very familiar. At the WinMain() entry point, set up the window as you normally would, and provide a pointer to another function to handle events.
GDI+ - Originally, I had started the implementation using solely the GDI+ library. However, this proved to be quite slow for bitmap manipulation, it took over 20 seconds using GetPixel()/SetPixel() commands on a 320x240 bitmap to produce another image with its gradient magnitudes. While a faster method to access the bits of a bitmap exists (GetDIBits()), this function is not supported on Windows Mobile 5.0. So, this library finds practical use for loading bitmap images, but little else.
DirectDraw - I discovered that the DirectDraw interface would allow direct access to the bitmap data, skipping costly operations like colour conversions between device contexts, etc. Also, common graphics techniques such as blitting and double-buffering are implemented to utilize the system's hardware, when supported. While I hadn't used DirectDraw (or even DirectX) before, I was able to learn and get started using the library within a couple of days. The performance improvement was significant, working with bitmap data directly improved the time to determine the gradient magnitudes of the same 320x240 image from over 20 seconds with GDI, to under a fraction of a second.

Seam Carving:

The Seam Carving paper, the first paper presented in the related work section of the previous report, provides a simple approach to shrinking the dimensions of images, while preserving image content. I implemented this technique as a potential starting point for my solution.

Given an input image, the first step is to calculate the "energy" at each pixel position in the image. This is a sum of the absolute values of the partial derivatives. In my implementation, I have used backward-differencing to evaluate these partials quickly, as it is cheaper than the Sobel filter for example (requiring only two points instead of six for each direction). We evaluate the following at each discrete pixel position:

Next, we determine the 8-connected path across the image, whose sum of energies along each pixel in the path is minimal. That is, we want to remove the seam s* such that

To determine this minimum seam, we first employ a dynamic programming approach. The minimum cumulative energy M at position (i,j), where i and j denote the ith row and jth column in the image, can be expressed as:

The above expression populates each row of M downward, determining the minimum cumulative energy at all pixel positions in the vertical direction. The approach to work horizontally is analogous. Note that since we are interested in 8-connected paths, we branch downward using only one of 3 possible directions at each position (represented by the 3 terms within the min()). We show values of M in a gradient that shifts from blue (zero), to green, to red (the maximum cumulative energy in the image), as was done in the paper.

To trace the minimal energy path, we first determine the position along the bottom row of the image with minimal cumulative energy. We then trace up each row, moving one pixel to the left or right, or straight up, based on which position has a smaller cumulative energy. We show these paths on the examples shown above.

By iterating this approach, removing seam after seam, we can reduce the image dimensions while attempting to preserve the salient region (which we assume here are those regions with high gradient magnitudes).

A Practical Application:

There are two applications of interest that ship with Windows Mobile 5.0 - one called "Pictures & Videos" and the other "ClearVue PDF". As a short-term goal within the time constraints of a course project, I would like to produce an application that performs the same function as these applications - to view and navigate through bitmaps. (Note that in the case of the PDF format, which requires conversion to a raster graphics image, I would instead work with a conversion of the PDF as a series of bitmaps of the corresponding pages.)

Naturally in the case of viewing documents, I am interested in how well the seam carving approach will work on shrinking blocks of text, as one potential application.

While readable, the above result is not all that satisfying. Removing the whitespace between words reduces overall readability. Further, it can be observed that some seams have penetrated through some of the letters (in many cases making them appear italicized) resulting in an inconsistent appearance.

I realized that a simpler, alternate energy function could be used in the case of text - whitespace would represent zero energy, and black would represent the maximal energy.

As can be seen, this aided a great deal in preventing the seams from crossing through the letters of the text, resulting in text without visual artifacts. However there is still an issue of inconsistent spacing reducing overall readability. Below we show that this is less of a problem when removing whitespace between adjacent lines of text, because of its structure.

Future Directions:

To produce better results on text with the seam carving approach, we can take advantage of knowledge of its structure. Rather than removing vertical seams globally, it may be better to partition the bitmap into distinct lines, and work on shrinking each piece independently. This idea is somewhat obvious, as it is essentially using seam carving for the purpose of text justification. I would anticipate increased performance as another advantage of this piecewise approach. I have yet to implement this idea.

Another direction I would like to make significant progress on which I have got started with is using the seam carving approach on significantly larger images (greater than 1000x1000), while keeping performance on the mobile device as close to real-time as possible. The major challenge is that as the seam carving approach works iteratively, removing only one seam of pixels at each iteration, for large images reducing the width or height by 1 pixel at a time is an inherent weakness. The second challenge is that as the dimensions and hence the number of pixels in the image increases, so does the computational cost at each iteration.

To overcome this fault, I have experimented with using a multi-scale approach, determining the optimal seam on a smaller image, and then scaling that seam upward and removing it from the originally sized image. This scaling approach allows each seam to be determined in less time, at the expense of being an approximation of the true minimum seam in the source image.

The above result shows the source image (top), the standard seam removal approach (bottom left) and the multiscale approach (bottom right). A few artifacts along the edges of the face and shoulders are noticeable when using the multiscale approach.