Idea: Use a linked list to store elements but, every time an
element is sought, move it to the front of the list.
Intuition: Frequently-sought, or high-probability, elements will
tend to stay near the front of the list.
For example, searching for C in the list A → B → C → D moves C to the front, leaving C → A → B → D.
Searching in a Move-to-Front (MTF) list costs only a few pointer
assignments more than searching in a normal list.  It's also very
easy to implement.
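To make "easy to implement" concrete, here is a minimal sketch in Python (the class and method names are mine, not from the notes): a singly linked list whose search splices the found node to the front with a few pointer assignments.

```python
class Node:
    """A singly-linked list node."""
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

class MTFList:
    """A linked list that applies the move-to-front heuristic on every search."""
    def __init__(self, values):
        self.head = None
        for v in reversed(values):
            self.head = Node(v, self.head)

    def search(self, value):
        """Return True if value is present; if found, splice its node to the front."""
        prev, node = None, self.head
        while node is not None:
            if node.value == value:
                if prev is not None:        # only a few extra pointer assignments:
                    prev.next = node.next   # unlink the found node ...
                    node.next = self.head   # ... and splice it to the front
                    self.head = node
                return True
            prev, node = node, node.next
        return False

    def to_list(self):
        """Return the current ordering, front to back."""
        out, node = [], self.head
        while node is not None:
            out.append(node.value)
            node = node.next
        return out
```

For instance, after `MTFList(['A', 'B', 'C', 'D']).search('C')`, the ordering becomes C, A, B, D.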
But is the move-to-front heuristic really useful?  We must somehow
measure the quality of this heuristic.  To do so, we'll
compare the expected search time in an MTF list with that in an optimal
list of the same elements.
Note that we can almost never construct the optimal list, since the
probability distribution of queries is almost always unknown.  So
we're hoping that the MTF heuristic gives an expected search time
close to that of the optimal list.
 
Given $n$ elements, name them by their position in the optimal list:
$x_1, x_2, \ldots, x_n$, with corresponding probabilities
$p_1, p_2, \ldots, p_n$, where $p_i = P(x_i)$.
Then (from the optimal list theorem) $p_1 \ge p_2 \ge \cdots \ge p_n$.
Let $\mathit{opt}$ be the expected search time in the optimal list of
these elements.
Let $\mathit{mtf}$ be the expected search time in the MTF list of the
same elements.
Clearly, $\mathit{opt} \le \mathit{mtf}$, since the optimal list has
minimum expected search time of all possible lists.
Theorem
 $\mathit{mtf} < 2\,\mathit{opt}$.
Discussion
The theorem says that the expected search time of the MTF list is
at most twice optimal.  This is a surprising result, since we don't
even know the probability distribution!
Consider, for example, these elements and their associated probabilities:
| x | A | B | C | D | 
| P( x ) | 0.1 | 0.1 | 0.8 | 0.0 | 
The optimal list is C, A, B, D, with expected search time
$\mathit{opt} = 1(0.8) + 2(0.1) + 3(0.1) + 4(0.0) = 1.3$.
The theorem states that $\mathit{mtf} < 2(1.3) = 2.6$.
Since we don't know $P(x)$, we might have built an arbitrary,
static list, like A → B → C → D.
Had we done so, we would have been stuck forever with an expected
search time of $1(0.1) + 2(0.1) + 3(0.8) + 4(0.0) = 2.7$.  Clearly, the MTF list is better, even
though it's not optimal.  When we don't know the probability
distribution, it's best to use an MTF list rather than risk choosing a
bad ordering in a static list.
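The arithmetic in this example is easy to check directly. This is a sketch, where `expected_search_time` is an illustrative helper name, using the probabilities from the table (optimal order C, A, B, D; an arbitrary static order A, B, C, D):

```python
def expected_search_time(probs):
    """Expected search time of a fixed list: sum over positions
    (counted from 1) of position * probability of the element there."""
    return sum(pos * p for pos, p in enumerate(probs, start=1))

# Probabilities in optimal (decreasing) order: C, A, B, D.
opt = expected_search_time([0.8, 0.1, 0.1, 0.0])     # 1(0.8)+2(0.1)+3(0.1)+4(0.0) = 1.3
# The same probabilities in an arbitrary static order: A, B, C, D.
static = expected_search_time([0.1, 0.1, 0.8, 0.0])  # 1(0.1)+2(0.1)+3(0.8)+4(0.0) = 2.7
```

The theorem's guarantee, 2(1.3) = 2.6, beats the static list's 2.7.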
 
Proof of Theorem
 
In the MTF list of these elements, let's perform enough searches so
that the ordering of the MTF list is unaffected by its initial
ordering.  Each element will have been sought at least once.
We'll take a ``snapshot'' of the list at this point and will
analyse the expected search time.
Let $\mathit{pos}(i)$ be the position of $x_i$ in this snapshot.  The expected
value of $\mathit{pos}(i)$ is (from our definition of expectation)
$E[\mathit{pos}(i)] = \sum_{k=1}^{n} k \cdot P(\mathit{pos}(i) = k)$.
For example, suppose we have
| position of $x_i$ | percentage of time | 
| 1 | 50 % | 
| 2 | 10 % | 
| 3 | 20 % | 
| 4 | 20 % | 
Then $E[\mathit{pos}(i)] = 1(0.5) + 2(0.1) + 3(0.2) + 4(0.2) = 2.1$,
and $x_i$ is, on average, in position 2.1.
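This expectation is a one-line computation; the variable names below are mine:

```python
# Fraction of time x_i spends at each position, from the table above.
position_dist = {1: 0.50, 2: 0.10, 3: 0.20, 4: 0.20}

# E[pos(i)] = sum over positions k of k * P(pos(i) = k).
expected_pos = sum(k * frac for k, frac in position_dist.items())
```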
 
If we knew $E[\mathit{pos}(i)]$, we would know $\mathit{mtf}$, since
$\mathit{mtf} = \sum_{i=1}^{n} p_i \cdot E[\mathit{pos}(i)]$.
So what is $E[\mathit{pos}(i)]$?
 
An element's position is one more than the number of elements preceding it:
$\mathit{pos}(i) = 1 + \sum_{j \ne i} \mathbf{1}[x_j \text{ precedes } x_i]$.
In the summation above, if $x_j$ precedes $x_i$, it adds
one to the number of preceding elements.  Taking expectations,
$E[\mathit{pos}(i)] = 1 + \sum_{j \ne i} P(x_j \text{ precedes } x_i)$.
So what is $P(x_j \text{ precedes } x_i)$?
In the snapshot, $x_j$ precedes $x_i$ exactly when the more recent of the
searches for these two elements was a search for $x_j$.  Generalising,
$P(x_j \text{ precedes } x_i) = \dfrac{p_j}{p_i + p_j}$.
In the example above, $P(A \text{ precedes } C) = \dfrac{0.1}{0.1 + 0.8} = \dfrac{1}{9}$.
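This key fact can be checked by simulation. The sketch below runs an MTF list on queries drawn from the example's distribution and measures how often A precedes C; the seed and step counts are arbitrary choices of mine:

```python
import random

random.seed(1)
elems = ['A', 'B', 'C', 'D']
weights = [0.1, 0.1, 0.8, 0.0]   # P(A), P(B), P(C), P(D) from the example

lst = list(elems)                # current MTF ordering, front first
hits = 0
steps, warmup = 200_000, 1_000
for t in range(steps):
    x = random.choices(elems, weights=weights)[0]   # draw one query
    lst.remove(x)
    lst.insert(0, x)             # move the sought element to the front
    if t >= warmup:              # skip the initial transient
        hits += lst.index('A') < lst.index('C')

empirical = hits / (steps - warmup)
predicted = 0.1 / (0.1 + 0.8)    # p_A / (p_A + p_C) = 1/9
```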
Thus
$\mathit{mtf} = \sum_{i=1}^{n} p_i \Bigl( 1 + \sum_{j \ne i} \dfrac{p_j}{p_i + p_j} \Bigr)
= 1 + \sum_{i=1}^{n} \sum_{j \ne i} \dfrac{p_i \, p_j}{p_i + p_j}$.
Note the symmetry in $i$ and $j$!
 
Therefore $\mathit{mtf} = 1 + 2 \sum_{i < j} \dfrac{p_i \, p_j}{p_i + p_j}$.
Note that $\dfrac{p_i}{p_i + p_j} \le 1$, so $\dfrac{p_i \, p_j}{p_i + p_j} \le p_j$.
Using this fact:
$\mathit{mtf} \le 1 + 2 \sum_{j=1}^{n} (j-1)\,p_j
= 1 + 2 \sum_{j=1}^{n} j\,p_j - 2 \sum_{j=1}^{n} p_j
= 1 + 2\,\mathit{opt} - 2 = 2\,\mathit{opt} - 1$,
since $x_j$ sits at position $j$ in the optimal list, so $\mathit{opt} = \sum_j j\,p_j$, and $\sum_j p_j = 1$.
That's it!  We've proven that $\mathit{mtf} \le 2\,\mathit{opt} - 1 < 2\,\mathit{opt}$.
Note that we had to get the double summation over all $i \ne j$ in
order to exploit the symmetry.
The key was to produce the $\dfrac{p_i \, p_j}{p_i + p_j}$ term.
It's great that we can prove that the MTF heuristic is
actually useful.