Back to index

CSC 378F Data Structures and Algorithm Analysis

SKIP LISTS 
A skip list is a sorted, linked list of n nodes.

Each node can have many pointers:

standard linked-list node ( level 0 )

level 1 node

level 2 node

A level-k node has k + 1 pointers, each corresponding to a level:
Skip List Property
A level-k pointer in a node with data d points to the following node of level sL_1.gif k, with data sL_2.gif d.
e.g.
Header Node
A "header" node H has key sL_3.gif and points to the first node of each level.
The variable "MaxLevel" stores the maximum level in a list.
Perfect Skip Lists
In a perfect skip list, a level-k node points to the sL_4.gif node past it.
 
Level
0
points to the
1st
node
"
1
"
2nd
"
"
2
"
4th
"
"
3
"
8th
"
 
Like so :
Searching in a Skip List
A skip list need not necessarily be perfect.
Node x has
key( x )
next( x, L)(the level - L pointer from x)


Idea
Search on the highest level as far as possible, then "drop down" and search on next level.
 

e.g. Search for 40
This results in the following code :

So in the previous picture -

 
Iteration
0
1
2
3
4
5
6
x at start
H
22
22
22
29
29
40
 
Running Time
With each iteration, either -
[A] x moves forward in its current level
or [B] L moves down a level

In a perfect skip list -
MaxLevel sL_5.gif , so the total cost of all [B]'s is sL_6.gif O( log n ).
Moving forward in a level skips over half the nodes remaining to the right of x, so [A] occurs only once per level. Total cost of all [A]'s is thus sL_7.gif O( log n )
Insertion in a Skip List
Insertion at the front of a perfect skip list requires "shifting" all the data (but not the nodes) one node to the right: cost sL_8.gif O( n )!

Solution
Don't use perfect skip lists. When inserting , choose the new node's level randomly !

Based on the observation that (in a perfect list) sL_9.gif of all nodes are level-0, sL_10.gif are level-1, etc :

 
level
probability
0
sL_11.gif 
1
sL_12.gif 
2
sL_13.gif 
...
...
k
sL_14.gif 
 
Generating a Random Level
Idea
- Flip a coin until it lands 'heads'.
- The level obtained is the number of flips - 1.

Insertion
Idea
- Traverse the list as we did in 'Search'.
- Every time we go down a level,remember where the pointer was.
- Update those pointers when the new node is created.

from [Weiss], inserting 22 :

*d_sL09*
Remember the pointers :
Nodes[ 2 ] = node 13
Nodes[ 1 ] = node 20
Nodes[ 0 ] = node 20

where Nodes[ i ] is the node at which x was when the level went from i to i-1.




In practice, this looks something like this :

e.g.

Now let's go back and trace the example from Weiss above :

With NewLevel = 2, we find -
Iteration
x
L
action
1
H
3
advance x
2
13
3
decr L
3
13
2
Nodes[ 2 ] sL_15.gif 13, decr L
4
13
1
advance x
5
20
1
Nodes[ 1 ] sL_16.gif 20, decr L
6
20
0
Nodes[ 0 ] sL_17.gif 20, decr L
7
20
-1
exit loop

To create n :
Next( n, 2 ) sL_18.gif NULL; Next( 13, 2 ) sL_19.gif n
Next( n, 1 ) sL_20.gif 29; Next( 20, 1 ) sL_21.gif n
Next( n, 0 ) sL_22.gif 23; Next( 20, 0 ) sL_23.gif n


Skip List Analysis[ L&D 190-192 ]
Insert and Delete take time proportional to Search : they search, then update at most MaxLevel pointers - so we'll just analyze Search( ).

Lemma 1
Let Level( x ) be the level of node x.

P( Level( x ) sL_24.gif i ) = sL_25.gif

Intuition :

Lemma 2
In a list of n nodes, E( MaxLevel ) sL_26.gif

Intuition (no proof ) :
In a list of n nodes, the number of nodes of each level is expected to be

 
Level
E( # of nodes of that level )
0
sL_27.gif n
1
sL_28.gif n
2
sL_29.gif n
k
sL_30.gif 
log n -1
sL_31.gif = 1
log n
sL_32.gif 
log n +1
sL_33.gif 
 
Theorem
In a skip list of n nodes, the expected time of Search, Insert, and Delete is O( log n ).

Let's follow the header-to-element path backward from the element.

e.g.Search( 45 )

 
Node
Level
Action
45
0
back
44
0
up
44
1
back
29
1
back
22
1
up
22
2
up
22
3
back
H
3
 
Expected search time is equal to the expected number of 'back' and 'up' steps.

Observation
In following the path backward, at node x and level L, we -
- go back if level( x ) = L
- go up if level( x ) > L

sL_34.gif

 sL_35.gif

 sL_36.gif

 sL_37.gif

P( back ) = sL_38.gif andP( up ) = sL_39.gif

Let C( k ) be the expected length of a path that rises k levels.
(e.g. in the previous diagram, we rose 3 levels = MaxLevel - 0)

Then
C( k ) = P( back ) * ( 1 step back + C( k ) )
+ P( up ) * ( 1 step up + C( k-1 ) )

sL_40.gif ( 1 + C( k ) ) + sL_41.gif ( 1 + C( k-1 ) )

= 1 + sL_42.gif C( k ) + sL_43.gif C( k-1 )

= 2 + C( k-1 )

= 2 ksince C( 0 ) = 0

A path rising k levels has expected length 2 k.

Let's follow the path back until we reach level sL_44.gif (expected MaxLevel).

sL_45.gif expected cost = sL_46.gif
- but we might not be at the header yet. We must follow the pointers back to the header, going through nodes of level sL_47.gif .
sL_48.gif expected cost sL_49.gif expected number of nodes
of level sL_50.gif

sL_51.gif

sL_52.gif

sL_53.gif expected cost sL_54.gif 2 log n + 2 sL_55.gif

which sL_56.gif O( log n ), thus -

Skip list operations take expected time O( log n )!
The key to the proof is in counting the path length backward.

Summary of Randomized Algorithms
 
BSP - randomized insertion order 
- expected tree size O( n log n )
Universal Hashing - randomized hash function 
- probability of collision is sL_57.gif 
Skip List - randomized level of new node 
- expected running time O( log n )

In general :
No upper bounds, so we could have a bad 'worst case' performance in some (infrequent) cases.

The randomization makes the algorithm immune to 'bad input' which, without randomization, would cause poor performance.

In the examples above, there is no input guaranteed to cause bad performance.

Randomized algorithms are often quite simple to implement.

'Quicksort' is another example.

Back to index


Copyright 1998
James Stewart
Dynamic Graphics Project
Department of Computer Science
University of Toronto