CSC260 HOMEWORK 4 SOLUTIONS


QUESTION 1 - PART A  (7 marks)
-------------------

The data

 1: P(  asymptotic ) = 0.19
 2: P(   engrossed ) = 0.12
 3: P( independent ) = 0.08
 4: P(  irrelevant ) = 0.10
 5: P(         its ) = 0.04
 6: P(    misspell ) = 0.02
 7: P(   necessary ) = 0.03
 8: P(     receive ) = 0.11
 9: P(    sentence ) = 0.18
10: P( slaughtered ) = 0.13


The C array (rows = i, columns = j)

          1       2       3       4       5       6       7       8       9      10  
   --------------------------------------------------------------------------------
 1 |   0.19    0.43    0.66    0.94    1.06    1.14    1.28    1.70    2.25    2.66  
 2 |           0.12    0.28    0.52    0.64    0.72    0.83    1.14    1.68    2.08  
 3 |                   0.08    0.26    0.34    0.40    0.51    0.82    1.25    1.64  
 4 |                           0.10    0.18    0.24    0.35    0.63    1.01    1.37  
 5 |                                   0.04    0.08    0.16    0.36    0.72    1.00  
 6 |                                           0.02    0.07    0.23    0.57    0.83  
 7 |                                                   0.03    0.17    0.49    0.75  
 8 |                                                           0.11    0.40    0.66  
 9 |                                                                   0.18    0.44  
10 |                                                                           0.13  

The R array

          1       2       3       4       5       6       7       8       9      10  
   --------------------------------------------------------------------------------
 1 |      1       1       2       2       2       2       2       2       4       4  
 2 |              2       2       3       3       3       4       4       4       8  
 3 |                      3       4       4       4       4       4       8       8  
 4 |                              4       4       4       4       5       8       9  
 5 |                                      5       5       5       8       8       9  
 6 |                                              6       7       8       9       9  
 7 |                                                      7       8       9       9  
 8 |                                                              8       9       9  
 9 |                                                                      9       9  
10 |                                                                             10  

The tree

         .----- 4 --------------------------.
        /                                    \
     . 2 .                                .-- 9 --.
    /     \                              /         \
   1       3              .------------ 8          10
                         /
                        5 -----.  
                                \
                               . 7
                              /
                             6

QUESTION 1 - PART B  (3 marks)
-------------------

To detect whether a word is misspelled, we would search the tree of
correctly spelled words and report a misspelling if the word is not
found.  

We must know where in the tree misspelled words are detected.  Some
misspelled words occur before x1 (the alphabetically first word in the
tree), others occur between x1 and x2, and so on.  For each interval

  [ ... x1 ] [x1,x2] [x2,x3] ... [x{N-1},xN] [xN ... ]

we must know the probability that a misspelled word will be in the
interval.  Misspelled words will not be evenly distributed between the
x{i}, so our tree must take this into account.

Given these intervals and probabilities, we create "dummy nodes"
corresponding to the intervals and assigns them the appropriate
probabilities.  We build the optimal BST as usual, including the N
nodes for correctly spelled words and the N+1 nodes for intervals
where misspelled words might appear.  However, the dummy nodes must
appear as *leaves* since each node in a BST stores a key, and a dummy
nodes doesn't correspond to any particular key.  Thus, we must be sure
in building the BST that a dummy node is *never* considered as the
root of a subtree.  This is easy to accomplish with an `if' statement
in the innermost loop of the standard algorithm.  Dummy nodes are
removed from the tree once the optimal structure of the BST is
computed.


QUESTION 2  (10 marks)
----------

The standard algorithm considers adding a single block to the end of a
knapsack of size i by comparing cost[i] with cost[i-size(block)] +
value(block).  If the second expression is larger, the block is added
by setting cost[i] = cost[i-size(block)] + value(block) and by setting
last[i] = block.

Since the startup cost is only applied to the first block of time
allocated to each course, it is usualy more efficient to allocate many
blocks of time to each course.  The standard algorithm must be
modified to consider adding multiple blocks of the same course.  In
addition to making the comparison described above, the modified
algorithm must check the expressions

  cost(i)
  cost(i - 1*size(block)) + value(1*block)
  cost(i - 2*size(block)) + value(2*block)
  cost(i - 3*size(block)) + value(3*block)

and so on, as long as i - k*size(block) >= 0.  The maximum of the
expressions determines what is added.  

To store the number of blocks added, we use an array `num[i]', where
num[i] is the number of blocks added to fill the knapsack of size i.
For example, if the maximum expression above has 3 blocks then we set
num[i] = 3.

Here's what we get:

  Course   Block   Inc/hr   Startup
  csc270     1       3         0
  csc258     3       7        15
  csc238     4       9        30
  csc228     2       5         5
  csc209     1       4         2

After adding csc270:

hours:    1      2      3      4      5      6      7      8      9     10     11   
cost:     3      6      9     12     15     18     21     24     27     30     33   
last:  csc270 csc270 csc270 csc270 csc270 csc270 csc270 csc270 csc270 csc270 csc270 
num:      1      1      1      1      1      1      1      1      1      1      1   

After adding csc258:

hours:    1      2      3      4      5      6      7      8      9     10     11   
cost:     3      6      9     12     15     27     30     33     48     51     54   
last:  csc270 csc270 csc270 csc270 csc270 csc258 csc258 csc258 csc258 csc258 csc258 
num:      1      1      1      1      1      2      2      2      3      3      3   

After adding csc238:

hours:    1      2      3      4      5      6      7      8      9     10     11   
cost:     3      6      9     12     15     27     30     42     48     51     54   
last:  csc270 csc270 csc270 csc270 csc270 csc258 csc258 csc238 csc258 csc258 csc258 
num:      1      1      1      1      1      2      2      2      3      3      3   

After adding csc228:

hours:    1      2      3      4      5      6      7      8      9     10     11   
cost:     3      6      9     15     18     27     30     42     48     51     54   
last:  csc270 csc270 csc270 csc228 csc228 csc258 csc258 csc238 csc258 csc258 csc258 
num:      1      1      1      2      2      2      2      2      3      3      3   

After adding csc209:

hours:    1      2      3      4      5      6      7      8      9     10     11   
cost:     3      6     10     15     18     27     30     42     48     51     54   
last:  csc270 csc270 csc209 csc228 csc228 csc258 csc258 csc238 csc258 csc258 csc258 
num:      1      1      3      2      2      2      2      2      3      3      3   

Alternatives:

  cost[ 2] =  6 with 2 blocks of csc209
  cost[ 5] = 18 with 5 blocks of csc209
  cost[11] = 54 with 2 blocks of csc209


From the final table above (along with the alternatives listed), the
TWO ways to get an increase of 54 marks are:

1. 3 blocks (= 9 hours) csc258
   2 blocks (= 2 hours) csc270

2. 3 blocks (= 9 hours) csc258
   2 blocks (= 2 hours) csc209

Note that if we start with the last of the three alternatives above
(using csc209), we get

   2 blocks (= 2 hours) csc209
   3 blocks (= 9 hours) csc258

which is the *same* as #2 above, just in a different order.