CSC260 HOMEWORK 4 SOLUTIONS QUESTION 1 - PART A (7 marks) ------------------- The data 1: P( asymptotic ) = 0.19 2: P( engrossed ) = 0.12 3: P( independent ) = 0.08 4: P( irrelevant ) = 0.10 5: P( its ) = 0.04 6: P( misspell ) = 0.02 7: P( necessary ) = 0.03 8: P( receive ) = 0.11 9: P( sentence ) = 0.18 10: P( slaughtered ) = 0.13 The C array (rows = i, columns = j) 1 2 3 4 5 6 7 8 9 10 -------------------------------------------------------------------------------- 1 | 0.19 0.43 0.66 0.94 1.06 1.14 1.28 1.70 2.25 2.66 2 | 0.12 0.28 0.52 0.64 0.72 0.83 1.14 1.68 2.08 3 | 0.08 0.26 0.34 0.40 0.51 0.82 1.25 1.64 4 | 0.10 0.18 0.24 0.35 0.63 1.01 1.37 5 | 0.04 0.08 0.16 0.36 0.72 1.00 6 | 0.02 0.07 0.23 0.57 0.83 7 | 0.03 0.17 0.49 0.75 8 | 0.11 0.40 0.66 9 | 0.18 0.44 10 | 0.13 The R array 1 2 3 4 5 6 7 8 9 10 -------------------------------------------------------------------------------- 1 | 1 1 2 2 2 2 2 2 4 4 2 | 2 2 3 3 3 4 4 4 8 3 | 3 4 4 4 4 4 8 8 4 | 4 4 4 4 5 8 9 5 | 5 5 5 8 8 9 6 | 6 7 8 9 9 7 | 7 8 9 9 8 | 8 9 9 9 | 9 9 10 | 10 The tree .----- 4 --------------------------. / \ . 2 . .-- 9 --. / \ / \ 1 3 .------------ 8 10 / 5 -----. \ . 7 / 6 QUESTION 1 - PART B (3 marks) ------------------- To detect whether a word is misspelled, we would search the tree of correctly spelled words and report a misspelling if the word is not found. We must know where in the tree misspelled words are detected. Some misspelled words occur before x1 (the alphabetically first word in the tree), others occur between x1 and x2, and so on. For each interval [ ... x1 ] [x1,x2] [x2,x3] ... [x{N-1},xN] [xN ... ] we must know the probability that a misspelled word will be in the interval. Misspelled words will not be evenly distributed between the x{i}, so our tree must take this into account. Given these intervals and probabilities, we create "dummy nodes" corresponding to the intervals and assigns them the appropriate probabilities. We build the optimal BST as usual, including the N nodes for correctly spelled words and the N+1 nodes for intervals where misspelled words might appear. However, the dummy nodes must appear as *leaves* since each node in a BST stores a key, and a dummy nodes doesn't correspond to any particular key. Thus, we must be sure in building the BST that a dummy node is *never* considered as the root of a subtree. This is easy to accomplish with an `if' statement in the innermost loop of the standard algorithm. Dummy nodes are removed from the tree once the optimal structure of the BST is computed. QUESTION 2 (10 marks) ---------- The standard algorithm considers adding a single block to the end of a knapsack of size i by comparing cost[i] with cost[i-size(block)] + value(block). If the second expression is larger, the block is added by setting cost[i] = cost[i-size(block)] + value(block) and by setting last[i] = block. Since the startup cost is only applied to the first block of time allocated to each course, it is usualy more efficient to allocate many blocks of time to each course. The standard algorithm must be modified to consider adding multiple blocks of the same course. In addition to making the comparison described above, the modified algorithm must check the expressions cost(i) cost(i - 1*size(block)) + value(1*block) cost(i - 2*size(block)) + value(2*block) cost(i - 3*size(block)) + value(3*block) and so on, as long as i - k*size(block) >= 0. The maximum of the expressions determines what is added. To store the number of blocks added, we use an array `num[i]', where num[i] is the number of blocks added to fill the knapsack of size i. For example, if the maximum expression above has 3 blocks then we set num[i] = 3. Here's what we get: Course Block Inc/hr Startup csc270 1 3 0 csc258 3 7 15 csc238 4 9 30 csc228 2 5 5 csc209 1 4 2 After adding csc270: hours: 1 2 3 4 5 6 7 8 9 10 11 cost: 3 6 9 12 15 18 21 24 27 30 33 last: csc270 csc270 csc270 csc270 csc270 csc270 csc270 csc270 csc270 csc270 csc270 num: 1 1 1 1 1 1 1 1 1 1 1 After adding csc258: hours: 1 2 3 4 5 6 7 8 9 10 11 cost: 3 6 9 12 15 27 30 33 48 51 54 last: csc270 csc270 csc270 csc270 csc270 csc258 csc258 csc258 csc258 csc258 csc258 num: 1 1 1 1 1 2 2 2 3 3 3 After adding csc238: hours: 1 2 3 4 5 6 7 8 9 10 11 cost: 3 6 9 12 15 27 30 42 48 51 54 last: csc270 csc270 csc270 csc270 csc270 csc258 csc258 csc238 csc258 csc258 csc258 num: 1 1 1 1 1 2 2 2 3 3 3 After adding csc228: hours: 1 2 3 4 5 6 7 8 9 10 11 cost: 3 6 9 15 18 27 30 42 48 51 54 last: csc270 csc270 csc270 csc228 csc228 csc258 csc258 csc238 csc258 csc258 csc258 num: 1 1 1 2 2 2 2 2 3 3 3 After adding csc209: hours: 1 2 3 4 5 6 7 8 9 10 11 cost: 3 6 10 15 18 27 30 42 48 51 54 last: csc270 csc270 csc209 csc228 csc228 csc258 csc258 csc238 csc258 csc258 csc258 num: 1 1 3 2 2 2 2 2 3 3 3 Alternatives: cost[ 2] = 6 with 2 blocks of csc209 cost[ 5] = 18 with 5 blocks of csc209 cost[11] = 54 with 2 blocks of csc209 From the final table above (along with the alternatives listed), the TWO ways to get an increase of 54 marks are: 1. 3 blocks (= 9 hours) csc258 2 blocks (= 2 hours) csc270 2. 3 blocks (= 9 hours) csc258 2 blocks (= 2 hours) csc209 Note that if we start with the last of the three alternatives above (using csc209), we get 2 blocks (= 2 hours) csc209 3 blocks (= 9 hours) csc258 which is the *same* as #2 above, just in a different order.