Sample RISC machine language (assembly language) problems

Here are some problems and solutions for a sample hypothetical RISC-like machine language.

First, a simple set of instructions to compute C := A + B. There's really nothing very interesting to do with the delayed load slots, so we don't, except for the last.

     LOAD A, R1
     NOP  ; can't do a load in the delayed load slot, either...
     LOAD B, R2
     NOP
     ADD R1, R2, R2
     NOP  ; and you can't do STORE R2 immediately after ADD ,,R2
     STORE R2, C
     HALT  ; ha ha!  Store will complete while the CPU is halted!

Now, some more-interesting code: the Ackermann function (with as much overcommenting as in the published assignment three solutions):

First, the C/Java code we're translating:

int ack(int m, int n)   // precondition: m and n are non-negative
{
    if (m == 0)
        return n + 1;
    else if (n == 0)
        return ack(m - 1, 1);
    else
        return ack(m - 1, ack(m, n - 1));
}

Or in Python:

def ack(m, n):
    if m == 0:
        return n + 1
    elif n == 0:
        return ack(m - 1, 1)
    else:
        return ack(m - 1, ack(m, n - 1))

Our RISCy version (remember that the parameters are in R1 and R2) (lots of delayed branch slots! also after JSR and RTS! sometimes random stuff goes there which is noted as "harmless"... else NOP):

ACK:      ADD R0, R1, R1 ; test m
          BNE ELSE1
	  NOP  ; delayed branch slot
          ADD R0, R2, R1 ; n -> R1
          RTS            ; that is, if (m == 0) return n + 1
	  ; delayed branch slot:
          ADDI 1, R1     ; might overflow, in which case we return with V set

ELSE1:    ADD R0, R2, R2 ; test n
          BNE ELSE2
          ADD R0, R1, R8 ; delayed branch slot: R8 <- m in any case
          ADDI -1, R8    ; first parameter is m - 1
          JSR ACK        ; ack(m - 1, 1)
          MOVEI 1, R9    ; delayed branch slot: second parameter is 1
	  BVS OVERFLOW
	  ADD R0, R8, R1 ; move result into R1 (harmless if V set)
          RTS            ; that is, else if (n == 0) return ack(m - 1, 1)
	  ; warning: delayed branch slot follows!

ELSE2:    ; the delayed branch slot above set m as first parameter
          ADD R0, R2, R9 ; put n in R9; harmless for RTS
          JSR ACK
          ADDI -1, R9    ; delayed branch slot: second parameter is n - 1
          BVS OVERFLOW
          ADD R0, R8, R9 ; return value is second parameter to the next call
          ADD R0, R1, R8 ; m
          JSR ACK
          ADDI -1, R8    ; delayed branch slot: first parameter is m - 1
	  BVS OVERFLOW
	  NOP  ; delayed branch slot -- can't find anything to put here
          RTS            ; that is, else return ack(m - 1, ack(m, n - 1))
	  ADD R0, R8, R1 ; delayed branch slot: move result in R1

OVERFLOW: RTS
          SEV  ; delayed branch slot -- set 'V' condition code

Example #2:
Prime-testing subroutine:

int isprime(int p)
{
    int d;

    for (d = 2; d < p; d++)
        if (p % d == 0)
            return 0;

    return 1;
}

Or in Python:

def isprime(p):
    for d in range(2, p):
	if p % d == 0:
	    return 0
    return 1

In RISC code:

ISPRIME: MOVEI 2, R2  ; the number we're trying to divide into p (== R1)
   LOOP: SUB R1, R2, R0
         BLE YEP      ; when we want to return 1

         ; now calculate p%d
         DIV R1, R2, R4  ; R5 <- [R1] % [R2]; harmless if BLE YEP

	 ; if (p % d == 0) return 0
	 ADD R0, R5, R5  ; test R5
	 BNE LOOP
         ADDI 1, R2  ; delayed branch slot: this is the loop increment

	 RTS
	 ADD R0, R0, R1  ; delayed branch slot: return 0

    YEP: ; return 1
	 RTS
	 MOVEI 1, R1  ; delayed branch slot

Program code to sum the numbers in memory locations 100 through 110, inclusive, illustrating delayed load rule issues:

      MOVEI 100, R2  ; pointer
      MOVEI 110, R3  ; target
      ADD R0, R0, R1 ; sum
LOOP: ADD R0, R2, R4
      IND R0, R4, R4 ; load indirect R4, into R4 -- i.e. R4 <- [[R4]]
      ADDI 1, R2     ; can't access R4 in this cycle
      SUB R2, R3, R0 ; if (p <= 110)
      BLE LOOP
      ADD R4, R1, R1 ; delayed branch slot (happens anyway):  sum += *p
      HALT

A problem from the VELMA problems page, but simplified so as not to care about overflow (we already showed dealing with that in the Ackermann function):

Write a subroutine to compute the sum of an array of words. The calling sequence to add 20 words starting at location 3000 is:

MOVEI 3000, R8
MOVEI 20, R9
JSR ROUTINE

An answer:

ROUTINE: ADD R0, R1, R3  ; get pointer 'p' out of the way of the sum
         ADD R0, R0, R1  ; initialize sum to 0
   LOOP: ADD R0, R3, R4
	 IND R0, R4, R4  ; load indirect R4, into R4
	 ADDI 1, R3      ; p+=1 (can't access R4 in this cycle)
	 ADDI -1, R2     ; count
	 BGT LOOP
	 ADD R4, R1, R1  ; delayed branch slot: sum += *p
	 RTS
	 NOP


[list of course notes topics available] [main course page]