CSC270 October 28 Tutorial: C pointers and Makefiles


[ King 11, 12, 15.4, 17 ]




C pointers
[ King 11.1, 11.2, 11.3 ]

Each byte of memory has a different address.

Each variable is stored in some number of contiguous bytes of memory.
The address of a variable is the address of the first byte in which
it's stored.

    0100
    0101 xxx      <-- An integer might take 4 bytes.  This integer's
    0102 xxx          address is 0101
    0103 xxx
    0104 xxx
    0105
    0106
    0107 ccc      <-- A character only takes one byte.  This character's
    0108              address is 0107

In C, a pointer is a variable that stores an address.  If a
pointer `p' contains the address of a variable `x', we say p points
to x.  This is typically drawn with an arrow from p to x:

     p           x
   +---+       +---+
   | O |   --> | 5 |
   +-+-+  /    +---+
     |    |      
     +----+

Each pointer can point to many different variables at different times
during the program execution, but each variable must be of the same
type.

Declarations

The following declares integers `a' and `b' and a "pointer to integer"
`p'.  The asterisk before `p' shows that it is a pointer.  There can
be a space between the asterisk and `p' if you wish.

    int a, b;
    int *p;

``Address of''

To get `p' pointing to `a', we assign `p' the address of `a':

    p = &a;

The ampersand, when appearing in front of a variable, gives that
variable's address.  The above statement is read `p is assigned the
address of a'.

Dereferencing

To assign a value to the variable pointed to by `p', we use the
asterisk.

    *p = 55;
    printf( "%d %d\n", a, *p );

    output ---> 55 55

The assignment `*p = 55' is read `the integer pointed to by p is
assigned 55'.  This operation ``dereferences'' p.  In other words, p is
a ``reference'' to something and *p is the actual something.

To get the value, we also use the asterisk:

    b = *p;
    printf( "%d\n", b );
 
    output ---> 55

Layout in memory

Suppose things are laid out in memory as follows:
    
    0100 aaa      <-- integer a
    0101 aaa
    0102 aaa
    0103 aaa
    0104 bbb      <-- integer b
    0105 bbb
    0106 bbb
    0107 bbb
    0108
    0109
    010A
    010B
    010C ppp      <-- pointer p
    010D ppp
    010E ppp
    010F ppp

Then after the following statements, memory would look as shown below
(the hyphens are placeholders for the remainder of each variable).

    a = 55;
    p = &a;
    b = *p + 22;

    0100  55      <-- integer a
    0101  -
    0102  -
    0103  -
    0104  77      <-- integer b
    0105  -
    0106  -
    0107  -
    0108
    0109
    010A
    010B
    010C 0100     <-- pointer p
    010D  -
    010E  -
    010F  -

Pointers as Arguments
[ King 11.4 ]

C functions use "call-by-value".  The arguments of the function call
are evaluated, their values are copied into the function's parameters,
and any modifications to the parameters are not passed back to the
caller.

To modify a variable that is one of the function arguments, you must
pass a POINTER to that variable:

  #include <stdio.h>

  void f( int *p )

  {
    *p = *p + 1;
  }

  int main( void )

  {
    int i, j;

    i = 0;
    j = 1;

    f( &i );
    f( &j );

    printf( "i = %d, j = %d\n", i, j );      /* prints: i = 1, j = 2 */
    return 0;
  }

Above, f takes a pointer to an integer.  It increments the integer
pointed to by `p'.  The call to `f' must pass in a pointer to an
integer: &i is the address of `i' (in other words, a pointer to `i').

Pointers and structs
[ King 17.3, 17.4, 17.5 ]

Pointers are often used with structures in C.  For example, a
linked-list node looks like

    struct ll_node {
      int data;
      struct ll_node *next;
    };

Such a node contains data and a pointer `next' to the next node on the
list.  Here, `struct ll_node' is the data type.  Let's define an
LL_NODE type:

    typedef struct ll_node {
      int data;
      struct ll_node *next;
    } LL_NODE;

Note that it would not work to use "LL_NODE *" inside the structure
definition, because the name LL_NODE does not exist until the typedef
is complete.  Inside the structure, the pointer must be written in the
"struct ll_node *" form.

A linked list has a pointer to its first node, which is initialized to
NULL.  `NULL' is defined in several standard headers, including
stddef.h, stdlib.h, and stdio.h.

    LL_NODE *head;

    head = NULL;

A new node is created with malloc.

    LL_NODE *p;

    p = (LL_NODE *) malloc( sizeof( LL_NODE ) );

If `s' were a structure variable, its fields would be referenced with
the `dot' operator, as in s.data and s.next.  BUT ... since p is
a POINTER to the structure, we use another operator, the `arrow'.  The
following initializes the new structure and adds it to the head of the
list.

    p->data = 0;
    p->next = NULL;

    head = p;

The following creates a list of nodes storing 5,4,3,2,1,0 in that
order.  Nodes are successively added to the *head* of the list as the
index i increases.

    LL_NODE *head, *p;
    int i;

    head = NULL;

    for (i=0; i < 6; i++) {
      p = (LL_NODE *) malloc( sizeof( LL_NODE ) );
      p->data = i;
      p->next = head;
      head = p;
    }

Trace it.

When you no longer need something that you allocated with `malloc',
BE SURE to release the memory so that it can be reused:

   free( p );

Above, p is a pointer that was returned from some call to malloc.

Pointer Notation Tricks

If `s' is a structure and `p' is a pointer to it, there are several
ways to reference the fields of the structure.  The following
references to `data' are all equivalent.

   LL_NODE s, *p;

   p = &s;          /* p must point to s first */

   s.data = 5;
   p->data = 5;
   (*p).data = 5;

The last is interesting.  (*p) is the thing pointed to by `p'.  Since
this thing IS a structure, we use the dot notation to reference a
field.

Pointers and Arrays
[ King 12.1, 12.2, 12.3 ]

In C, when an array name is used in an expression it is (almost
always) converted to a pointer to the first element of the array!  The
main exceptions are the operands of `sizeof' and `&'.

This means that you can pass arrays into functions without incurring
the cost of copying the whole array.  For example, suppose f() takes
an array of integers as an argument.  The following works because
the array argument is passed as a pointer to its first element.

    #include <stdio.h>

    void f( int *a, int size )

    {
      int i;

      for (i=0; i < size; i++)
        printf( "a[%d] = %d\n", i, a[i] );
    }

    int main( void )

    {
      int x[10] = { 0 };    /* all ten elements start at zero */

      f( x, 10 );
      return 0;
    }

In passing `x' in to the function, we're really passing a pointer to the
first element of the array.  We could just as well have done

    f( &(x[0]), 10 );

since this also passes in the address of the first element.  Or, we could
have passed in only the middle four elements of the array:

    f( &(x[3]), 4 );

This passes in a pointer to element x[3] and tells f() that there are
four elements in the array that starts at that address:

               &(x[3])
                  |
                  v
    +---+---+---+---+---+---+---+---+---+---+
  x |   |   |   |   |   |   |   |   |   |   |
    +---+---+---+---+---+---+---+---+---+---+
      0   1   2   3   4   5   6   7   8   9

f() thinks it has the following array:

                +---+---+---+---+
             a  |   |   |   |   |
                +---+---+---+---+
                  0   1   2   3  

Dynamic Array Allocation
[ King 17.3 ]

Since an array can be accessed through a pointer to its first element,
we can allocate arrays dynamically.  The following allocates an array
of 100 integers and sets all entries to zero.

    int *a, i;

    a = (int *) malloc( 100 * sizeof( int ) );

    for (i=0; i < 100; i++)
      a[i] = 0;

We could just as well have used a pointer to move through the array:

    int *a, *p, i;

    a = (int *) malloc( 100 * sizeof( int ) );

    p = a;
    for (i=99; i>=0; i--) {
      *p = 0;
      p++;
    }

The statement `p++' means `increment p'.  When applied to pointers,
this means `increment the address in p by the size of the thing p
points to'.  In other words (in this case), `point p to the next
integer following it in memory'.

Multidimensional Arrays

Pointers and multidimensional arrays are slightly more involved.  See
[ King 12.4 ], although that doesn't go into very much detail.



Makefiles
[ King 15.4 ]

`make' is a Unix program that helps in compiling large programs that
occupy more than one source file.  Suppose you had a program that
occupied three source files:

  maze.c
  read.c
  shortest-matrix.c

Typically, what you do is compile each file separately and then link
them together:

  % gcc -c maze.c
  % gcc -c read.c
  % gcc -c shortest-matrix.c
  % gcc -o mazem maze.o read.o shortest-matrix.o

The -c flag causes gcc to stop compiling after producing an object
file with the .o suffix.  Thus, the first three lines above produce
the files

  maze.o
  read.o
  shortest-matrix.o

The last line above links the three *.o files into an executable
called `mazem'.

It would be painful if you had to do this by hand every time you made
a change in your source code.  In fact, you might forget which files
you made changes to and then forget to recompile, resulting in a
debugging nightmare!

`make' deals with this.  You must create a `Makefile' that defines
the dependences between your files.  For example,

  mazem               depends upon   maze.o, read.o, and shortest-matrix.o
  maze.o              depends upon   maze.c
  read.o              depends upon   read.c
  shortest-matrix.o   depends upon   shortest-matrix.c

The corresponding Makefile would contain

  mazem:  maze.o read.o shortest-matrix.o
	  gcc -o mazem maze.o read.o shortest-matrix.o

  maze.o: maze.c
	  gcc -c maze.c

  read.o: read.c
	  gcc -c read.c

  shortest-matrix.o: shortest-matrix.c
	  gcc -c shortest-matrix.c

The lines of the form

  FILE:	  FILE1 FILE2 FILE3 ...

define dependences, where FILE depends upon FILE1 FILE2 FILE3 ...  If
you change any one of the files to the right of the colon, `make' will
recreate the file to the left of the colon.

NOTE: The file names after the colon may be separated by spaces or
TABs.

The line below the dependency line gives a Unix command to recreate
FILE from FILE1, FILE2, FILE3, ...  There can be more than one line
if necessary.

WARNING: Each of these command lines must start with a TAB, not with
spaces.  Otherwise, `make' will not work.

NOTE: If a *.o file depends only upon a *.c file of the same name, no
dependency line is necessary; `make' knows what to do.  Thus, the
Makefile above could be shortened to:

  mazem:  maze.o read.o shortest-matrix.o
	  gcc -o mazem maze.o read.o shortest-matrix.o



Including compilation flags

There are several variables that can be used in the Makefile.  The
most important is CC, which is the string used by `make' to run the C
compiler.  If you want to include flags with every C compilation,
include the following at the top of your Makefile:

  CC = gcc -g -Wall

Then `make' will use that string every time it does a C compilation.
However, you must then use $(CC) in your makefile everywhere that
you do a compilation.  The Makefile would change to 

  CC = gcc -g -Wall

  mazem:  maze.o read.o shortest-matrix.o
	  $(CC) -o mazem maze.o read.o shortest-matrix.o



Including header files *.h

If your source files depend upon other *.h files that contain
definitions, it's often a good idea to state this in the Makefile.
For example, if read.c has a line of the form

  #include "defs.h"

then this is reflected in the Makefile as

  read.o:  read.c defs.h

Thus, `make' will recompile read.o if there is a change to read.c OR
to defs.h.  As before, no compilation statement is necessary since
`make' knows how to create read.o from read.c.