Above all, Keep It Simple!
And, I would like to bring back to your attention the "various segfault problems" entry in the assignment three Q&A file, which gives some general advice.
See the assignment handout for documentation of the protocol.
Just about the easiest way to write a fixed message to a socket is like this:
    static char msg[] = "Hello, world\r\n";
    write(fd, msg, sizeof msg - 1);

(The empty brackets might look like an "array reference" declaration as in Java, but what they mean in C is that the compiler is to determine the array size from the initializer.)
"sizeof msg" includes the terminating zero byte, which we do not want to write. Similarly, if you make up a string to write to a socket, the number of bytes to write is strlen(buf), not strlen(buf)+1.
(Note that sizeof is only of use here because there is a relation between the size of the data object and the length of the string. You can't use sizeof if you declare an array and then fill it up (it will give you the size of the array, not of the portion of it which is a zero-terminated string); and you can't use sizeof on a string passed as a parameter to a function (what you pass is a pointer to the zeroth element of the string, and sizeof that is the storage requirements of the pointer, not of the array it points to the zeroth element of).)
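For a string you make up at run time, the count has to come from strlen(). A minimal sketch, assuming a hypothetical helper name write_str (it is not part of the supplied code) and ignoring the possibility of a partial write:

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write a zero-terminated string to fd, leaving out
 * the terminating zero byte.  Returns whatever write() returns. */
ssize_t write_str(int fd, const char *s)
{
    return write(fd, s, strlen(s));   /* strlen(s), not strlen(s)+1 */
}
```

So after filling a buffer with, say, snprintf(), write_str(fd, buf) sends only the bytes of the string itself.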
Q: What does sizeof really do?
A: It tells you the size of an object or type, in bytes. It's only useful in very specific circumstances. In hardly any circumstances does it tell you the number of bytes you need to transmit over the network.
If you have
    char buf[300];

then "sizeof buf" is 300, which might be useful, e.g. in indicating the amount of buffer space available to a read() or fgets(). But it is not useful in a write(), because you don't want to transmit 300 bytes. And if you pass this to a function, it decays into a pointer to its zeroth element, and the sizeof of that is something like 4 or 8 (the size of a pointer); this is probably not of any use to you.
If you want to transmit "hello\r\n", for example, that's 7 bytes.
So this is also wrong:
    char msg[] = "hello\r\n";
    write(fd, msg, sizeof msg);    WRONG WRONG WRONG

because sizeof msg is 8, including the terminating zero byte, but you want to transmit 7 bytes.
So just figure out how many bytes you want to transmit. This is not always trivial.
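The three different numbers discussed above can be seen side by side in a small sketch. The function names are made up for illustration:

```c
#include <stddef.h>
#include <string.h>

/* Inside a function, a "string parameter" is only a pointer, so sizeof
 * reports the size of the pointer, whatever the caller's array size was. */
size_t size_seen_by_callee(const char *s)
{
    return sizeof s;
}

/* In the scope where the array is declared, sizeof reports the whole
 * array, no matter how much of it holds a string. */
size_t size_seen_by_owner(void)
{
    char buf[300];

    strcpy(buf, "hello\r\n");
    return sizeof buf;          /* 300, not 7 */
}

/* The number of bytes to transmit comes from the string's contents. */
size_t bytes_to_send(const char *s)
{
    return strlen(s);
}
```

Only the last of these is the number you would pass to write().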
read() returns a byte count. You need to store that. read() does not produce a zero-terminated string. You can assign the '\0' to a member of the array in a separate C statement if you want it there. This is done in some of the supplied code for this assignment.
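As a sketch of storing the count and adding the '\0' yourself (read_text is a hypothetical name, and this is not necessarily how the supplied code does it; it assumes size is at least 1):

```c
#include <unistd.h>

/* Hypothetical helper: read at most size-1 bytes into buf and
 * zero-terminate whatever arrived.  Returns read()'s own result:
 * negative on error, 0 on EOF, otherwise the number of bytes read. */
ssize_t read_text(int fd, char *buf, size_t size)
{
    ssize_t len = read(fd, buf, size - 1);   /* leave room for the '\0' */

    if (len >= 0)
        buf[len] = '\0';    /* a separate statement; read() never does this */
    return len;
}
```

Passing size - 1 to read() is what guarantees that the assignment of '\0' stays within the array.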
Q: What is the first argument to select()?
A: The max of all of the file descriptor numbers in all of the fd sets, plus one. See chatsvr.c for a server-oriented example of this; and despite the name, notes/sockets/server_select.c is a somewhat more client-side-oriented example in that it is select()ing amongst a small, fixed number of sockets.
Q: What value do I use as the timeout for select()? Does it matter?
A: Yes, it matters. Specifically, you must not have a timeout. You want to block until something happens. There is no time limit. Thus the last argument to select() is simply NULL.
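Putting the last two answers together, a call with the right first argument and no timeout might look like the following sketch. The helper name wait_readable and its two-fd signature are assumptions for illustration only:

```c
#include <sys/select.h>
#include <unistd.h>    /* some older systems declare select() here */

/* Hypothetical helper: block until fd_a or fd_b has something to read.
 * On return, *ready holds the fds with activity.
 * Returns select()'s result (number of ready fds, or -1 on error). */
int wait_readable(int fd_a, int fd_b, fd_set *ready)
{
    int maxfd = (fd_a > fd_b) ? fd_a : fd_b;

    FD_ZERO(ready);
    FD_SET(fd_a, ready);
    FD_SET(fd_b, ready);

    /* first argument: highest fd number plus one;
     * last argument: NULL, meaning block with no time limit */
    return select(maxfd + 1, ready, NULL, NULL, NULL);
}
```

Afterwards, FD_ISSET(fd_a, ready) and FD_ISSET(fd_b, ready) say which fd to read from.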
Q: Is it ok to use O_NONBLOCK instead of select()?
A: No. Going through all the idle sockets in a loop and checking for activity is called "busy-waiting" -- you really have no work to do, but you're spending a lot of time doing it. This takes away processing power from other processes on the same machine.
Have you ever had a program on your home computer which when it's running, everything else is sluggish, even if that program is just sitting idle? So you see that the computer is sluggish and you go "oh, I must have fooglop running" and you switch to it and quit it and all's well.
The problem with that program was that it was busy-waiting instead of doing something like select(). It's a big problem when a program does this.
I think that the FD_* macros cause some confusion simply because of the terminology embodied in their names. They use terminology reminiscent of digital logic (e.g. CSC 258), based on the fact that you are turning on and off bits in a word.
So, the function to set the value to the empty set is FD_ZERO, because it sets the value of that int (or long or whatever it is) to the integer zero, which has all bits zero, which means that no elements are in the set.
The function FD_SET "sets" the bit, meaning that it makes it one. The function FD_CLR "clears" the bit, meaning that it makes it zero.
Similarly, "FD_ISSET" is not checking whether or not something is a mathematical set; it's checking whether a bit is set, which means that the given item is an element of that fdlist.
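You may prefer to use more modern abstract-data-type terminology:
FD_ZERO makes the set empty.
FD_SET adds an element to the set.
FD_CLR removes an element from the set.
FD_ISSET tests whether an element is a member of the set.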
I think I may be able to answer some (most?) remaining confusion about the FD_* macros by presenting a sample implementation. This would be put into a system include file, either <unistd.h> or some other file it #includes. This implementation assumes a maximum fd of 31, and that ints are at least 32 bits (so as to hold a bit for each of fds 0, 1, 2, ..., 31).
    #define FD_SETSIZE 32
    typedef int fd_set;
    #define FD_ZERO(p) (*(p) = 0)
    #define FD_SET(fd,p) (*(p) |= 1 << (fd))
    #define FD_CLR(fd,p) (*(p) &= ~(1 << (fd)))
    #define FD_ISSET(fd,p) (*(p) & (1 << (fd)))
On the other hand, the above might not clarify matters if you are a stranger to this "bit-twiddling". I will explain it in person to anyone who is interested, but you don't have to understand it for this course. Basically, the terminology used in the FD_* macros is excessively low-level, and if you are familiar with some of these low-level matters, the above might help bring it together, I hope. If you aren't, that's ok; consider the semantics to be as listed previously (before this sample implementation).
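If it helps to watch the bits change, here is a small walk-through using toy stand-ins for the macros. They are renamed with a TOY_ prefix (an invention for this sketch) so that they cannot clash with the real ones, and they use an unsigned int, assumed to be at least 32 bits, so that shifting into the top bit is well defined:

```c
/* Toy stand-ins for the FD_* macros; valid for fds 0..31 only. */
typedef unsigned int toy_fd_set;

#define TOY_ZERO(p)       (*(p) = 0)
#define TOY_SET(fd, p)    (*(p) |= 1u << (fd))
#define TOY_CLR(fd, p)    (*(p) &= ~(1u << (fd)))
#define TOY_ISSET(fd, p)  ((*(p) & (1u << (fd))) != 0)

/* Walk through the operations and return the final bit pattern. */
unsigned int toy_demo(void)
{
    toy_fd_set s;

    TOY_ZERO(&s);       /* set is {}      : all bits zero        */
    TOY_SET(0, &s);     /* set is {0}     : bit 0 is one         */
    TOY_SET(5, &s);     /* set is {0, 5}  : bits 0 and 5 are one */
    TOY_CLR(0, &s);     /* set is {5}     : only bit 5 is one    */
    return s;           /* 1u << 5, which is 32 */
}
```

Real implementations use an array of words rather than a single one so that more than 32 fds fit, but the idea is the same.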
When the other side of a socket connection disconnects, that socket fd is considered by select() to have read-oriented activity, so select() will tell you that there's activity on that fd, and thus you'll do a read, and you'll get zero bytes, and that is the signal that the connection has been dropped. That is to say, check if the read() return value is zero. Negative means error, zero means EOF, and positive indicates the number of data bytes read. (If there are no bytes to be read, read() blocks until there are!)
In your client, when the server sends EOF, that means it has exited or for some reason has hung up on you (which won't happen if you are following the protocol). Just print a simple message to stdout (such as "Server shut down\n") and exit.
Note that you must not trust the server to send appropriately-sized messages. See how the server does not trust the client. Most importantly, no matter what the other side sends, there must be no buffer overrun errors. If someone telnets to the server and types an extremely long line and it gets split in two, fine; if it crashes, not fine. If your program exceeds array bounds, then there is almost certainly a possible input which makes it crash.
Buffer overrun errors are not only a concern for your communication with the server, but are a general issue. For example, the following is wrong, for any size of s:
if (fgets(s, sizeof s, fp) == NULL)
...
strcat(s, ", and the same goes for you!");
The fgets() might produce a string which is of the maximal size to be able to fit into s; so the strcat (potentially) exceeds the array bounds. C programs are only correct if you can prove that array bounds are not exceeded.
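One way to make the append provably safe is to check the lengths first. This is only a sketch; append_bounded is a hypothetical helper, and it assumes dst already holds a zero-terminated string that fits in dstsize bytes:

```c
#include <string.h>

/* Hypothetical helper: append src to the string in dst only if the
 * result, including its terminating zero byte, fits in dstsize bytes.
 * Returns 0 on success, or -1 (leaving dst unchanged) if it won't fit. */
int append_bounded(char *dst, size_t dstsize, const char *src)
{
    size_t used = strlen(dst);
    size_t need = strlen(src);

    if (used + need + 1 > dstsize)
        return -1;
    memcpy(dst + used, src, need + 1);   /* need + 1 copies the '\0' too */
    return 0;
}
```

In the example above you would call append_bounded(s, sizeof s, ", and the same goes for you!") and decide what to do when it returns -1, rather than calling strcat() and hoping.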
Q: What do I put in the fd set passed to select()?
A: You are interested in data on stdin (which is file descriptor 0) or data from the server. So you want both of these file descriptors in your fd_set value. And remember that the first argument to select() is the max of these numbers plus one. Also, you can pre-compute the max of x and 0 for any positive x.
Q: What should I replace [...something in a socket example or some code from the net...] by in my code?
A: No. You should not do this. You should not start with some other program and replace things. You'll never get your program working that way.
You have to start with a blank file and only write things which you want your program to do, for a good reason. Of course you can copy bits of code from elsewhere, but only because you know what they do and you have a good reason for wanting your program to do them.
(And copied code other than from me or the textbook needs to be cited appropriately, of course.)
Q: How do you run gdb if your program needs command-line arguments?
A: The command-line arguments appear as parameters to the "run" command in gdb.
    % gdb a.out
    (gdb) run localhost 1234
Q: How do I connect to the server using telnet?
A: telnet hostname portnumber
e.g. if you are running the server on a computer called "glop" and it is listening on port number 2345, type "telnet glop 2345". Or you can use "localhost" if applicable.
Q: How can I "play server" so that my client can connect to me?
A: nc -l portnumber
Q: Do we have to check all system calls for error return values?
A: Yes, pretty much. Do at least as good a job as chatsvr.c does. Note that if you don't check error return values, you won't get good error messages and this will impede your debugging of your program. So put in the error checks from the start.
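As an illustration of the kind of check meant here (a sketch only; write_checked is a made-up name and this is not taken from chatsvr.c):

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: write n bytes and report any failure on stderr.
 * Returns 0 on success, -1 on error or on a short write. */
int write_checked(int fd, const char *buf, size_t n)
{
    ssize_t w = write(fd, buf, n);

    if (w < 0) {
        perror("write");    /* says why, e.g. "write: Bad file descriptor" */
        return -1;
    }
    if ((size_t)w != n) {
        fprintf(stderr, "write: short write (%ld of %lu bytes)\n",
                (long)w, (unsigned long)n);
        return -1;
    }
    return 0;
}
```

The perror() message is what tells you, while debugging, which call failed and why.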
Q: Is marvin.c an infinite loop?
A: Yes, pretty much. It exits in the case of various kinds of errors, and also upon EOF on stdin.
Q: Once I get data from one file descriptor, I can't seem to receive data from any of the other file descriptors any more.
A: Sometimes this is the result of passing an old return value of the fd list back to select(). When select() returns, it has modified its second parameter (the read fd list); this parameter is an "in/out" parameter. The modified value is not suitable for passing to select() again.
I suggest you put the FD_ZERO and list of FD_SETs in your main loop, just before the select() call. Just set up the fd list every time.
The textbook discusses a different strategy, in which we copy the fd list from a master list every time. This saves the setting-up overhead (or, to be precise, it turns it into a single variable assignment). This is good for a program with a large number of connected sockets, in which the FD_ZERO and list of FD_SETs would be significant. But...
Don't do it. Keep It Simple. The textbook's strategy introduces a situation of what some call "parallel data structures": your other variables which are keeping track of the connection sockets and statuses contain data which is equivalent in semantics to this master fd list. The problem with parallel data structures is that it's easy for them to get out of sync, and then you have a bug. If they remain perfectly synchronized you still have additional program coding requirements resulting from having to maintain the parallel data structures. Parallel data structures should be avoided except when there is a compelling reason for them (which usually in this case would be an efficiency concern). There are only two file descriptors in the set in this program, so don't do it.
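A sketch of the "set it up every time" approach follows. The function drain_two and its open_a/open_b flags are hypothetical, and unlike the assignment's client it keeps going until both fds have reached EOF; the point is only that the fd list is rebuilt from the program's own state just before each select() call:

```c
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical example: read from two fds until both reach EOF.
 * Returns the total number of bytes read, or -1 on error. */
long drain_two(int fd_a, int fd_b)
{
    int open_a = 1, open_b = 1;   /* our own record of what is still open */
    long total = 0;
    char buf[256];
    ssize_t len;

    while (open_a || open_b) {
        fd_set fds;
        int maxfd = -1;

        /* Build the set afresh on every pass; never reuse select()'s output. */
        FD_ZERO(&fds);
        if (open_a) { FD_SET(fd_a, &fds); if (fd_a > maxfd) maxfd = fd_a; }
        if (open_b) { FD_SET(fd_b, &fds); if (fd_b > maxfd) maxfd = fd_b; }

        if (select(maxfd + 1, &fds, NULL, NULL, NULL) < 0)
            return -1;

        if (open_a && FD_ISSET(fd_a, &fds)) {
            if ((len = read(fd_a, buf, sizeof buf)) < 0)
                return -1;
            if (len == 0)
                open_a = 0;       /* zero bytes means EOF */
            else
                total += len;
        }
        if (open_b && FD_ISSET(fd_b, &fds)) {
            if ((len = read(fd_b, buf, sizeof buf)) < 0)
                return -1;
            if (len == 0)
                open_b = 0;
            else
                total += len;
        }
    }
    return total;
}
```

There is no master fd list here to keep in sync: the flags are the only record of which fds matter, and the fd_set is derived from them each time around.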
Q: I got an unexpected message from my server saying that someone else had connected (or sent something).
A: Someone else did connect (or send something). It could have been anyone on the planet with internet access. Most likely it was another CSC 209 student.
Q: Can I submit a "marvin.h"? Do I #include "parse.c" and "util.c"? Etc.
A: No. Your submitted marvin.c will be compiled like this:
    gcc -Wall marvin.c parse.c util.c

using the original versions of chatsvr.h, parse.c, parse.h, util.c, and util.h.
You #include the .h files, but you link with the .c files.
Any declarations you want to put in a marvin.h file should simply go towards the top of your marvin.c. There will be no marvin.h.
.h files in C are for coordination between files; you are writing only one .c file.
The C pre-processor just copies the file in when you #include it; #include doesn't provide anything you can't just do in a single C file.
You may want to copy some code from some of the socket example code presented in lecture. That's fine.
But don't copy any code which you don't understand. If you copy it without understanding it, you will get into trouble later. And there isn't a lot of "later" time to straighten yourself out in!
Q: What is inet_ntoa()?
A: It takes a 32-bit IP address (wrapped in a struct in_addr) and formats it as a nice string with dots separating the octets. E.g. 192.168.3.28 encoded as a 32-bit number is (((192 * 256 + 168) * 256 + 3) * 256 + 28), or 3232236316. Calling printf("%lu\n", 3232236316UL) is going to produce this unintelligible output. Calling printf("%s\n", inet_ntoa(addr)), where addr is a struct in_addr holding that address, will produce a nice "192.168.3.28". In case it helps explain inet_ntoa(), here is a sample implementation.
(Actually, the above calculations assume that the host byte order is "big-endian" -- otherwise, there is a byte order issue, which inet_ntoa() also takes care of, and the sample implementation I provide at that link does this right as well, by looking at the raw bytes rather than using the host "long integer" instructions. On the other hand, my implementation assumes that "long"s are 32 bits and that they are composed of four 8-bit bytes. inet_ntoa() implementations are often platform-specific in this sort of way.)
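To make the calling convention concrete, here is a short sketch. Both function names are made up for illustration, and the hand-rolled version is not the sample implementation mentioned above, just a guess at the same idea; it assumes 8-bit bytes.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>

/* Format an address given as a host-order number, using the library.
 * The result lives in inet_ntoa()'s static buffer, so copy it if you
 * need it to survive the next call. */
const char *dotted_from_host_order(unsigned long addr)
{
    struct in_addr in;

    in.s_addr = htonl(addr);    /* inet_ntoa() expects network byte order */
    return inet_ntoa(in);
}

/* Hand-rolled equivalent that looks at the four raw bytes, which are
 * already in network order inside a struct in_addr. */
void dotted_by_hand(struct in_addr in, char *out, size_t outsize)
{
    unsigned char b[4];

    memcpy(b, &in.s_addr, 4);
    snprintf(out, outsize, "%d.%d.%d.%d", b[0], b[1], b[2], b[3]);
}
```

In a real program the struct in_addr usually comes straight out of a struct sockaddr_in filled in by accept() or similar, already in network byte order, so no htonl() is needed.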
Q: Could we have the source code for chatclient?
A: Sorry; this is too similar to marvin.c. I'll post it along with the solutions.