Software tools
We discussed non-software tools: how easily you can combine them, and how
multi-purpose they are.
You can hit just about anything with a hammer.
In contrast, software is often presented as large monolithic programs
which do indeed have many features, but if the implementers didn't happen to
think of the feature you wanted, it's not easily added.
- You may have a great text editor and a great spreadsheet program and
great programs to do all sorts of particular applications, but what you
lack is an easy way to combine them, analogous to using the hammer in
conjunction with another tool.
- Any tool or device is very limited if it can't be used in ways
the creator didn't imagine. We would like software components to
be as fundamental and general as a hammer.
The "software tools" idea is about writing small, simple programs, which
do one thing well; and having powerful and general ways to combine them.
- In unix, we can connect programs using a "pipe": a buffer maintained by
the kernel, with the special property that it can be used for i/o
redirection.
- "command >file" runs "command" but with its standard output
redirected into the file "file", as you know.
- "cmd1 | cmd2" runs both commands, with the standard output of cmd1
connected into the pipe,
and the standard input of cmd2 connected so as to come out of the
pipe.
- Given how many unix tools are designed as "filters", willing to read
from their standard input and write to their standard output,
you can make quite a lot of interesting combinations.
- "cmd1 | cmd2" is itself a command which can be piped into another
command, and so on. You can have a long pipeline, like a factory assembly
line, in which each command performs some successive transformation on the
data.
- Examples:
- given a text file called "faculty"
containing names of department faculty members, not in order,
one per line,
- "sort faculty" yields a sorted file on stdout.
- "sort -k2 faculty" sorts by last name instead of first name
- "sort -f faculty" is a case-insensitive sort
- you can combine these options by saying "sort -f -k2"
- "sed" is a "stream editor" -- it performs transformations on
the data as it goes by, like a filter, without modifying the file
on disk
- the input language to sed is complex and we're not going to
go into it in this course, but I'd like to use it for this example
- the sed command I'll be using is "s", for "substitute" -- it
does a search and replace, but using the unix "regular expression"
syntax. Regular expressions are based on the theory of finite
automata and are a powerful way to describe sets of strings, such
as "all strings containing the letter e" or "all strings
containing a letter, five spaces, and then either a digit or the
symbol '?'", etc.
- The 's' command letter is followed by a slash (actually you
can use any character as delimiter, but slash is usual... unless
the search string has slashes in it), then the search string, then
a slash, then the replacement string, then another slash, then any
option key letters (we didn't use any of those in the examples in
lecture).
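- Altogether we get a command like "sed 's/oo/ooo/'" to
transform the first double-o on each line into a triple-o. (Without an
option letter, only the first match on each line is replaced; the "g"
option replaces every match.)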
- A more serious transformation can be described by the complex
substitution "sed 's/\(.*\) \(.*\)/\2, \1/'".
The backslashed-parentheses indicate grouping. Dot (".") matches
any character. Asterisk means zero-or-more of the preceding.
So altogether the search string means zero or more of any
character (i.e. anything at all), then a space, then zero or more
of any character. But the parenthesized groups can be referred to
in the replacement string. The replacement string means "the
second thing we matched in the search, a comma, a space, then the
first thing we matched in the search". Altogether this transforms
the line "Alan Rosenthal" into the line "Rosenthal, Alan".
- Example: sed 's/\(.*\) \(.*\)/\2, \1/' faculty
- So we have two useful tools so far: sort and sed.
- We can combine them with a pipe, to transform the faculty list
into a list which has last name first and is sorted.
- Example: sort -f -k2 faculty | sed 's/\(.*\) \(.*\)/\2, \1/'
- The sort options are affected by which order we do these
transformations in; if we've already done the "sed", then we want to
sort by the first field (the last name in the sed-transformed file, so a
plain "sort -f" will do) as opposed to the second field (the last name in
the original file, hence "-k2").
- Note that each of these "filters" takes optional file name
arguments: If you supply a list of file names, it reads from those.
If you don't supply any file names, it reads its standard input.
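A minimal sketch of the two orderings discussed above, so you can check that they agree. The "faculty" file here is made-up sample data (only "Alan Rosenthal" comes from these notes), and the temporary directory and variable names are just for the demonstration.

```shell
# Sample data: these names are invented for the demonstration.
dir=$(mktemp -d)
cat >"$dir/faculty" <<'EOF'
Alan Rosenthal
Carol Zhang
Bob ng
Dave Abbott
EOF

# Use plain byte-order collation so the result doesn't vary by locale.
LC_ALL=C; export LC_ALL

# Order 1: sort by last name (field 2) first, then put the last name first.
sort_then_sed=$(sort -f -k2 "$dir/faculty" | sed 's/\(.*\) \(.*\)/\2, \1/')

# Order 2: put the last name first, then sort on the whole line,
# since the last name is now the first field.
sed_then_sort=$(sed 's/\(.*\) \(.*\)/\2, \1/' "$dir/faculty" | sort -f)

echo "$sort_then_sed"
echo "---"
echo "$sed_then_sort"

rm -r "$dir"
```

Both halves of the output should be the same four lines, starting with "Abbott, Dave"; the lower-case "ng" sorts between Abbott and Rosenthal only because of "-f".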
Cute quotation: "Unix is user-friendly; it's just choosy about who its friends are."
Summary: "Do one thing well."
Tools which do just one thing can be combined in arbitrary ways.
One thing a bit odd in unix is that program output doesn't contain headers.
Consider the "who" command. Example output:
ajr console Jan 8 06:28
ajr ttyp1 Jan 8 09:25
ajr ttyp2 Jan 8 09:26
(The "who" output is more exciting on a system with multiple users, especially
if no one's on the console and creating multiple terminal windows;
try it on cslinux or seawolf (as appropriate to the campus you're taking this
course on...).)
We can see how many entries there are by using the "word count" program "wc",
with the option "-l" which means "only display the line count":
% who | wc -l
3
%
On many non-unix systems we would expect output with a header, identifying the
columns, like this:
User Terminal Login time
------------------------------
ajr console Jan 8 06:28
ajr ttyp1 Jan 8 09:25
ajr ttyp2 Jan 8 09:26
But this would cause problems for the software tools model.
In the "who | wc -l" case,
the line count above would be off by two;
in fact we would get funny results from many tools.
For example, a "grep" (display only lines matching a search expression)
to see who is logged in and has a "-" in their logname would also display the
header separation line, or if a user were named "ogi", then "who | grep ogi"
would also display the header line.
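To see the header problem concretely, here is a small sketch which stores the sample "who" output above in files (real "who" output differs from system to system) and runs the same tools over both versions. The file and variable names are assumptions for the demonstration.

```shell
dir=$(mktemp -d)

# Unix-style "who" output, saved to a file so the demo is repeatable.
cat >"$dir/plain" <<'EOF'
ajr console Jan 8 06:28
ajr ttyp1 Jan 8 09:25
ajr ttyp2 Jan 8 09:26
EOF

# The same data with the kind of header a non-unix system might print.
{
    echo 'User Terminal Login time'
    echo '------------------------------'
    cat "$dir/plain"
} >"$dir/with_header"

# Line counts; tr strips the padding some versions of wc add.
plain_count=$(wc -l <"$dir/plain" | tr -d ' ')
header_count=$(wc -l <"$dir/with_header" | tr -d ' ')

# grep -c counts matching lines; "|| true" because grep exits non-zero
# when nothing matches.
plain_ogi=$(grep -c ogi "$dir/plain" || true)
header_ogi=$(grep -c ogi "$dir/with_header" || true)
header_dash=$(grep -c -e '-' "$dir/with_header" || true)

echo "lines: $plain_count without header, $header_count with header"
echo "lines matching ogi: $plain_ogi without header, $header_ogi with header"
echo "lines matching -: $header_dash with header"

rm -r "$dir"
```

The line count goes from 3 to 5, and both greps pick up a header line even though no user matches.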
Software tools
- do one thing well
- are small
- interface cleanly
Software tools principles, after Doug McIlroy (with
some text from Ian Darwin):
- Write small programs that do one thing well.
- To do a new job, build afresh rather than complicate old
programs by adding new "features".
- Don't put everything into the OS.
- Expect the output of every program to become the input to another, as yet
unknown, program.
- Don't clutter output with extraneous information. Say what you're asked to -- no more, no less.
- Make programs' input formats easy to generate or type.
- Avoid stringently columnar or binary input formats.
- Don't insist on interactive input. Wherever possible, programs should be able to process data from their standard input to their standard output.
- Supply good defaults.
If every file has the same format, users only need one set of tools. If the format is simple, the tools are easy to write.
If everything in the system is a file, users can go further with one set of tools.
- Use programs to write programs.
- Use high-level languages.
- Use regular expressions for all pattern matching.
Don't force people to use the system in one way.
Filters
A "filter" reads from its standard input and writes to its standard output,
e.g. grep: print lines which match a pattern.
- grep Z faculty
- who | grep ajr
Some ways data goes INTO a command:
- standard input
- command-line arguments -- often file names, opened and processed like
standard input; allows multiple files; if a "filter" won't do that, then
cat file1 file2 file3 | filter
- environment variables (setenv). More later maybe.
(After we've done processes
and execve(), you can have a look at man 7 environ.)
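A rough sketch of the three input routes listed above. The file names and the GREETING variable are invented for the example, and it sets the environment variable with sh syntax (VAR=value command) rather than csh's setenv.

```shell
dir=$(mktemp -d)
printf 'one\ntwo\n' >"$dir/file1"
printf 'three\n' >"$dir/file2"

# 1. Standard input: wc reads whatever the shell connects to its stdin.
from_stdin=$(wc -l <"$dir/file1" | tr -d ' ')

# 2. Command-line arguments: tr won't take file names, so cat opens the
#    files in turn and feeds them to tr through a pipe.
from_cat=$(cat "$dir/file1" "$dir/file2" | tr a-z A-Z)

# 3. Environment variables: set one for the duration of a single command.
#    The child shell started by "sh -c" finds it in its environment.
from_env=$(GREETING=hello sh -c 'echo "$GREETING"')

echo "$from_stdin"
echo "$from_cat"
echo "$from_env"

rm -r "$dir"
```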
command-line: "globbing" done by the shell.
- '*' matches any number of any character.
- '?' matches any one character.
- [list of chars]
- [range]
- a[1-4].pdf
- use [a-z] to match any lower-case letter
- combine them: [a-xz] matches any lower-case letter except 'y'
Special treatment of '.' at the beginning of a file name: must be matched
explicitly.
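The globbing rules above can be tried out safely in a scratch directory. This is only a sketch: the file names are made up, and "echo" is used simply to show what the shell expanded each pattern into before the command ran.

```shell
orig=$PWD
dir=$(mktemp -d)
cd "$dir" || exit 1
LC_ALL=C; export LC_ALL     # make ranges and ordering predictable

# Made-up file names for the demonstration.
touch a1.pdf a2.pdf a5.pdf x y z .hidden

star=$(echo *)              # everything except the dot file
question=$(echo a?.pdf)     # '?' matches exactly one character
range=$(echo a[1-4].pdf)    # a5.pdf is outside the range
combined=$(echo [a-xz])     # one-letter names other than y
dotfile=$(echo .h*)         # a leading '.' has to be matched explicitly

echo "$star"
echo "$question"
echo "$range"
echo "$combined"
echo "$dotfile"

cd "$orig" || exit 1
rm -r "$dir"
```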
Here are some filters. All of these, and everything else, have man pages. Get
used to reading man pages, especially to find obscure options.
I frequently read man pages. The on-line help in unix is very comprehensive.
There's a lot to know and you don't have to remember it all.
grep
who | grep ajr
grep /~ajr/209/ /var/httpd/log/access_log
lpq | grep ajr | cut -f1 | xargs lprm
tr
tr '\015' '\012' <file.mac >file.unix
tr A-Z a-z
tr a-zA-Z n-za-mN-ZA-M
head, tail
last | head
tail /var/log/messages
tail -40 /var/log/messages
sort
sort
sort -k2
sort -n
sort -n -k3
lots of other options such as case-insensitive, reverse
uniq
tr -cs a-zA-Z0-9 '\012' <file | tr A-Z a-z | sort | uniq -c
sed
s/Fred/Wilma/
s/Fred/Wilma/g
s/Fred[a-z]*/Wilma/g
5d, 10q, /pat/d
regular expressions: ., [, *
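A few of the filter examples above, run on small made-up inputs so that the results are easy to check by hand. The input strings and variable names are assumptions; the extra "sort -rn | head -n 1 | sed" stage on the word count is an addition here, just to pick out the most frequent word and strip the padding uniq adds.

```shell
LC_ALL=C; export LC_ALL

# rot13: applying the same tr twice gets the original back.
rot13=$(echo 'Hello' | tr a-zA-Z n-za-mN-ZA-M)
back=$(echo "$rot13" | tr a-zA-Z n-za-mN-ZA-M)

# sed "s": without "g", only the first match on each line is replaced.
first_only=$(echo 'Fred and Freddy' | sed 's/Fred/Wilma/')
every=$(echo 'Fred and Freddy' | sed 's/Fred/Wilma/g')
whole_word=$(echo 'Fred and Freddy' | sed 's/Fred[a-z]*/Wilma/g')

# sed "d" deletes lines; "q" quits after the given line.
deleted=$(printf '1\n2\n3\n4\n' | sed 2d)
quit_early=$(printf '1\n2\n3\n4\n' | sed 2q)

# Word frequency: one word per line, fold case, sort, count duplicates.
top=$(echo 'the cat and the hat and THE bat' |
    tr -cs a-zA-Z0-9 '\012' | tr A-Z a-z | sort | uniq -c |
    sort -rn | head -n 1 | sed 's/^ *//')

echo "$rot13 $back"
echo "$first_only / $every / $whole_word"
echo "$top"
```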
Here are some other fundamental unix tools:
echo
provide output, e.g.
echo Please enter repeat count:
echo -n 'Please enter repeat count: '
-> note how it takes any number of arguments, outputs them separated by spaces.
Use "tr" to convert x's to y's in xylophone:
- can't say tr x y xylophone (even if it took files)
- so: echo xylophone | tr x y
cat
various options depending on unix version, such as -n to
number the lines, -s to eliminate multiple blank lines; note that a
plain "cat" is just a buffer, used as a data-wise no-op
(cat actually is a filter, could be in the section above)
ls
ls dir or ls file; ls -d to avoid descending into a directory
use xargs to make it read stdin in any interesting way
-a, -l, -i, -q, -t, -r
how options combine: ls -lart
ls strangely (and unsimply) acts differently by default based on whether
its output is a "tty" or not, but there are options -C to force columnar
output and -1 to force one file per line (mnemonic: "one column")
cp
either 2 args or multi args plus directory; -p, -r
mv
similar options, always -p
rm
-r, -f
cmp
also cmp -l
diff
also diff -b, also -c
comm
-> students enrolled in CSC 209 before and after the drop date (fictional)
% comm -1 students newstudentlist
% comm -12 students newstudentlist
% comm -13 students newstudentlist
% comm -23 students newstudentlist
join
join newstudentlist grades
idea of "-" file name
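Here is a sketch of the comm and join examples with tiny made-up class lists (the student names and grades are invented). Both tools expect their input files to be sorted already. The last command illustrates the "-" file name, which stands for standard input.

```shell
LC_ALL=C; export LC_ALL
dir=$(mktemp -d)

# Invented class lists, already in sorted order.
printf '%s\n' alice bob carol dave >"$dir/students"
printf '%s\n' alice carol erin >"$dir/newstudentlist"
printf '%s\n' 'alice 85' 'bob 70' 'carol 92' >"$dir/grades"

# comm prints three columns: lines only in file 1, only in file 2, in both.
# Each digit in the option suppresses the column with that number.
stayed=$(comm -12 "$dir/students" "$dir/newstudentlist")
dropped=$(comm -23 "$dir/students" "$dir/newstudentlist")
added=$(comm -13 "$dir/students" "$dir/newstudentlist")

# join pairs up lines whose first fields are equal.
joined=$(join "$dir/newstudentlist" "$dir/grades")

# "-" in place of a file name means "read standard input here".
via_stdin=$(sort "$dir/students" | comm -12 - "$dir/newstudentlist")

echo "stayed: $stayed"
echo "dropped: $dropped"
echo "added: $added"
echo "joined: $joined"

rm -r "$dir"
```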
Summary: small programs that do one thing well.
Find
find /u/ajr/web/270/example -name mergen.c -print
find /u/ajr/web/270/notes -mtime -30 -exec ls -ld '{}' ';'
find /u/ajr/web/270/notes -type f -mtime -30 -exec ls -ld '{}' ';'
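The find commands above refer to directories you won't have, so here is a sketch of the same ideas in a scratch directory. The file names new.txt and old.txt are made up, and "touch -t" is used to give one file an old modification time.

```shell
dir=$(mktemp -d)
mkdir "$dir/example" "$dir/notes"
touch "$dir/example/mergen.c" "$dir/notes/new.txt"
touch -t 200001010000 "$dir/notes/old.txt"   # last modified in January 2000

# Search the whole tree by name.
by_name=$(find "$dir" -name mergen.c -print)

# Modified within the last 30 days. Without "-type f" the directory
# itself also qualifies, since we just created files in it.
with_dirs=$(find "$dir/notes" -mtime -30 -print | wc -l | tr -d ' ')
recent=$(find "$dir/notes" -type f -mtime -30 -print)

# -exec runs a command on each match; '{}' stands for the path found,
# and ';' marks the end of the command.
listing=$(find "$dir/notes" -type f -mtime -30 -exec ls -ld '{}' ';')

echo "$by_name"
echo "$with_dirs"
echo "$recent"
echo "$listing"

rm -r "$dir"
```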
A little more about the shell
Further understanding of command-line arguments through attempting to cat
a file called "-a"
% cat -a
cat: illegal option -- a
usage: cat [-benstuv] [-] [file ...]
%
Note that the "-" detection is lexical; file names have more complex
semantics.
So use another path name which refers to the same file, but does not have the
property that the zeroth character of the string is '-'.
Example 1: cat ./-a
Example 2: cat /u/ajr/-a
Another method:
There is a feature of getopt(), the library function used to parse the
command-line options, to say "that's all the options", after which you can
safely say "-a" to refer to a file named "-a".
cat -- -a
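All three ways of naming the file can be tried in a scratch directory. A minimal sketch follows; the file contents and variable names are made up.

```shell
orig=$PWD
dir=$(mktemp -d)
cd "$dir" || exit 1

# Create a file whose name is literally "-a". Redirection is handled by
# the shell, so the name is never seen as an option here.
echo 'contents of -a' >-a

via_dot=$(cat ./-a)          # the string no longer starts with '-'
via_abs=$(cat "$dir/-a")     # nor does an absolute path name
via_dashes=$(cat -- -a)      # "--" says "that's all the options"

echo "$via_dot"
echo "$via_abs"
echo "$via_dashes"

cd "$orig" || exit 1
rm -r "$dir"
```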
Trivial shell scripts via "sh file":
"sh" (the shell) will take file names as
arguments just like all those filters above, and will process the contents of
the file just like typing it in.
Example shell script which compiles gcd.c
and tests it with several arguments:
gcc -Wall gcd.c
./a.out 3 4
./a.out 12 0
./a.out 0 12
./a.out 12 18
./a.out 18 12
If this is in a file "testgcd", execute the list of commands by saying
"sh testgcd".
I/O redirection:
- ./a.out >file
- sh testgcd >file
- i/o redirection is "inherited" when the sh process runs the other
processes
- echo hello >hi
- echo foo; echo bar; echo baz >fbb
- Only "echo baz" is redirected.
- It's an operator precedence issue (precedence of ";" versus ">").
- The shell implements parentheses to change the precedence!
- (echo foo; echo bar; echo baz) >fbb
- All of this also applies to pipes and pipelines, e.g. you can have a
parenthesized command sequence as one element in a pipeline.
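The redirection points above can be checked with a short sketch like the following. The file names fbb2, script and script_out are made up, and the two-line script is a stand-in for "testgcd" so that no compiler is needed.

```shell
orig=$PWD
dir=$(mktemp -d)
cd "$dir" || exit 1

# Without parentheses, only the last command is redirected:
# foo and bar still go to standard output (captured here in a variable).
unparenthesized_stdout=$(echo foo; echo bar; echo baz >fbb)
unparenthesized_file=$(cat fbb)

# With parentheses, the whole sequence writes into the file.
(echo foo; echo bar; echo baz) >fbb2
parenthesized_file=$(cat fbb2)

# Redirection is inherited: commands run by "sh script" write wherever
# sh's own standard output points.
printf 'echo one\necho two\n' >script
sh script >script_out
inherited=$(cat script_out)

# A parenthesized sequence as one element of a pipeline.
piped=$( (echo pear; echo apple) | sort )

echo "$unparenthesized_stdout"
echo "$unparenthesized_file"
echo "$parenthesized_file"
echo "$inherited"
echo "$piped"

cd "$orig" || exit 1
rm -r "$dir"
```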