The PDF document is a scanned copy of the first ever paper that was published on the Unix shell. The paper was written by Ken Thompson and published in Structured Programming (Infotech state of the art report).
The scanned copy was first obtained from Ken Thompson by wesleyneo and shared on archive.org. I combined the scanned images into a single PDF document and shared it here. Also, wesleyneo transcribed the paper to text format and I further edited and reformatted it to Markdown format. The HTML document is automatically generated from the Markdown file.
All of these files are shared on the Internet with permission from Ken Thompson.
A bit of proofreading caught some small errors which were not in the original:
- "This this invokes" -> "Thus this invokes"
- "the standard output had occurred" -> "the standard output that has occurred"
- "without typing up a console" -> "without tying up a console"
- "The Shell, as command" -> "The Shell, as a command"
- "late a night" -> "late at night"
- "source programs has something" -> "source program has something"
- "which states its contents" -> "which states it contents" [sic]
- "In sincerely" -> "I sincerely"
"A program is generally exponentially complicated by the number of notions that it invents for itself. To reduce this complication to a minimum, you have to make the number of notions zero or one, which are two numbers that can be raised to any power without disturbing this concept. Since you cannot achieve much with zero notions, it is my belief that you should base systems on a single notion."
It's always fascinating to read this type of stuff with the knowledge the authors influenced the way people use technology in such a fundamental way. It's like the inventor of the screwdriver modestly writing about how their curious little tool can be used.
Apart from the "goto" command and a different "if" statement syntax, all of the examples would run in a modern Bash shell; not much has changed fundamentally.
There seem to be no comments, and the argument for the no-op ":" is used for this purpose instead.
Using the argument of ":" as the label for the "goto" command reminded me of sed, which does exactly the same (and probably was influenced by it):
sed ': label
s/x/y/
/x/b label'
loops over the "s/x/y/" command until the pattern space doesn't match "/x/" any longer. (Just for illustration - this could be replaced by a single global substitution, of course.)
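For anyone who wants to check that equivalence, here is a minimal sketch. The input string `xxx-x` and the variable names are made up for illustration; the sed scripts are the one above and the plain global substitution.

```shell
# Hypothetical sample input; any line containing several x's would do.
input='xxx-x'

# Label-and-branch loop: keep substituting while the pattern space still has an x.
looped=$(printf '%s\n' "$input" | sed ': label
s/x/y/
/x/b label')

# Single global substitution.
global=$(printf '%s\n' "$input" | sed 's/x/y/g')

echo "$looped"
echo "$global"
```

Both commands should print the same line, with every x replaced by y.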
"goto" and labels are also the replacement for all the looping constructs we're used to today.
The differentiation between source/filter/sink types of programs makes so much sense, and while maybe obvious, I've never seen it described this clearly. Descriptions of the composability of Unix programs often focus on the filter aspect of programs.
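A rough sketch of the three roles in one pipeline, as I read the distinction. The function names `source_lines` and `save_lines` and the sample data are hypothetical, not from the paper.

```shell
log=$(mktemp)

# Source: reads nothing, only writes to standard output.
source_lines() { printf 'pear\napple\npear\n'; }

# Sink: only reads standard input, writes nothing to standard output.
save_lines() { cat >"$log"; }

# sort and uniq are the filters in the middle: they read stdin and write stdout.
visible=$(source_lines | sort | uniq | save_lines)

saved=$(cat "$log")
echo "pipeline printed: '$visible'"
echo "sink stored: $saved"
```

Nothing comes out of the end of the pipeline because the last stage is a sink; the filtered lines end up in the temporary file instead.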
I'd be curious to know what some of the "missing" concepts were. Obviously they turned out not to be necessary.
> Many familiar computing ‘concepts’ are missing from UNIX. Files have no records. There are no access methods. User programs contain no system buffers. There are no file types.
One of the more interesting parts of Unix is that there was not a system level way to have records. This was especially weird given other systems in use at the time spent a lot of time dealing with this exact issue. Just the other day there was a discussion here on HN about file system APIs, and many of the comments were about this peculiarity of Unix: why isn't there one true way to store stuff that is smaller than a file? The answer is probably, well, different uses of data require different things smaller than a file. There isn't really one true way at that point. Also, storage media changes a lot over time.
Coming to Unix from another (larger than a PC) OS was strange because it felt so stripped-down, yet capable, largely because of the extreme power the shell gave you to concatenate functionality. I've never read this particular document, and it does really explain this capability well.
Well, you've quoted a sentence that rattles off several of them. In prior operating systems, files were structured objects, often readable and writeable only in collections of fixed-length records; they often had attributes including a "type" (Windows file name extensions are a vestige of this), and had I/O APIs requiring a particular memory layout in program memory for data to be read or written. There were also, in some cases, files that could be accessed only through limited forms of database query ("access methods" -- see, e.g., ISAM for Indexed Sequential Access Method), and not as raw byte streams, or streams of records.
Current mainframe programming environments retain a lot of this structure. Here's a writeup by someone trained on more Unix-derived environments who was self-teaching on the '60s legacy stuff: https://medium.com/the-technical-archaeologist/hello-world-o...
That is not at all obvious. There's a reason SQL is a thing. There's a reason that file name extensions are ubiquitous. There's a reason that security is a hot mess. It turns out that you really do need all these things, and if you don't provide them in the OS, then they will need to be provided at the user level.
Why would you believe that? Maybe computing would be 10x less painful today than it is if these missing concepts had been implemented. Or maybe 10x more.
And these people were not always happy[1] with the way things are in UNIX. An example of something missing from UNIX is a service or job subsystem, which is why SMF, launchd and systemd were invented.
Probably referring to examples like VAX/VMS filesystems? I never worked on them much, and only very shallowly. Fortran file I/O has vestiges of record-oriented interfaces still in the standard, I'm assuming from co-evolution with file systems of the time.
> 'rewinds' its standard input. It then reads the standard input looking for the syntax of a : command with an argument matching it own. It leaves the standard input positioned after the : command. When control is returned to the shell, execution continues where the standard input was positioned. (This implies certain things about file positioning and open file sharing that will not be discussed here.)
Oh man, `:` makes so much sense now. It was originally meant to be like a label.
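Here is a small sketch of that mechanism as I understand it from the quoted passage. It is not the original implementation: in the paper goto is a separate program that shares the shell's open standard input, whereas here a shell function re-opens the script on file descriptor 3 to stand in for the rewind. The names `run` and `script`, the demo script, and the use of `eval` and a modern `if` are all assumptions made for illustration.

```shell
# Hypothetical demo script: counts to 3 with a label and a goto.
script=$(mktemp)
cat >"$script" <<'EOF'
n=0
: loop
n=$((n + 1))
echo "pass $n"
if [ "$n" -lt 3 ]; then goto loop; fi
echo done
EOF

# goto: "rewind" the script by re-opening it on fd 3, then read forward
# until a ": label" line is found. fd 3 is left positioned just after it.
goto() {
    exec 3<"$script"
    while IFS= read -r line <&3; do
        [ "$line" = ": $1" ] && return 0
    done
    return 1
}

# run: a toy command loop that reads one line at a time from fd 3 and
# executes it. The ": loop" line itself is just the no-op ":" with an argument.
run() {
    exec 3<"$script"
    while IFS= read -r line <&3; do
        eval "$line"
    done
    exec 3<&-
}

output=$(run)
echo "$output"
```

After goto returns, the command loop simply carries on reading from wherever fd 3 now points, which is the whole trick.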
I think the clarity and brevity of this is amazing. It should be mandatory reading for people starting out in computing.
I’ve always thought that the concept of piping input to output and the shell is one of the most powerful “programming environments” around and to see its origins is humbling!
Around 1978, while in high school, I spent many afternoons at the local university library skimming through six-inch-thick bound stacks of computer software journals. (That was how people communicated, before Internets.)
I distinctly remember encountering exactly this paper, as my first exposure to Unix. It made a deep, deep impression. Pipe composition is just unaccountably powerful.
What might not be so easy for people to understand today is how profoundly shocking it was to see computer commands and results expressed in lower case. I had never seen it done. Ever. It didn't seem possible, at first. After a few seconds, it was obvious, but still oddly liberating. You don't know computers are shouting at you until, suddenly, they aren't.
It’s not really subtle I think. When you redirect to a file, the file is opened for writing by the shell before it runs the program that you are invoking.
This is also the same reason that if you try something like the following:
grep whatever somefile >somefile
You always end up with somefile being empty, even though you meant to replace the contents of the file with only the lines that contain the word “whatever”. The shell truncates the file first, so grep searches an already empty file, finds nothing, and the file stays empty.
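A quick way to see this, plus the usual workaround of writing to a temporary file and renaming it afterwards. The directory, file contents, and variable names here are made up for the demonstration.

```shell
workdir=$(mktemp -d)
f=$workdir/somefile

printf 'whatever one\nother\nwhatever two\n' >"$f"

# The shell truncates somefile before grep starts, so grep sees an empty file.
# grep exits non-zero here, so its status is ignored.
grep whatever "$f" >"$f" || true

if [ -s "$f" ]; then clobbered=no; else clobbered=yes; fi
echo "file emptied by the redirect: $clobbered"

# Workaround: write to a different file, then replace the original.
printf 'whatever one\nother\nwhatever two\n' >"$f"
grep whatever "$f" >"$f.tmp" && mv "$f.tmp" "$f"

result=$(cat "$f")
echo "$result"
```

The second form works because the output file is a different file from the input until the final mv.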
and wondered why it was taking so long. The shell (probably csh) had created the file "out" before it expanded the *, so my grep command was forever finding "string" in its own output. It filled the disk before I realized what was happening. I just did a quick test and bash seems to create the output file after doing the globbing, and under tcsh, grep gave a warning that the output file was also an input file and it did not search it.
All these demonstrations of piping make me wonder how and why POSIX file naming conventions became so loose (https://dwheeler.com/essays/fixing-unix-linux-filenames.html). Were things like whitespace in filenames just completely unimaginable at the time? Was colon-separated PATH not yet conceived of?
PDF: https://susam.github.io/tucl/the-unix-command-language.pdf
HTML: https://susam.github.io/tucl/the-unix-command-language.html