The Unix Command Language (1976) (github.com/susam)
147 points by susam on Nov 28, 2020 | 33 comments


Direct links to PDF (scanned images) and HTML (generated from Markdown):

PDF: https://susam.github.io/tucl/the-unix-command-language.pdf

HTML: https://susam.github.io/tucl/the-unix-command-language.html

The PDF document is a scanned copy of the first ever paper that was published on the Unix shell. The paper was written by Ken Thompson and published in Structured Programming (Infotech state of the art report).

The scanned copy was first obtained from Ken Thompson by wesleyneo and shared on archive.org. I combined the scanned images into a single PDF document and shared it here. Also, wesleyneo transcribed the paper to text format and I further edited and reformatted it to Markdown format. The HTML document is automatically generated from the Markdown file.

All of these files are shared on the Internet with permission from Ken Thompson.


A bit of proofreading caught some small errors which were not in the original:

    - "This this invokes" -> "Thus this invokes"
    - "the standard output had occurred" -> "the standard output that has occurred"
    - "without typing up a console" -> "without tying up a console"
    - "The Shell, as command" -> "The Shell, as a command"
    - "late a night" -> "late at night"
    - "source programs has something" -> "source program has something"
    - "which states its contents" -> "which states it contents" [sic]
    - "In sincerely" -> "I sincerely"


Fixed in https://github.com/susam/tucl/commit/b2abf75eefb34601bc38618... (commit b2abf75).

Thank you for proofreading the Markdown document.


In the HTML version: s/21:26:39/21:36:39/ to agree with the PDF & spoken text.


Fixed in commit https://github.com/susam/tucl/commit/56429d7fa941670cd6ff7c4... (commit 56429d7).

Thank you for reporting this error.


The first paragraph is a gem:

"A program is generally exponentially complicated by the number of notions that it invents for itself. To reduce this complication to a minimum, you have to make the number of notions zero or one, which are two numbers that can be raised to any power without disturbing this concept. Since you cannot achieve much with zero notions, it is my belief that you should base systems on a single notion."


It's always fascinating to read this type of stuff with the knowledge the authors influenced the way people use technology in such a fundamental way. It's like the inventor of the screwdriver modestly writing about how their curious little tool can be used.


A few things that stood out to me:

Apart from the "goto" command and a different "if" statement syntax, all of the examples would run in a modern Bash shell; not much has changed fundamentally.

There seem to be no comments, and the argument for the no-op ":" is used for this purpose instead.

Using the argument of ":" as the label for the "goto" command reminded me of sed, which does exactly the same (and probably was influenced by it):

    sed ': label
         s/x/y/
         /x/b label'
loops over the "s/x/y/" command until the pattern space doesn't match "/x/" any longer. (Just for illustration - this could be replaced by a single global substitution, of course.)

"goto" and labels are also the replacement for all the looping constructs we're used to today.

The differentiation between source/filter/sink types of programs makes so much sense, and while maybe obvious, I've never seen it described this clearly. Descriptions of the composability of Unix programs often focus on the filter aspect of programs.
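A minimal sketch of the three roles, assuming a POSIX shell with mktemp available. The function names `source_cmd`, `filter_cmd` and `sink_cmd` are made up for illustration and do not come from the paper:

```shell
# Scratch directory so the sink has somewhere harmless to write.
dir=$(mktemp -d)

# Source: writes to standard output without reading standard input.
source_cmd() { printf 'pear\napple\nfig\n'; }

# Filter: reads standard input, writes a transformed copy to standard output.
filter_cmd() { sort; }

# Sink: consumes standard input and writes nothing to standard output.
sink_cmd() { cat >"$dir/sorted.txt"; }

# Compose the three roles with pipes.
source_cmd | filter_cmd | sink_cmd

first_line=$(head -n 1 "$dir/sorted.txt")
echo "first line: $first_line"
```

Only the filter both reads and writes a stream, which is probably why it gets most of the attention in descriptions of pipelines.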


You can also use the `t` command instead of `/x/b` or `//b`:

    $ echo 'coffining' | sed ':d s/fin//; td'
    cog
    $ echo 'coffining' | sed ':d s/fin//; //b d'
    cog


I'd be curious to know what some of the "missing" concepts were. Obviously they turned out not to be necessary.

> Many familiar computing ‘concepts’ are missing from UNIX. Files have no records. There are no access methods. User programs contain no system buffers. There are no file types.


One of the more interesting parts of Unix is that there was not a system level way to have records. This was especially weird given that other systems in use at the time spent a lot of time dealing with this exact issue. Just the other day there was a discussion here on HN about file system APIs, and many of the comments were about this peculiarity of Unix: why isn't there one true way to store stuff that is smaller than a file? The answer is probably, well, different uses of data require different things smaller than a file. There isn't really one true way at that point. Also, storage media changes a lot over time.

Coming to Unix from another (larger than a PC) OS was strange because it felt so stripped-down, yet capable, largely because of the extreme power the shell gave you to concatenate functionality. I've never read this particular document, and it really does explain this capability well.


Well, you've quoted a sentence that rattles off several of them. In prior operating systems, files were structured objects, often readable and writeable only in collections of fixed-length records; they often had attributes including a "type" (Windows file name extensions are a vestige of this), and had I/O APIs requiring a particular memory layout in program memory for data to be read or written. There were also, in some cases, files that could be accessed only through limited forms of database query ("access methods" -- see, e.g., ISAM for Indexed Sequential Access Method), and not as raw byte streams, or streams of records.

Current mainframe programming environments retain a lot of this structure. Here's a writeup by someone trained on more Unix-derived environments who was self-teaching on the '60s legacy stuff: https://medium.com/the-technical-archaeologist/hello-world-o...


> Obviously they turned out not to be necessary.

That is not at all obvious. There's a reason SQL is a thing. There's a reason that file name extensions are ubiquitous. There's a reason that security is a hot mess. It turns out that you really do need all these things, and if you don't provide them in the OS, then they will need to be provided at the user level.


> Obviously they turned out not to be necessary.

Why would you believe that? Maybe computing would be 10x less painful today than it is if these missing concepts had been implemented. Or maybe 10x more.


There exist people who have used both.


And these people were not always happy[1] with the way things are in UNIX. An example of something missing from UNIX is a service or job subsystem, which is why SMF, launchd and systemd were invented.

[1] https://web.mit.edu/~simsong/www/ugh.pdf


And it seems that a lot of people aren't happy with it, either.


Microsoft PowerShell reintroduced records (in pipelines, not files), for what it’s worth.

I think the big problem with records is that it opens a huge can of worms.

How should we represent dates? Binary data? The type of a binary file? Mime types?

All the answers to these questions should be designed to age gracefully over the next half century or so.


Probably referring to examples like VAX/VMS filesystems? I never worked on them much, and only very shallowly. Fortran file I/O has vestiges of record-oriented interfaces still in the standard, I'm assuming from co-evolution with file systems of the time.


> The command

> goto argument

> 'rewinds' its standard input. It then reads the standard input looking for the syntax of a : command with an argument matching it own. It leaves the standard input positioned after the : command. When control is returned to the shell, execution continues where the standard input was positioned. (This implies certain things about file positioning and open file sharing that will not be discussed here.)

Oh man, `:` makes so much sense now. It was originally meant to be like a label.
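For comparison, a small sketch of what `:` still does in a current POSIX-style shell, where there is no `goto` left to search for it. The variable names here are made up for illustration:

```shell
# ':' does nothing and succeeds; its arguments are expanded, then ignored.
: this text is just arguments to a no-op
colon_status=$?

# Because the arguments are still expanded, ':' remains useful for
# side effects such as assigning a default value to an unset variable.
unset name
: "${name:=default}"

echo "status=$colon_status name=$name"
```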


DOS' COMMAND.COM and later Windows' cmd.exe work the same way.


I think the clarity and brevity of this is amazing. It should be mandatory reading for people starting out in computing.

I’ve always thought that the concept of piping input to output and the shell is one of the most powerful “programming environments” around and to see its origins is humbling!


Around 1978, while in high school, I spent many afternoons at the local university library skimming through six-inch-thick bound stacks of computer software journals. (That was how people communicated, before Internets.)

I distinctly remember encountering exactly this paper, as my first exposure to Unix. It made a deep, deep impression. Pipe composition is just unaccountably powerful.

What might not be so easy for people to understand today is how profoundly shocking it was to see computer commands and results expressed in lower case. I had never seen it done. Ever. It didn't seem possible, at first. After a few seconds, it was obvious, but still oddly liberating. You don't know computers are shouting at you until, suddenly, they aren't.


The "ls >output" example exposes a subtle artifact of the implementation: the output file will contain its own name.


It’s not really subtle I think. When you redirect to a file, the file is opened for writing by the shell before it runs the program that you are invoking.

This is also the same reason that if you try something like the following:

    grep whatever somefile >somefile
You always end up with somefile being empty, whereas what you thought you were doing was replacing the contents of the file with only the lines that contain the word “whatever”. Instead, the shell truncates the file first, so grep searches an already empty file, finds nothing, and the file ends up empty.
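Both effects can be reproduced in a scratch directory. This is a minimal sketch, assuming a POSIX shell with mktemp available; the file names and the temporary-file workaround are only one illustrative way to do it:

```shell
# Work in an empty scratch directory so nothing real is touched.
dir=$(mktemp -d)
cd "$dir" || exit 1

# The shell creates "output" before ls runs, so ls lists it.
ls >output
listing=$(cat output)

printf 'whatever one\nsomething else\nwhatever two\n' >somefile

# The shell truncates somefile before grep starts reading it.
# Some grep implementations also complain that input and output are
# the same file, hence the discarded stderr and the "|| true".
grep whatever somefile >somefile 2>/dev/null || true
truncated_size=$(wc -c <somefile | tr -d ' ')

# One common workaround: write to a temporary file, then rename it.
printf 'whatever one\nsomething else\nwhatever two\n' >somefile
grep whatever somefile >somefile.tmp && mv somefile.tmp somefile
kept_lines=$(wc -l <somefile | tr -d ' ')

echo "ls saw: $listing"
echo "size after in-place redirect: $truncated_size"
echo "lines after temp-file rewrite: $kept_lines"
```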


Back in my BSD-on-a-VAX days, I once typed

  grep string * > out
and wondered why it was taking so long. The shell (probably csh) had created the file "out" before it expanded the *, so my grep command was forever finding "string" in its own output. It filled the disk before I realized what was happening. I just did a quick test and bash seems to create the output file after doing the globbing, and under tcsh, grep gave a warning that the output file was also an input file and it did not search it.
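The quick test can be sketched like this, assuming bash or another POSIX-style sh (where, as far as I can tell, command words are expanded before redirections are performed) and a scratch directory made with mktemp. The file names are made up, and csh-family shells may behave differently:

```shell
# Scratch directory containing only two known files.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch a b

# If the glob is expanded before "out" is created, out does not list itself.
echo * >out
glob_result=$(cat out)

echo "glob saw: $glob_result"
```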


Thank you! I searched for this three years ago, and I think it wasn't available on the Internet back then.


Related: The Origin of the Shell, discussed two days back https://news.ycombinator.com/item?id=25207957


All these demonstrations of piping make me wonder how and why POSIX file naming conventions became so loose (https://dwheeler.com/essays/fixing-unix-linux-filenames.html). Were things like whitespace in filenames just completely unimaginable at the time? Was colon-separated PATH not yet conceived of?


The clarity of this documentation is really impressive. It’s like I was rediscovering the shell again.


In the section THE SHELL AS A COMMAND, shouldn't the

    sh >file
be

    sh <file

?


This was fixed a few minutes ago: https://github.com/susam/tucl/pull/7.

Please see the PDF document at https://susam.github.io/tucl/the-unix-command-language.pdf for a scanned copy of the original paper. If you find an error in the Markdown or HTML transcript, please create an issue or send a pull request to https://github.com/susam/tucl.


Such clarity and beauty in writing! I pray that in my lifetime I may be granted just one percent of that.



