- Describe the three standard streams and how processes read and write them
- Use redirection operators to send input and output to and from files
- Compose commands with pipes to build short but powerful one-liners
- Separate and combine standard output and standard error cleanly
- Explain the Unix philosophy of small composable tools and apply it in practice
If the shell is the engine of Linux productivity, then pipes are the fuel lines. Every seasoned Unix user will at some point construct a one-liner that, in a single line of commands joined by | and >, extracts exactly the information they want from a terabyte of log files. This is the technique that separates people who use Linux from people who wield it. The concepts in this chapter are simple — really, there are only about six of them — but combining them is where the magic happens.
The Three Standard Streams
Every process in Linux starts life with three file descriptors already open. Standard input is where a process reads its input from. Standard output is where it writes its normal results. Standard error is where it writes diagnostic messages and error reports. The key insight is that these are separate channels, and they can be independently redirected.
Why separate stdout and stderr? So that you can redirect normal output to a file while still seeing error messages on your screen. If everything went to the same channel, filtering and piping would mix up results with diagnostics in unpredictable ways.
The Unix Philosophy
Doug McIlroy, whose proposal led Ken Thompson to add pipes to Unix in 1973, summarised the philosophy like this:
Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface.
This is the Unix philosophy in three sentences. Rather than building one gigantic program that can do everything, Unix provides many small programs, each solving one problem, and glue — the pipe — that lets you chain them together. sort does nothing but sort. uniq does nothing but remove consecutive duplicates. wc counts things. On their own, each is almost trivial. Combined, they can do remarkable work.
Output Redirection
The simplest form of redirection sends standard output to a file instead of the terminal:
ls /etc > files.txt
The > operator replaces the contents of files.txt with whatever ls prints. If the file does not exist, it is created. If it does exist, its previous contents are silently destroyed — there is no warning.
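If silent destruction worries you, POSIX shells offer a safeguard. A short sketch (note.txt is just an illustrative file name):

```shell
# With noclobber enabled, > refuses to overwrite an existing file;
# the >| operator forces the overwrite anyway.
set -o noclobber
echo first > note.txt
echo second > note.txt    # fails: cannot overwrite existing file
echo second >| note.txt   # succeeds: >| overrides noclobber
```

Some administrators put set -o noclobber in their shell startup files for exactly this reason.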
To append instead of overwriting, use >>:
date >> log.txt
Now each run adds a new line to log.txt without disturbing what was already there.
Input Redirection
The mirror-image operator < takes input from a file:
wc -l < /etc/passwd
# 42
Here, instead of wc opening /etc/passwd itself, the shell opens it and connects it to wc's standard input. The count is the same either way, though the output differs slightly: given a filename argument, wc prints the name alongside the number; reading from stdin, it prints the bare number. The distinction matters when wc is part of a pipeline: it means wc does not need to know anything about files, it just reads from stdin.
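You can see the difference side by side (the count shown is illustrative and varies by system):

```shell
# With a filename argument, wc knows the name and prints it:
wc -l /etc/passwd
# 42 /etc/passwd

# With redirection, wc only ever sees an anonymous stream:
wc -l < /etc/passwd
# 42
```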
There is also a here document (<<), which lets you feed literal text into a command:
cat <<EOF
This is line one.
This is line two.
EOF
The shell reads everything up to the terminator EOF and feeds it to cat as standard input. Here documents are immensely useful in scripts for embedding configuration files, SQL queries, or long strings.
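One detail worth knowing when embedding text this way: the shell expands variables and command substitutions inside a here document unless you quote the terminator. A small sketch:

```shell
name="world"

# Unquoted terminator: $name is expanded before cat sees the text.
cat <<EOF
hello $name
EOF
# hello world

# Quoted terminator: the text is passed through completely literally.
cat <<'EOF'
hello $name
EOF
# hello $name
```

The quoted form is the right choice when embedding scripts or configuration that contains its own dollar signs.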
A close cousin, the here string (<<<), feeds a single string:
grep "foo" <<< "this foo line matches"
# this foo line matches
Redirecting Standard Error
Standard error has file descriptor 2, so you redirect it with 2>:
find / -name "*.log" 2> errors.txt
This runs find starting from /, writing found file names to the terminal as usual but silently capturing any "Permission denied" messages in errors.txt.
A very common idiom is to send stderr to the void so it does not clutter the output:
find / -name "*.log" 2>/dev/null
/dev/null is a special device that accepts any input and discards it. Think of it as a digital black hole.
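The black hole works in both directions. Writing to /dev/null discards data; reading from it yields an immediate end-of-file:

```shell
# Writing: anything sent to /dev/null simply vanishes.
echo "this goes nowhere" > /dev/null

# Reading: instant end-of-file, so wc counts zero lines.
wc -l < /dev/null
# 0
```

Reading from /dev/null is a handy way to give a command an intentionally empty input.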
Merging Streams
Sometimes you want to capture both stdout and stderr together. The syntax is:
command > output.txt 2>&1
The 2>&1 means "redirect file descriptor 2 to wherever file descriptor 1 currently points". It must come after the > redirection, because the order matters: first stdout is pointed at the file, then stderr is pointed at the same place as stdout.
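The ordering rule is easy to verify with a command that writes to both streams. In this sketch, real.txt is a file we create and "missing" is a deliberately nonexistent name:

```shell
touch real.txt

# Correct order: stdout is pointed at the file first,
# then stderr follows it, so both.txt gets listing and error.
ls real.txt missing > both.txt 2>&1

# Wrong order: stderr is duplicated onto the terminal's stdout
# *before* stdout is redirected, so the error still hits the screen
# and stdout-only.txt contains only the listing.
ls real.txt missing 2>&1 > stdout-only.txt
```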
There is a shorthand:
command &> output.txt
which does the same thing. This is a bash extension, not POSIX, but it is so convenient that almost everybody uses it.
If you want to see output on the terminal and save it to a file, use tee:
make 2>&1 | tee build.log
The tee command (named after the plumber's T-junction) reads stdin, writes it unchanged to stdout, and also to any files you name. It is how you keep a live terminal view while still capturing everything.
Pipes
Now the headline act. A pipe connects the standard output of one command directly to the standard input of another, without any intermediate file.
ls /etc | wc -l
# 127
The shell creates a pipe — an in-memory buffer — starts ls, and connects its stdout to the pipe's input end. Then it starts wc -l with the pipe's output end as its stdin. Both commands run concurrently: ls pushes names into the pipe as fast as it generates them, and wc reads and counts them on the other side. When ls finishes, its end of the pipe closes, wc sees end-of-file, prints the count, and exits.
Pipes are implemented by the kernel as a small circular buffer, usually 64 KB on Linux. If the consumer is slower than the producer, the producer blocks until there is room. This natural flow control means you can pipe gigabytes through a sequence of commands without running out of memory.
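You can watch the other half of this flow control in action with yes, which prints "y" forever:

```shell
# head exits after three lines and closes its end of the pipe;
# the next time yes tries to write, it receives SIGPIPE and stops.
yes | head -n 3
# y
# y
# y
```

This is why an infinite producer piped into a finite consumer terminates cleanly instead of running forever.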
You can chain as many as you like:
cat /var/log/syslog | grep ERROR | awk '{print $5}' | sort | uniq -c | sort -rn | head
Read this one left to right:
- Print the entire syslog.
- Keep only the lines that mention "ERROR".
- Print the fifth whitespace-separated field of each.
- Sort alphabetically (so identical values come together).
- Collapse duplicates and prepend a count.
- Sort numerically in reverse order (largest counts first).
- Show the top 10.
In one line, you have a summary of the most common error-producing programs in your syslog. This kind of ad-hoc analysis is what pipes were made for.
Common Pipeline Partners
A handful of commands come up again and again as pipeline components. You will meet all of them properly in Chapter 8, but here is a preview.
xargs: Bridging Stdout and Arguments
Some commands do not read stdin at all — they take arguments on the command line. For example, rm expects filenames as arguments, not as lines on stdin. To feed it pipe output, use xargs:
find . -name "*.tmp" | xargs rm
This runs find, feeds its output into xargs, which splits it on whitespace and builds a command like rm file1 file2 file3 ... and runs it. xargs is tremendously useful, but precisely because it splits on whitespace, filenames containing spaces or newlines can confuse it. The safe idiom is:
find . -name "*.tmp" -print0 | xargs -0 rm
which uses null bytes as separators (-print0 on find, -0 on xargs) and tolerates any filename.
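xargs can also cap how many arguments go into each invocation with -n, which makes its batching behaviour visible. A sketch with plain numbers standing in for filenames:

```shell
# Five items on stdin, at most three arguments per command line,
# so echo runs twice: once with three items, once with two.
printf '%s\n' 1 2 3 4 5 | xargs -n 3 echo
# 1 2 3
# 4 5
```

The same flag is how xargs stays under the kernel's command-line length limit when find produces millions of filenames.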
Exit Codes and Pipe Semantics
Every command returns an exit code when it finishes: 0 for success, anything else for failure. You can check the last one with $?:
ls /nonexistent
# ls: cannot access '/nonexistent': No such file or directory
echo $?
# 2
In a pipeline, each command has its own exit code, but by default the shell only exposes the last one. To be safer, you can enable pipefail:
set -o pipefail
grep foo file | wc -l
Now the pipeline's exit code is the rightmost non-zero code, so a failure anywhere in the chain is visible. This matters enormously in shell scripts.
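When you need every stage's exit code, not just a pass/fail summary, bash records them all in the PIPESTATUS array (a bash-specific feature; zsh offers a similar pipestatus):

```shell
# false exits 1, true exits 0; PIPESTATUS holds both, in order.
false | true
echo "${PIPESTATUS[@]}"
# 1 0
```

Note that PIPESTATUS is overwritten by the very next command, so capture it immediately if you need it later.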
Process Substitution
A less well-known but useful feature is process substitution, which lets you use a command's output as if it were a file:
diff <(ls dir1) <(ls dir2)
Here <(ls dir1) expands to something like /dev/fd/63, a file descriptor that diff can open and read from, backed by ls dir1 running in the background. This lets you compare command outputs directly without creating temporary files.
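Process substitution shines with commands like comm, which insists on sorted input. A sketch (list1.txt and list2.txt are illustrative files created here):

```shell
printf 'b\na\nc\n' > list1.txt
printf 'd\nc\nb\n' > list2.txt

# comm needs sorted input; process substitution sorts on the fly.
# -12 suppresses lines unique to each file, leaving the common ones.
comm -12 <(sort list1.txt) <(sort list2.txt)
# b
# c
```

Without process substitution you would need two temporary files and two extra cleanup steps.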
Pipes in Practice
To close this chapter, here are three real-world pipelines worth absorbing.
Count the ten most common words in a text file:
cat book.txt | tr -cs '[:alnum:]' '\n' | sort | uniq -c | sort -rn | head
Show which users are consuming the most disk space in /home:
du -sh /home/* 2>/dev/null | sort -h | tail
Find the processes using the most memory:
ps aux --sort=-%mem | head
Each of these is under a hundred characters. Each does something that would take dozens of lines in most other languages. Pipes are the Unix superpower, and they pay back an hour spent learning them with a lifetime of saved time.