- Describe the three standard streams and how processes read and write them
- Use redirection operators to send input and output to and from files
- Compose commands with pipes to build short but powerful one-liners
- Separate and combine standard output and standard error cleanly
- Explain the Unix philosophy of small composable tools and apply it in practice
If the shell is the engine of Linux productivity, then pipes are the fuel lines. Every seasoned Unix user will at some point construct a one-liner that, in a single line of commands joined by | and >, extracts exactly the information they want from a terabyte of log files. This is the technique that separates people who use Linux from people who wield it. The concepts in this chapter are simple — really, there are only about six of them — but combining them is where the magic happens.
The Three Standard Streams
Every process in Linux starts life with three file descriptors already open. Standard input is where a process reads its input from. Standard output is where it writes its normal results. Standard error is where it writes diagnostic messages and error reports. The key insight is that these are separate channels, and they can be independently redirected.
Why separate stdout and stderr? So that you can redirect normal output to a file while still seeing error messages on your screen. If everything went to the same channel, filtering and piping would mix up results with diagnostics in unpredictable ways.
The Unix Philosophy
Doug McIlroy, whose proposal led Ken Thompson to add pipes to Unix in 1973, summarised the philosophy like this:
Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface.
This is the Unix philosophy in three sentences. Rather than building one gigantic program that can do everything, Unix provides many small programs, each solving one problem, and glue — the pipe — that lets you chain them together. sort does nothing but sort. uniq does nothing but remove consecutive duplicates. wc counts things. On their own, each is almost trivial. Combined, they can do remarkable work.
Output Redirection
The simplest form of redirection sends standard output to a file instead of the terminal:
ls /etc > files.txt
The > operator replaces the contents of files.txt with whatever ls prints. If the file does not exist, it is created. If it does exist, its previous contents are silently destroyed — there is no warning.
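If silent destruction worries you, POSIX shells offer a safeguard. A short sketch (note.txt is just an illustrative file name):

```shell
# With noclobber enabled, > refuses to overwrite an existing file;
# the >| operator forces the overwrite anyway.
set -o noclobber
echo first > note.txt
echo second > note.txt    # fails: cannot overwrite existing file
echo second >| note.txt   # succeeds: >| overrides noclobber
```

Some administrators put set -o noclobber in their shell startup files for exactly this reason.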
To append instead of overwriting, use >>:
date >> log.txt
Now each run adds a new line to log.txt without disturbing what was already there.
Input Redirection
The mirror-image operator < takes input from a file:
wc -l < /etc/passwd
# 42
Here, instead of wc opening /etc/passwd itself, the shell opens it and connects it to wc's standard input. The count is the same either way, though the output differs slightly: given a filename argument, wc prints the name alongside the number; reading from stdin, it prints the bare number. The distinction matters when wc is part of a pipeline: it means wc does not need to know anything about files, it just reads from stdin.
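You can see the difference side by side (the count shown is illustrative and varies by system):

```shell
# With a filename argument, wc knows the name and prints it:
wc -l /etc/passwd
# 42 /etc/passwd

# With redirection, wc only ever sees an anonymous stream:
wc -l < /etc/passwd
# 42
```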
There is also a here document (<<), which lets you feed literal text into a command:
cat <<EOF
This is line one.
This is line two.
EOF
The shell reads everything up to the terminator EOF and feeds it to cat as standard input. Here documents are immensely useful in scripts for embedding configuration files, SQL queries, or long strings.
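One detail worth knowing when embedding text this way: the shell expands variables and command substitutions inside a here document unless you quote the terminator. A small sketch:

```shell
name="world"

# Unquoted terminator: $name is expanded before cat sees the text.
cat <<EOF
hello $name
EOF
# hello world

# Quoted terminator: the text is passed through completely literally.
cat <<'EOF'
hello $name
EOF
# hello $name
```

The quoted form is the right choice when embedding scripts or configuration that contains its own dollar signs.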
A close cousin, the here string (<<<), feeds a single string:
grep "foo" <<< "this foo line matches"
# this foo line matches
Redirecting Standard Error
Standard error has file descriptor 2, so you redirect it with 2>:
find / -name "*.log" 2> errors.txt
This runs find starting from /, writing found file names to the terminal as usual but silently capturing any "Permission denied" messages in errors.txt.
A very common idiom is to send stderr to the void so it does not clutter the output:
find / -name "*.log" 2>/dev/null
/dev/null is a special device that accepts any input and discards it. Think of it as a digital black hole.
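The black hole works in both directions. Writing to /dev/null discards data; reading from it yields an immediate end-of-file:

```shell
# Writing: anything sent to /dev/null simply vanishes.
echo "this goes nowhere" > /dev/null

# Reading: instant end-of-file, so wc counts zero lines.
wc -l < /dev/null
# 0
```

Reading from /dev/null is a handy way to give a command an intentionally empty input.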
Merging Streams
Sometimes you want to capture both stdout and stderr together. The syntax is:
command > output.txt 2>&1
The 2>&1 means "redirect file descriptor 2 to wherever file descriptor 1 currently points". It must come after the > redirection, because the order matters: first stdout is pointed at the file, then stderr is pointed at the same place as stdout.
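The ordering rule is easy to verify with a command that writes to both streams. In this sketch, real.txt is a file we create and "missing" is a deliberately nonexistent name:

```shell
touch real.txt

# Correct order: stdout is pointed at the file first,
# then stderr follows it, so both.txt gets listing and error.
ls real.txt missing > both.txt 2>&1

# Wrong order: stderr is duplicated onto the terminal's stdout
# *before* stdout is redirected, so the error still hits the screen
# and stdout-only.txt contains only the listing.
ls real.txt missing 2>&1 > stdout-only.txt
```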
There is a shorthand:
command &> output.txt
which does the same thing. This is a bash extension, not POSIX, but it is so convenient that almost everybody uses it.
If you want to see output on the terminal and save it to a file, use tee:
make 2>&1 | tee build.log
The tee command (named after the plumber's T-junction) reads stdin, writes it unchanged to stdout, and also to any files you name. It is how you keep a live terminal view while still capturing everything.
Pipes
Now the headline act. A pipe connects the standard output of one command directly to the standard input of another, without any intermediate file.
ls /etc | wc -l
# 127
The shell creates a pipe — an in-memory buffer — starts ls, and connects its stdout to the pipe's input end. Then it starts wc -l with the pipe's output end as its stdin. Both commands run concurrently: ls pushes names into the pipe as fast as it generates them, and wc reads and counts them on the other side. When ls finishes, its end of the pipe closes, wc sees end-of-file, prints the count, and exits.
Pipes are implemented by the kernel as a small circular buffer, usually 64 KB on Linux. If the consumer is slower than the producer, the producer blocks until there is room. This natural flow control means you can pipe gigabytes through a sequence of commands without running out of memory.
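You can watch the other half of this flow control in action with yes, which prints "y" forever:

```shell
# head exits after three lines and closes its end of the pipe;
# the next time yes tries to write, it receives SIGPIPE and stops.
yes | head -n 3
# y
# y
# y
```

This is why an infinite producer piped into a finite consumer terminates cleanly instead of running forever.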
You can chain as many as you like:
cat /var/log/syslog | grep ERROR | awk '{print $5}' | sort | uniq -c | sort -rn | head
Read this one left to right:
- Print the entire syslog.
- Keep only the lines that mention "ERROR".
- Print the fifth whitespace-separated field of each.
- Sort alphabetically (so identical values come together).
- Collapse duplicates and prepend a count.
- Sort numerically in reverse order (largest counts first).
- Show the top 10.
In one line, you have a summary of the most common error-producing programs in your syslog. This kind of ad-hoc analysis is what pipes were made for.
Common Pipeline Partners
A handful of commands come up again and again as pipeline components. You will meet all of them properly in Chapter 8, but here is a preview.
xargs: Bridging Stdout and Arguments
Some commands do not read stdin at all — they take arguments on the command line. For example, rm expects filenames as arguments, not as lines on stdin. To feed it pipe output, use xargs:
find . -name "*.tmp" | xargs rm
This runs find, feeds its output into xargs, which splits it on whitespace and builds a command like rm file1 file2 file3 ... and runs it. xargs is tremendously useful, but precisely because it splits on whitespace, filenames containing spaces or newlines can confuse it. The safe idiom is:
find . -name "*.tmp" -print0 | xargs -0 rm
which uses null bytes as separators (-print0 on find, -0 on xargs) and tolerates any filename.
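xargs can also cap how many arguments go into each invocation with -n, which makes its batching behaviour visible. A sketch with plain numbers standing in for filenames:

```shell
# Five items on stdin, at most three arguments per command line,
# so echo runs twice: once with three items, once with two.
printf '%s\n' 1 2 3 4 5 | xargs -n 3 echo
# 1 2 3
# 4 5
```

The same flag is how xargs stays under the kernel's command-line length limit when find produces millions of filenames.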
Exit Codes and Pipe Semantics
Every command returns an exit code when it finishes: 0 for success, anything else for failure. You can check the last one with $?:
ls /nonexistent
# ls: cannot access '/nonexistent': No such file or directory
echo $?
# 2
In a pipeline, each command has its own exit code, but by default the shell only exposes the last one. To be safer, you can enable pipefail:
set -o pipefail
grep foo file | wc -l
Now the pipeline's exit code is the rightmost non-zero code, so a failure anywhere in the chain is visible. This matters enormously in shell scripts.
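When you need every stage's exit code, not just a pass/fail summary, bash records them all in the PIPESTATUS array (a bash-specific feature; zsh offers a similar pipestatus):

```shell
# false exits 1, true exits 0; PIPESTATUS holds both, in order.
false | true
echo "${PIPESTATUS[@]}"
# 1 0
```

Note that PIPESTATUS is overwritten by the very next command, so capture it immediately if you need it later.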
Process Substitution
A less well-known but useful feature is process substitution, which lets you use a command's output as if it were a file:
diff <(ls dir1) <(ls dir2)
Here <(ls dir1) expands to something like /dev/fd/63, a file descriptor that diff can open and read from, backed by ls dir1 running in the background. This lets you compare command outputs directly without creating temporary files.
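Process substitution shines with commands like comm, which insists on sorted input. A sketch (list1.txt and list2.txt are illustrative files created here):

```shell
printf 'b\na\nc\n' > list1.txt
printf 'd\nc\nb\n' > list2.txt

# comm needs sorted input; process substitution sorts on the fly.
# -12 suppresses lines unique to each file, leaving the common ones.
comm -12 <(sort list1.txt) <(sort list2.txt)
# b
# c
```

Without process substitution you would need two temporary files and two extra cleanup steps.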
Pipes in Practice
To close this chapter, here are three real-world pipelines worth absorbing.
Count the ten most common words in a text file:
cat book.txt | tr -cs '[:alnum:]' '\n' | sort | uniq -c | sort -rn | head
Show which users are consuming the most disk space in /home:
du -sh /home/* 2>/dev/null | sort -h | tail
Find the processes using the most memory:
ps aux --sort=-%mem | head
Each of these is under a hundred characters. Each does something that would take dozens of lines in most other languages. Pipes are the Unix superpower, and they pay back an hour spent learning them with a lifetime of saved time.