awk is a small, elegant programming language for processing structured text, typically line-oriented data with fields separated by whitespace or another delimiter. It was created at Bell Labs in 1977 by Alfred Aho, Peter Weinberger, and Brian Kernighan, whose initials give the language its name. An awk program consists of pattern { action } pairs: for each input line matching the pattern, the action is executed. A missing pattern matches every line, and a missing action prints the matching line.
awk '{ print $1 }' file # first field
awk -F: '{ print $1, $7 }' /etc/passwd # colon-delimited
awk -F: '$3 > 1000 { print $1 }' /etc/passwd # users whose UID (field 3) exceeds 1000
awk '/error/ { count++ } END { print count+0 }' logfile # count matching lines (0 if none)
awk 'NR==10' file # print line 10
awk 'BEGIN { FS=","; OFS="\t" } { print $1, $3 }' data.csv # comma in, tab out
Variables like NR (record number), NF (number of fields), FS (field separator), and $0 (whole line) are built in. BEGIN and END blocks run before the first and after the last input line, useful for headers, footers, and totals.
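As a minimal sketch of how these pieces fit together, the script below prints a header in BEGIN, uses NR, NF, and $0 in the main rule, and prints a total in END. The sample input and the shell variable name `report` are made up for illustration.

```shell
# Hypothetical sample input: three lines with differing field counts.
report=$(printf 'alpha beta\ngamma\ndelta epsilon zeta\n' | awk '
  BEGIN { print "line fields text" }        # header, printed before any input
        { print NR, NF, $0; total += NF }   # runs once for every input line
  END   { print "total fields:", total }    # footer, printed after the last line
')
printf '%s\n' "$report"
```

Each output line shows the record number, the field count, and the unmodified line, followed by a footer with the running total.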
Modern versions include gawk (the GNU implementation, the default on many Linux distributions), mawk (a fast, minimal awk, commonly installed as /usr/bin/awk on Debian), and nawk (the canonical "new awk"). For quick column extraction, sums, averages, and reshaping of tabular text, awk is hard to beat.
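The kinds of tasks mentioned above might look like the following sketch. The name/score data and the shell variable names `stats` and `swapped` are assumptions chosen for the example, not part of any real file.

```shell
# Hypothetical sample data: a name column and a numeric score column.
data=$(printf 'ann 10\nbob 20\ncid 30\n')

# Sum and average of column 2; the NR guard avoids dividing by zero on empty input.
stats=$(printf '%s\n' "$data" | awk '
  { sum += $2 }
  END { if (NR > 0) print "sum=" sum, "avg=" sum / NR }
')

# Reshape: swap the two columns and emit them comma-separated.
swapped=$(printf '%s\n' "$data" | awk 'BEGIN { OFS="," } { print $2, $1 }')

printf '%s\n' "$stats" "$swapped"
```

Setting OFS in BEGIN affects only how print joins its comma-separated arguments, so the input is still split on whitespace.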
Discussed in:
- Chapter 8: Text Processing — awk: A Miniature Language
Also defined in: Textbook of Linux