Frequently Asked Question

What is awk and what is it actually for?

awk is a small programming language designed for processing columnar text: lines (records) whose fields are separated by some delimiter (whitespace by default). Each line splits into fields you can address as $1, $2, $3, and so on. $0 is the whole line, NF is the number of fields, and NR is the record number. The language was created at Bell Labs in 1977 by Aho, Weinberger, and Kernighan; their three initials form the name.
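As a rough illustration of the field variables described above, here is a minimal sketch run from a shell. The sample data and the variable names (data, fields_out, colon_out) are made up for the example; $NF (the last field) and the -F option (which sets the field separator) are standard awk features not mentioned above.

```shell
# Hypothetical sample data: name, department, salary (whitespace-separated).
data='alice eng 120
bob ops 95
carol eng 130'

# For each line, print the record number, the field count,
# the first field, and the last field.
fields_out=$(printf '%s\n' "$data" | awk '{ print NR, NF, $1, $NF }')
printf '%s\n' "$fields_out"

# -F changes the delimiter; here fields are split on a colon.
colon_out=$(printf 'root:x:0\n' | awk -F: '{ print $1, $3 }')
printf '%s\n' "$colon_out"
```

With this input, the first command should print one line per record ("1 3 alice 120" and so on), and the second should print "root 0".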

An awk program is a series of pattern { action } rules. The pattern selects which lines the action applies to; the action is a small C-like block. Patterns can be regular expressions (/error/ { print $1 }) or boolean expressions on fields ($3 > 100 { print $1, $3 }). Special patterns BEGIN and END run before the first line and after the last, perfect for printing headers and summaries. For quick column manipulation, ad-hoc summing, and tabular reformatting, awk is often quicker than reaching for Python.
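The rule types above can be combined in one short program. The following is only a sketch on invented data (the names data, report, eng, and total are assumptions for the example), showing a BEGIN header, a regular-expression pattern, a boolean field pattern, and an END summary.

```shell
# Hypothetical sample data: name, department, salary.
data='alice eng 120
bob ops 95
carol eng 130'

report=$(printf '%s\n' "$data" | awk '
  BEGIN    { print "name salary" }                 # runs before any input is read
  /eng/    { eng++ }                               # regex pattern: count lines containing "eng"
  $3 > 100 { print $1, $3; total += $3 }           # boolean pattern: salary above 100
  END      { print "eng:", eng, "total:", total }  # runs after the last line
')
printf '%s\n' "$report"
```

Every rule is tested against every input line in order, so a single line can trigger more than one action. On this data the report should list alice and carol under the header and end with "eng: 2 total: 250".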
