sort reads lines and prints them in sorted order. By default it compares whole lines as strings, using the current locale's collation rules. Common options:
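sort file.txt                  # lexicographic (locale collation)
sort -n file.txt               # numeric
sort -h file.txt               # human-readable sizes (2K, 1M, 1G)
sort -r file.txt               # reverse
sort -u file.txt               # unique lines (like sort | uniq)
sort -k2,2 file.txt            # by field 2 only (-k2 alone means field 2 to end of line)
sort -t: -k3,3n /etc/passwd    # numeric sort by UID (field 3)
sort -R file.txt               # random order by key hash; identical lines stay adjacent (shuf gives a true shuffle)
sort --parallel=4 big.txt      # run up to 4 sorts concurrently (GNU)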
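One way to see how the ordering options differ is to feed the same few lines to each. This is a small sketch assuming GNU coreutils sort; LC_ALL=C is set on every command so the results are plain byte order and reproducible across machines, and the input lines are made up for illustration.

```shell
# Lexicographic (byte order): "100" sorts before "9" because '1' < '9'
printf '10\n9\n100\n' | LC_ALL=C sort          # 10, 100, 9

# Numeric: compares the leading number on each line
printf '10\n9\n100\n' | LC_ALL=C sort -n       # 9, 10, 100

# Human-readable sizes: the K < M < G suffixes are understood
printf '1G\n2K\n3M\n' | LC_ALL=C sort -h       # 2K, 3M, 1G

# -k2 makes the key run from field 2 to the end of the line, so the
# third column breaks the tie here
printf 'b 1 x\na 1 y\n' | LC_ALL=C sort -k2    # "b 1 x", then "a 1 y"

# -k2,2 stops at field 2; the keys are equal, so sort falls back to
# comparing whole lines
printf 'b 1 x\na 1 y\n' | LC_ALL=C sort -k2,2  # "a 1 y", then "b 1 x"
```

The last pair is the usual reason to write -k2,2 rather than -k2 when only one column should matter.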
sort is often used as the second half of a pipeline, after text extraction: cut -d: -f7 /etc/passwd | sort | uniq -c | sort -rn | head tallies how many accounts use each login shell, most common first. With sort -m it merges files that are already sorted without re-sorting them, which is useful for combining log files. For inputs too large for memory it automatically writes sorted runs to temporary files and merges them, so it can sort data much larger than RAM.
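Both the tally idiom and merge mode can be tried on small inline data. In this sketch the shell names and fruit lists are made up for illustration, so the pipeline has the same shape as the /etc/passwd one without depending on any particular system's accounts; the two input files are temporary files created with mktemp.

```shell
# Tally: sort groups identical lines, uniq -c counts each group,
# sort -rn orders the groups by count, largest first
printf 'bash\nsh\nbash\nzsh\nbash\nsh\n' | sort | uniq -c | sort -rn | head -n 3

# Merge: -m assumes each input file is already sorted and only
# interleaves them, rather than sorting everything from scratch
first=$(mktemp)
second=$(mktemp)
printf 'apple\ncherry\n' > "$first"
printf 'banana\ndate\n' > "$second"
LC_ALL=C sort -m "$first" "$second"   # apple, banana, cherry, date
rm -f "$first" "$second"
```

Merge mode only gives a correctly ordered result if every input really is sorted under the same ordering options; otherwise the output is simply out of order.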
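Locale matters: LC_ALL=C sort compares raw bytes, which is faster and more predictable than locale-aware collation. The difference is not limited to non-ASCII data: byte order puts every uppercase ASCII letter before every lowercase one, while most locales interleave them. Whenever the exact order matters, or when the output feeds another tool that expects a particular ordering, it is worth deciding which behaviour you want and setting the locale explicitly.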
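The case-ordering difference is easy to see directly. This sketch assumes a glibc-based system; the second command names en_US.UTF-8 only as an example of a common locale, and its exact output depends on which locales are installed, so only the C-locale result is shown.

```shell
# Byte order: 'A'-'Z' (0x41-0x5A) all come before 'a'-'z' (0x61-0x7A)
printf 'b\nB\na\nA\n' | LC_ALL=C sort            # A, B, a, b

# Locale-aware order: in a typical glibc en_US.UTF-8 locale, case is only a
# secondary difference, so a/A and b/B end up next to each other instead of
# in two separate blocks. If that locale is not installed, sort stays in the
# C locale and the output matches the command above.
printf 'b\nB\na\nA\n' | LC_ALL=en_US.UTF-8 sort
```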
Discussed in:
- Chapter 8: Text Processing — Cutting, Translating, and Counting
Also defined in: Textbook of Linux