Glossary

sort

sort reads lines and prints them in sorted order. By default it sorts lexicographically, using the current locale's collation rules. Options change the ordering criterion:

sort file.txt                     # lexicographic
sort -n file.txt                  # numeric
sort -h file.txt                  # human-readable (2K, 1M, 1G)
sort -r file.txt                  # reverse
sort -u file.txt                  # unique (like sort | uniq)
sort -k2 file.txt                 # key runs from field 2 to end of line
sort -k2,2 file.txt               # sort by field 2 only
sort -t: -k3,3n /etc/passwd       # numeric sort by UID (field 3 only)
sort -R file.txt                  # random order, but identical lines stay grouped (use shuf for a true shuffle)
sort --parallel=4 big.txt         # use up to 4 threads (GNU sort)
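
The difference between lexicographic and numeric ordering on a key field is easiest to see on a small input. The sketch below uses made-up sample data (names and scores are hypothetical) and restricts the key to field 2 with -k2,2; LC_ALL=C is set only to make the text ordering reproducible.

```shell
# Hypothetical sample data: a name, then a numeric score.
data=$(printf '%s\n' 'carol 9' 'alice 10' 'bob 100')

# Lexicographic comparison of field 2: "10" < "100" < "9" as strings.
by_text=$(printf '%s\n' "$data" | LC_ALL=C sort -k2,2)

# Numeric comparison of field 2 only: 9 < 10 < 100.
by_number=$(printf '%s\n' "$data" | LC_ALL=C sort -k2,2n)

printf 'text order:\n%s\n\nnumeric order:\n%s\n' "$by_text" "$by_number"
```

Writing the key as -k2,2 rather than -k2 keeps later fields out of the comparison, which matters as soon as lines have more than two fields.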

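sort often sits in the middle of a pipeline, after text extraction and before counting: cut -d: -f7 /etc/passwd | sort | uniq -c | sort -rn | head tallies login shells, most common first. The first sort is needed because uniq only collapses adjacent duplicate lines. sort -m merges files that are already sorted without re-sorting them, which is useful for combining log files. For very large inputs sort spills to temporary files automatically (GNU sort puts them in $TMPDIR, or in the directory given with -T), so it can sort data much larger than RAM.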
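
A tally pipeline and a merge of pre-sorted files can be sketched on inline data as follows. The shell names, log lines, and file names are hypothetical stand-ins, and awk is used only to strip the padding that uniq -c puts before each count.

```shell
# Hypothetical list of login shells, standing in for the output of cut.
shells=$(printf '%s\n' /bin/bash /bin/sh /bin/bash /usr/sbin/nologin /bin/bash /bin/sh)

# Group identical lines, count each group, then order by count, highest first.
tally=$(printf '%s\n' "$shells" | LC_ALL=C sort | uniq -c | LC_ALL=C sort -rn | awk '{print $1, $2}')
printf '%s\n' "$tally"

# Merge two files that are each already sorted (timestamps sort as plain text).
tmp=$(mktemp -d)
printf '%s\n' '2024-01-01 a' '2024-01-03 c' > "$tmp/one.log"
printf '%s\n' '2024-01-02 b' '2024-01-04 d' > "$tmp/two.log"
merged=$(LC_ALL=C sort -m "$tmp/one.log" "$tmp/two.log")
rm -r "$tmp"
printf '%s\n' "$merged"
```

sort -m assumes its inputs are already in order; if they are not, the output is not guaranteed to be sorted.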

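Locale matters: LC_ALL=C sort compares raw bytes, which is faster and more predictable than locale-aware collation, but it places every uppercase ASCII letter before every lowercase one and orders non-ASCII characters by their encoded byte values. For anything involving non-ASCII data, decide explicitly which behaviour you want.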
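
Byte-order sorting is easy to check with a few mixed-case words. This is a minimal sketch on made-up input; it shows only the C-locale result, because the output of a locale-aware sort depends on which locales are installed on the system.

```shell
# Hypothetical word list with mixed case.
words=$(printf '%s\n' banana Apple cherry apple Banana)

# Byte order: uppercase ASCII (65-90) sorts before lowercase ASCII (97-122).
c_order=$(printf '%s\n' "$words" | LC_ALL=C sort)
printf '%s\n' "$c_order"
```

In many UTF-8 locales the same input would instead come out with upper and lower case interleaved, which is why scripts that depend on a fixed order usually set LC_ALL=C explicitly.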

Related terms: uniq, cut, awk

Also defined in: Textbook of Linux