uniq reads input and produces output in which adjacent duplicate lines have been collapsed to one. The key word is "adjacent": uniq does not sort its input, so you almost always want to pipe it through sort first to group duplicates together.
```shell
sort file.txt | uniq                           # deduplicate
sort file.txt | uniq -c                        # prefix each line with its count
sort file.txt | uniq -d                        # show only duplicated lines (one copy each)
sort file.txt | uniq -u                        # show only lines that appear exactly once
sort access.log | uniq -c | sort -rn | head    # top offenders
```
The idiom sort | uniq -c | sort -rn is so common it deserves memorising. It produces a frequency-sorted histogram of whatever is fed into it—perfect for "which IP addresses hit me the most?" or "which errors appear most often in the log?"
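A minimal sketch of the idiom, using a hypothetical list of client IPs (the kind you might extract from access.log with `awk '{print $1}'`):

```shell
# Hypothetical input: one client IP address per line.
# sort groups duplicates, uniq -c counts each group,
# and sort -rn ranks the counts highest-first.
printf '10.0.0.2\n10.0.0.1\n10.0.0.2\n10.0.0.3\n10.0.0.2\n10.0.0.1\n' \
  | sort | uniq -c | sort -rn
# Most frequent first: 10.0.0.2 seen 3 times, 10.0.0.1 twice, 10.0.0.3 once.
```

The first `sort` makes duplicates adjacent so `uniq -c` can count them; the second `sort -rn` orders the counts numerically, not lexically, which matters once counts reach double digits.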
For simple deduplication, sort -u does the same job in one step, without the extra process. uniq earns its keep through its counting and filtering flags, and through -f (skip leading fields), -s (skip leading characters), and -w (limit how many characters are compared), which come in handy for structured input.
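A short sketch of the field-skipping flag, with hypothetical timestamped input where only the message after the timestamp matters:

```shell
# Hypothetical input: "<time> <event>" lines. -f 1 tells uniq to skip the
# first whitespace-separated field (the timestamp) when comparing adjacent
# lines, so back-to-back "restart" lines collapse to one.
printf '09:01 restart\n09:02 restart\n09:03 login\n' | uniq -f 1
# Keeps "09:01 restart" and "09:03 login"; "09:02 restart" is dropped
# because its non-timestamp text matches the line above it.
```

Note that uniq keeps the first line of each group, so the surviving timestamp is the earliest one, which is usually what you want.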
Discussed in:
- Chapter 8: Text Processing — Cutting, Translating, and Counting
Also defined in: Textbook of Linux