Frequently Asked Question
What is comm, and how is it different from diff?
comm compares two sorted files line by line and prints three columns: lines
only in the first file, lines only in the second, and lines in both. By
suppressing columns with -1, -2, or -3 you can ask narrower questions.
comm -12 a b prints only lines that appear in both (set intersection).
comm -23 a b prints lines that are in a but not in b (set difference).
comm -3 a b prints lines that are in exactly one of them (symmetric
difference), still in two columns: lines found only in b are indented with a
tab.
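The three invocations above can be tried on a pair of tiny files. This is a minimal sketch, not a canonical recipe: the file names and the sample words are made up for illustration, and LC_ALL=C is set on the assumption that you want plain byte-order sorting so that comm and the data agree on what "sorted" means.

```shell
# Hypothetical sample data: two small lists that are already sorted.
tmp=$(mktemp -d)
printf '%s\n' alpha bravo charlie > "$tmp/a"
printf '%s\n' bravo charlie delta > "$tmp/b"

# Assumption: byte-order collation, so comm's idea of "sorted" matches the data.
export LC_ALL=C

both=$(comm -12 "$tmp/a" "$tmp/b")     # intersection: lines in both files
only_a=$(comm -23 "$tmp/a" "$tmp/b")   # difference: lines in a but not in b
either=$(comm -3 "$tmp/a" "$tmp/b")    # symmetric difference, in two columns

printf 'in both:\n%s\n\n' "$both"
printf 'only in a:\n%s\n\n' "$only_a"
printf 'in exactly one:\n%s\n' "$either"

rm -rf "$tmp"
```

In the last result, the line unique to the second file is preceded by a tab, because comm still lays its output out in columns even when one of them is suppressed.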
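The crucial difference from diff is the model. diff cares about ordered
differences: it tells you "between line 17 and line 18 of file A, insert these
lines from file B", and it can show context. comm treats each file as a
sorted collection of lines, so the original order carries no meaning, but
both inputs must already be sorted in the same collation order (run them
through sort first if they are not). On inputs that are already sorted it is
typically faster than diff, because it makes a single merge pass over the two
files instead of searching for the best alignment between them. Use diff to
track changes to ordered text (source code, prose); use comm to ask
set-theoretic questions about unordered lists (hostnames, usernames, IDs).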
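For lists that do not arrive sorted, a common pattern is to sort them first and then ask the set question. The sketch below assumes two hypothetical hostname lists, "expected" and "actual"; the names and contents are invented for illustration. sort -u is used on the assumption that duplicates should be collapsed so each list behaves like a set.

```shell
# Hypothetical inventories, deliberately in different orders.
tmp=$(mktemp -d)
export LC_ALL=C   # assumption: sort and comm should both use byte order

printf '%s\n' web2 db1 web1 > "$tmp/expected"
printf '%s\n' web1 web2 cache1 > "$tmp/actual"

# comm needs sorted input; -u also drops duplicate lines.
sort -u "$tmp/expected" > "$tmp/expected.sorted"
sort -u "$tmp/actual" > "$tmp/actual.sorted"

# Hosts we expected but did not find, and hosts we found but did not expect.
missing=$(comm -23 "$tmp/expected.sorted" "$tmp/actual.sorted")
extra=$(comm -13 "$tmp/expected.sorted" "$tmp/actual.sorted")

printf 'missing: %s\n' "$missing"
printf 'extra: %s\n' "$extra"

rm -rf "$tmp"
```

Running diff on the two unsorted files would also report web1 and web2 as moved, which is noise when only membership matters.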