Frequently Asked Question

What is comm, and how is it different from diff?

comm compares two sorted files line by line and prints three tab-indented columns: lines only in the first file, lines only in the second, and lines in both. The options -1, -2, and -3 suppress the corresponding column, which lets you ask narrower questions. comm -12 a b prints only the lines that appear in both files (set intersection). comm -23 a b prints the lines that are in a but not in b (set difference). comm -3 a b prints the lines that appear in exactly one of the files (symmetric difference); the output is still split into two columns, with the lines unique to b indented by a tab.
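As a rough illustration of the three invocations above, the following sketch builds two small sorted files and runs each one. The file names a.txt and b.txt and their contents are made up for the example.

```shell
# Work in a throwaway directory; a.txt and b.txt are hypothetical sample files.
tmp=$(mktemp -d)
printf '%s\n' alpha bravo charlie > "$tmp/a.txt"
printf '%s\n' bravo charlie delta > "$tmp/b.txt"

# Both inputs are already in sorted order, which comm requires.
both=$(comm -12 "$tmp/a.txt" "$tmp/b.txt")     # lines in both files
only_a=$(comm -23 "$tmp/a.txt" "$tmp/b.txt")   # lines in a.txt but not in b.txt
either=$(comm -3 "$tmp/a.txt" "$tmp/b.txt")    # lines in exactly one file

printf 'intersection:\n%s\n\n' "$both"
printf 'difference (a minus b):\n%s\n\n' "$only_a"
printf 'symmetric difference:\n%s\n' "$either"

rm -rf "$tmp"
```

In the last result, delta is indented by a tab because it belongs to the second column (lines only in b.txt).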

The crucial difference from diff is the model. diff cares about ordered differences: it tells you, in effect, "between line 17 and line 18 of file A, insert these lines from file B", and it can show surrounding context. comm performs set-style operations on lines and ignores the original order, since both inputs must already be sorted. Because it only needs a single merge-style pass over the two sorted inputs, it is typically cheaper than diff on large files. Use diff to track changes to ordered text (source code, prose); use comm to ask set-theoretic questions about lists where order does not matter (hostnames, usernames, IDs).
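One way to see the difference in models is to compare two files that hold the same lines in a different order. This is a minimal sketch; the file names and host names are invented for the example.

```shell
# Two hypothetical host lists with the same entries in a different order.
tmp=$(mktemp -d)
printf '%s\n' web1 web2 db1 > "$tmp/old.txt"
printf '%s\n' db1 web1 web2 > "$tmp/new.txt"

# diff compares ordered text, so it reports a change (non-zero exit status).
if diff "$tmp/old.txt" "$tmp/new.txt" > /dev/null; then
    diff_result=same
else
    diff_result=differ
fi

# comm needs sorted input; once sorted, the two lists are identical as sets,
# so comm -3 (lines in exactly one file) prints nothing.
sort "$tmp/old.txt" > "$tmp/old.sorted"
sort "$tmp/new.txt" > "$tmp/new.sorted"
set_diff=$(comm -3 "$tmp/old.sorted" "$tmp/new.sorted")

printf 'diff says the files %s\n' "$diff_result"
printf 'comm -3 output is %s\n' "${set_diff:-empty}"

rm -rf "$tmp"
```

Sorting into separate files keeps the example portable to plain sh; shells with process substitution can pass the sorted streams to comm directly.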

Further reading and video