Frequently Asked Question

What does wc count, and what counts as a word?

wc (word count) prints, by default, three numbers for each input: lines, words, and bytes. wc -l file prints only lines (the commonest use), -w only words, -c only bytes, and -m only characters (which differs from bytes once multi-byte UTF-8 is in play). Given multiple files, wc also prints a total line at the end.
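A minimal sketch of the flags above, assuming a POSIX shell and a scratch directory. The file names a.txt and b.txt are hypothetical, and the -m result depends on the active locale:

```shell
# Hypothetical sample files in a scratch directory (names are illustrative only).
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT                      # remove the scratch directory when the shell exits

printf 'one two\nthree\n' > "$dir/a.txt"       # 2 lines, 3 words, 14 bytes
printf 'caf\303\251\n'    > "$dir/b.txt"       # "café": the é is written as its 2 UTF-8 bytes

wc "$dir/a.txt"                   # lines, words, bytes, then the file name
wc -l "$dir/a.txt"                # lines only
wc -c "$dir/b.txt"                # bytes: 6, counting the newline
wc -m "$dir/b.txt"                # characters: 5 in a UTF-8 locale, typically 6 in the C locale
wc -l "$dir/a.txt" "$dir/b.txt"   # one row per file, then a "total" row

# Reading from standard input omits the file name, which is convenient in scripts.
lines=$(wc -l < "$dir/a.txt")
echo "lines: $lines"
```

Some implementations pad the numbers with leading spaces, so scripts that compare the result as a string may want to strip them, for example with tr -d ' '.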

A word in wc's sense is any maximal run of non-whitespace characters. So hello-world is one word, hello world is two, and one  two (with two spaces) is still two. That definition is simple and language-agnostic, which is why wc happily counts source code, English prose, and Chinese. For Chinese the word count will be wrong, because Chinese has no spaces between words, but the character count will be right. For the length of a document in characters, use wc -m; for source-code line counts, wc -l over the relevant files; for the size in bytes including newlines, wc -c.
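The word definition above can be checked directly from a shell. This is a small sketch: count_words is a hypothetical helper, not part of wc, and it strips the leading padding that some implementations print.

```shell
# count_words is a hypothetical helper: it feeds one string to wc -w
# and removes any space padding around the resulting number.
count_words() {
    printf '%s\n' "$1" | wc -w | tr -d ' '
}

count_words 'hello-world'    # 1: the hyphen is not whitespace
count_words 'hello world'    # 2
count_words 'one  two'       # 2: consecutive spaces act as a single separator
count_words '  padded  '     # 1: leading and trailing spaces add no words
```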
