Frequently Asked Question
What does wc count, and what counts as a word?
wc (word count) prints, by default, three numbers for each input: lines,
words, and bytes. wc -l file prints only lines (the most common use), -w only
words, -c only bytes, and -m only characters (which differs from bytes once
multi-byte UTF-8 is in play). Lines are counted as newline characters, so a
final line with no trailing newline is not counted. Given multiple files, wc
also prints a total line at the end.
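As a rough illustration of how the four counts relate, here is a minimal Python sketch. It is not how wc is implemented; the function name wc_counts is made up for this example, and it assumes the input is valid UTF-8 and that only ASCII whitespace separates words.

```python
def wc_counts(data: bytes) -> dict:
    """Approximate wc's counts for one input (assumes UTF-8 text)."""
    return {
        "lines": data.count(b"\n"),          # like wc -l: newline characters
        "words": len(data.split()),          # like wc -w: runs of non-whitespace
        "bytes": len(data),                  # like wc -c
        "chars": len(data.decode("utf-8")),  # like wc -m in a UTF-8 locale
    }

# "naive cafe" with two accented letters, each 2 bytes in UTF-8
sample = "na\u00efve caf\u00e9\n".encode("utf-8")
print(wc_counts(sample))
```

The sample has 11 characters but 13 bytes, which is the gap between -m and -c that appears once multi-byte characters are present.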
A word in wc's sense is any maximal run of non-whitespace characters. So
hello-world is one word, hello world is two, and one  two (with two
spaces) is still two. That definition is simple and language-agnostic, which is
why wc happily counts source code, English prose, and Chinese. For Chinese the
word count will be wrong, because Chinese has no spaces between words, but the
character count will be right. For the length of a document in characters, use
wc -m; for source-code line counts, use wc -l over the relevant files; for the
size in bytes, including newlines, use wc -c.
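The word definition can be tried out with a small Python sketch. The helper count_words is hypothetical and only approximates wc -w: Python's str.split() with no arguments splits on runs of Unicode whitespace, which may differ from wc's locale-dependent notion of whitespace in edge cases.

```python
def count_words(text: str) -> int:
    """Count maximal runs of non-whitespace characters."""
    return len(text.split())

examples = [
    "hello-world",            # hyphen is not whitespace
    "hello world",            # one space
    "one  two",               # two spaces still separate only two words
    "\u4f60\u597d\u4e16\u754c",  # four Chinese characters, no spaces
]

for s in examples:
    # words, characters, UTF-8 bytes
    print(repr(s), count_words(s), len(s), len(s.encode("utf-8")))
```

The last example comes out as one word, four characters, and twelve bytes, which is why the character count is the more useful figure for text written without spaces.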