Frequently Asked Question

Why is parsing the output of ls bad?

ls is designed to format directory entries for humans, not for programs. Two specific problems make parsing its output unreliable. First, Unix allows any byte in a filename except / and NUL, so names can contain spaces, tabs, newlines, backslashes, and even a literal backslash followed by n, which looks like an escaped newline but is not one. A loop like for f in $(ls); do rm "$f"; done breaks the moment you have a file called my report.txt (split into two arguments, my and report.txt) or one with a literal newline in it (split at that newline). Quoting "$f" inside the loop does not help, because the splitting has already happened in the unquoted $(ls), and each resulting word is also subject to glob expansion. Second, ls sometimes transforms names for display: depending on the implementation, the options, and whether the output is a terminal, it may replace non-printable bytes with question marks or add colour codes. The bytes you read can then differ from the real filename.
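The word-splitting problem can be reproduced safely in a scratch directory. The following is a minimal sketch, assuming bash and a standard mktemp; the file names and the variables demo_dir, ls_count, and glob_count are made up for illustration, and the loops only count words instead of deleting anything.

```bash
#!/usr/bin/env bash
# Demo only: everything happens inside a throwaway directory.
demo_dir=$(mktemp -d) || exit 1
cd "$demo_dir" || exit 1

# Three real files: a plain name, a name with a space, a name with a newline.
touch 'plain.txt' 'my report.txt' $'two\nlines.txt'

# Fragile: the unquoted $(ls) is split on spaces, tabs, and newlines.
ls_count=0
for f in $(ls); do
    ls_count=$((ls_count + 1))
done

# Robust: the glob expands to exactly one word per file.
glob_count=0
for f in *; do
    glob_count=$((glob_count + 1))
done

echo "ls loop saw $ls_count words; glob loop saw $glob_count files"

# Clean up the scratch directory.
cd / && rm -rf -- "$demo_dir"
```

The ls loop should report more words than there are files, because my report.txt and the newline name are each split in two, while the glob loop sees each of the three files exactly once.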

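The correct alternatives all bypass ls entirely. To loop over files matching a pattern, use a shell glob: for f in *.log; do ...; done. To recurse, use find, ideally with -print0 feeding a while IFS= read -r -d '' f loop, or with -exec. To get filenames into an array, bash 4.4+ can combine mapfile -d '' with find -print0 (mapfile itself arrived in bash 4.0, but its -d option needs 4.4). ShellCheck flags the common mistakes: SC2045 for iterating over ls output, SC2012 for using ls where find would be safer, and SC2207 for filling an array from an unquoted command substitution. Heed these warnings.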
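The alternatives can be compared side by side on a small test tree. This is a sketch, not a definitive recipe: it assumes bash 4.4 or newer (for mapfile -d '') and a find that supports -print0, and the directory layout and the names demo_dir, glob_count, read_count, log_files, and leftover are hypothetical.

```bash
#!/usr/bin/env bash
# Demo only: builds a throwaway tree with awkward filenames.
demo_dir=$(mktemp -d) || exit 1
mkdir "$demo_dir/sub"
touch "$demo_dir/a.log" "$demo_dir/my report.log" "$demo_dir/sub/"$'two\nlines.log'

# 1. Non-recursive: a glob yields one word per file; always quote "$f".
glob_count=0
for f in "$demo_dir"/*.log; do
    [ -e "$f" ] || continue   # the pattern stays literal if nothing matches
    glob_count=$((glob_count + 1))
done

# 2. Recursive: NUL-delimited names from find, read one at a time.
read_count=0
while IFS= read -r -d '' f; do
    read_count=$((read_count + 1))
done < <(find "$demo_dir" -type f -name '*.log' -print0)

# 3. Into an array: mapfile -d '' splits on NUL (needs bash 4.4+).
mapfile -d '' -t log_files < <(find "$demo_dir" -type f -name '*.log' -print0)

# 4. Let find run the command itself; the shell never parses the names.
find "$demo_dir" -type f -name '*.log' -exec rm -- {} +

# Check what is left, then remove the scratch tree.
mapfile -d '' -t leftover < <(find "$demo_dir" -type f -print0)
echo "glob: $glob_count, read loop: $read_count, array: ${#log_files[@]}, left: ${#leftover[@]}"
rm -rf -- "$demo_dir"
```

The glob only sees the two top-level .log files, while both find-based approaches also pick up the file in sub/ whose name contains a newline. The -exec form is usually the simplest choice when the goal is just to run one command on every match.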
