Also known as: regex, regexp
A regular expression (regex) is a formal pattern language for describing sets of strings. Regexes originated in Stephen Kleene's work on regular languages in the 1950s and found their first programming home in Ken Thompson's 1968 implementation for the QED editor. Today they are a universal tool for search, substitution, and validation of text across almost every programming language.
Basic syntax (POSIX extended, roughly):
. any character
* zero or more of the previous
+ one or more of the previous
? zero or one of the previous
^ start of line
$ end of line
[abc] any of a, b, c
[^abc] none of a, b, c
(foo|bar) grouping with alternation: foo or bar
\b word boundary (a GNU and Perl-style extension, not part of POSIX ERE)
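The operators listed above can be tried out in Python, whose re module uses Perl-style syntax but agrees with POSIX ERE on these basics. One difference to keep in mind: Python's ^ and $ match at the start and end of the whole string unless re.MULTILINE is passed, whereas grep applies the pattern one line at a time. The sample text and the pattern list are illustrative choices made for this sketch.

```python
import re

# Illustrative sample text (an assumption, chosen to exercise each operator).
TEXT = "cat cot cut ct caat"

# One pattern per operator in the syntax list.
PATTERNS = [
    r"c.t",         # .      any character
    r"ca*t",        # *      zero or more of the previous ('a')
    r"ca+t",        # +      one or more of the previous
    r"ca?t",        # ?      zero or one of the previous
    r"^c[a-z]+",    # ^      start of string (start of line with re.MULTILINE)
    r"[a-z]+t$",    # $      end of string (end of line with re.MULTILINE)
    r"c[ao]t",      # [ao]   any of a, o
    r"c[^ao]t",     # [^ao]  none of a, o
    r"(cat|cut)",   # (|)    alternation; findall returns the group's text
    r"\bct\b",      # \b     word boundary
]

# re.findall returns every non-overlapping match, scanning left to right.
results = {p: re.findall(p, TEXT) for p in PATTERNS}

for pattern, found in results.items():
    print(f"{pattern:12} {found}")
```

For example, ca*t finds cat, ct and caat, while ca?t finds only cat and ct, because ? allows at most one a.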
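Linux tools use several regex dialects that, frustratingly, are not all equivalent: Basic Regular Expressions (BRE), used by plain grep and sed; Extended Regular Expressions (ERE), used by grep -E, egrep and sed -E; and Perl-style regexes, used by Perl itself and by grep -P (through the PCRE library). Python's re module and most other modern languages follow the Perl style closely, although their engines are separate implementations rather than PCRE. The everyday difference between BRE and ERE is mostly escaping: in ERE, + ? | ( ) { } are operators, while in BRE grouping and intervals must be written \( \) \{ \}, and GNU tools additionally accept \+ \? \| as extensions. The Perl-style dialect is the most powerful but not always available (grep -P depends on how grep was built); BRE and ERE are standardised by POSIX.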
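Because the BRE/ERE difference is largely a matter of which characters carry a backslash, rewriting a simple ERE as GNU BRE is mostly mechanical. The function ere_to_gnu_bre below is a hypothetical helper written only for illustration: it toggles the escaping of + ? | ( ) { } outside bracket expressions and copies bracket expressions unchanged. It assumes GNU BRE (which understands \+, \? and \|) and deliberately ignores corner cases such as the context-dependent meaning of ^, $ and * in BRE.

```python
# Characters that are operators in ERE but literals in BRE unless escaped.
ERE_OPERATORS = set("+?|(){}")


def ere_to_gnu_bre(pattern: str) -> str:
    """Hypothetical sketch: rewrite a simple ERE as a GNU BRE."""
    out = []
    i, n = 0, len(pattern)
    while i < n:
        c = pattern[i]
        if c == "\\" and i + 1 < n:
            nxt = pattern[i + 1]
            # An escaped operator is a literal in ERE; in BRE the bare
            # character is already literal, so drop the backslash.
            # Any other escape (\\, \., \1, ...) is copied unchanged.
            out.append(nxt if nxt in ERE_OPERATORS else c + nxt)
            i += 2
        elif c == "[":
            # Bracket expressions mean the same in both dialects: copy
            # them verbatim, handling a leading ^, a leading ], and
            # classes such as [:digit:] that contain their own ].
            j = i + 1
            if j < n and pattern[j] == "^":
                j += 1
            if j < n and pattern[j] == "]":
                j += 1
            while j < n and pattern[j] != "]":
                if pattern[j] == "[" and j + 1 < n and pattern[j + 1] in ":.=":
                    close = pattern.find(pattern[j + 1] + "]", j + 2)
                    j = close + 2 if close != -1 else n
                else:
                    j += 1
            out.append(pattern[i:j + 1])
            i = j + 1
        elif c in ERE_OPERATORS:
            # A bare operator in ERE needs a backslash in GNU BRE.
            out.append("\\" + c)
            i += 1
        else:
            out.append(c)
            i += 1
    return "".join(out)


print(ere_to_gnu_bre("(foo|bar)+"))          # \(foo\|bar\)\+
print(ere_to_gnu_bre("[[:digit:]]{2}"))      # [[:digit:]]\{2\}
```

The result of the first call could then be used as grep '\(foo\|bar\)\+', which GNU grep should treat the same way as grep -E '(foo|bar)+'.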
Tools for learning and debugging regexes include regex101.com, pcregrep, and the regex(7) manual page (man 7 regex). A healthy respect for regex complexity is wise: the difference between greedy and lazy quantifiers, lookaheads (both Perl-style features absent from POSIX BRE and ERE), and catastrophic backtracking in backtracking engines can turn a one-liner into a subtle bug or a dramatic slowdown.
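These pitfalls can be observed directly in Python's re module, a backtracking engine with Perl-style greedy and lazy quantifiers and lookaheads. The sample strings and the nested (a+)+ pattern below are illustrative assumptions, not taken from any particular program.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: '.+' consumes as much as it can, so the match runs to the LAST '>'.
greedy = re.findall(r"<.+>", html)

# Lazy: '.+?' consumes as little as it can, stopping at the FIRST '>'.
lazy = re.findall(r"<.+?>", html)

# Lookahead: match digits only when followed by 'px', without consuming 'px'.
sizes = re.findall(r"\d+(?=px)", "width: 120px; height: 40em; margin: 8px")

# Catastrophic backtracking: both patterns describe runs of 'a', and both
# fail on this subject because of the trailing '!'. The nested (a+)+ makes
# the engine try many ways of splitting the run among the inner
# repetitions before giving up, so its running time grows roughly
# exponentially with the number of a's; plain a+ fails quickly.
# The run is kept short here so the example finishes instantly.
subject = "a" * 12 + "!"
risky = re.fullmatch(r"(a+)+", subject)
safe = re.fullmatch(r"a+", subject)

print(greedy)          # ['<b>bold</b> and <i>italic</i>']
print(lazy)            # ['<b>', '</b>', '<i>', '</i>']
print(sizes)           # ['120', '8']
print(risky, safe)     # None None
```

Lengthening the run of a's in subject is an easy way to watch the nested pattern slow down while a+ stays fast.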
Discussed in:
- Chapter 8: Text Processing — grep: Searching for Patterns
Also defined in: Textbook of Linux