Regular expressions

In computer science, a regular expression is a string that describes a set of character strings according to certain syntactic rules. Regular expressions act as a filter criterion for text: a given text either matches the expression or it does not. For example, it is possible to match all words beginning with S and ending with d without explicitly specifying the intervening letters. The regular expression for that would be: S.*d

  • Characters that are to be matched literally are written as they are.

  • Any single character is represented by a period (.)

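  • A selection of characters is written in square brackets: [abc] matches exactly one of the characters a, b or c. A range can also be specified, e.g. [a-zA-Z] matches one upper- or lowercase letter from a to z.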
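
The basic building blocks above can be tried out with a short sketch. Python's re module is used here purely as an illustration; the sample words and strings are made up, and other regular expression engines may differ in details.

```python
import re

# Hypothetical sample data, chosen only for this illustration.
words = ["Sand", "Sound", "Sun", "sand", "Sd"]

# S.*d: a literal S, any characters, then a literal d.
# fullmatch requires the whole word to match the expression.
s_to_d = [w for w in words if re.fullmatch(r"S.*d", w)]
print(s_to_d)  # ['Sand', 'Sound', 'Sd']

# The period stands for exactly one arbitrary character.
dot_hits = re.findall(r"b.t", "bat bit bt boot")
print(dot_hits)  # ['bat', 'bit']

# [abc] matches one character out of the listed set.
class_hits = re.findall(r"[abc]", "a1b2c3d4")
print(class_hits)  # ['a', 'b', 'c']

# [a-zA-Z] matches one letter from either range.
range_hits = re.findall(r"[a-zA-Z]", "R2-D2 x9")
print(range_hits)  # ['R', 'D', 'x']
```

Note that matching is case-sensitive in this sketch, which is why "sand" is not accepted by S.*d.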

Quantifiers specify how many times the preceding expression may occur in the string.


?

The preceding expression is optional, i.e. it occurs either not at all or exactly once.

+

The preceding expression must occur at least once, but it may also occur several times.

*

The preceding expression may occur any number of times (including zero times).

{n}

The preceding expression must occur exactly n times.

{min,}

The preceding expression must occur at least min times.

{,max}

The preceding expression may occur at most max times (including zero times).

{min,max}

The preceding expression must occur at least min times and can occur a maximum of max times.
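
As a minimal sketch of the quantifiers listed above, the following Python snippet checks a few made-up strings against each one. The helper name matches is hypothetical and exists only for this example; Python's re module is an assumption, not necessarily the engine used elsewhere.

```python
import re

def matches(pattern, text):
    """Return True if the whole text matches the pattern (illustrative helper)."""
    return re.fullmatch(pattern, text) is not None

# ? : the "u" may be absent or occur once
optional = [matches(r"colou?r", s) for s in ("color", "colour", "colouur")]

# + : "b" must occur at least once
one_or_more = [matches(r"ab+", s) for s in ("a", "ab", "abbb")]

# * : "b" may occur any number of times, including zero
zero_or_more = [matches(r"ab*", s) for s in ("a", "ab", "abbb")]

# {n} : exactly three digits
exactly_3 = [matches(r"\d{3}", s) for s in ("12", "123", "1234")]

# {min,} : at least two digits
at_least_2 = [matches(r"\d{2,}", s) for s in ("1", "12", "12345")]

# {,max} : at most two digits (zero digits also match)
at_most_2 = [matches(r"\d{,2}", s) for s in ("", "12", "123")]

# {min,max} : between two and four digits
two_to_four = [matches(r"\d{2,4}", s) for s in ("1", "12", "1234", "12345")]

print(optional)     # [True, True, False]
print(one_or_more)  # [False, True, True]
print(zero_or_more) # [True, True, True]
print(exactly_3)    # [False, True, False]
print(at_least_2)   # [False, True, True]
print(at_most_2)    # [True, True, False]
print(two_to_four)  # [False, True, True, False]
```

Some regular expression engines do not accept the {,max} form. Writing {0,max} expresses the same thing and is the more portable spelling.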


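By default, regular expressions work line by line, because the . does not match a line break. If a regular expression is to span several lines of text, it must be prefixed with the modifier (?s), which allows the . to match line breaks as well.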
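
The effect of (?s) can be illustrated with a small sketch, again assuming Python's re module and a made-up multi-line text:

```python
import re

# Hypothetical text spanning four lines.
text = "BEGIN\nline one\nline two\nEND"

# Without (?s) the period stops at the first line break, so nothing is found.
without_flag = re.search(r"BEGIN.*END", text)

# With (?s) prefixed, the period also matches line breaks.
with_flag = re.search(r"(?s)BEGIN.*END", text)

print(without_flag)                # None
print(with_flag.group(0) == text)  # True
```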

Tools


There is a plugin to support the development of regular expressions. On the Internet, you will find numerous tutorials and tips to learn more about regular expressions.