Patterns

When we consider a character in a string we have two pieces of information: what is the character's ASCII code, and what is the index of its position. In a word what and where. In our case that is 256 times #text-many bits and so we are dealing with two the power of that for the number of statements that we can say about the string. This can be a large number. However we can look at some useful families of statements.

Properties of characters

An array of 256 Boolean values determines a character-class. For example, the upper-case characters, the control-characters, and so on. Given a character-class and an index i we may ask whether the i-th character belongs to that class. So each class determines a pattern which succeeds on the longest prefix of the text whose characters belong to that class.

The largest character-class, that contains any character, is often denoted by a dot.

The notation

[ABC... ]Is often used for the union of the character classes A,B,C,... and

[^ABC... ]For the negation of that union.

String Patterns

If s is a string, we have the pattern that matches only those texts that have s as a prefix.

match (P(s), text) --> true, text + #s if text starts with s --> false otherwiseA frequent abuse of notation is to call P(s) simply s. More important than individual patterns is to consider how patterns can be combined.

Sequence

We can combine patterns in a sequence, each one starting where the previous one left off. This is often thought of as multiplication so notations like A*B or A B are often used. It is associative (but not commutative). If A fails then A*B fails. If A succeeds and

match (A, text) = true, text'then

match (A*B, text) = match (B, text')So if B fails A*B will fail but will still update the text pointer to the first character after the prefix on which A succeeds.

Repetition

If n is a non-negative integer then A^n matches when a sequence of at least A's match, and A^-n matches when a sequence of at most A's match. The standard notations are

A* <--> A^0 A+ <--> A^1 A? <--> A^-1

Negation

!A succeeds only when A fails. The pointer is not updated. Note that !!! has the same effect as !.