When we consider a character in a string we have two pieces of information: what is the character's ASCII code, and what is the index of its position. In a word what and where. In our case that is 256 times #text-many bits and so we are dealing with two the power of that for the number of statements that we can say about the string. This can be a large number. However we can look at some useful families of statements.
Properties of characters
An array of 256 Boolean values determines a character-class. For example, the upper-case characters, the control-characters, and so on. Given a character-class and an index i we may ask whether the i-th character belongs to that class. So each class determines a pattern which succeeds on the longest prefix of the text whose characters belong to that class.
The largest character-class, that contains any character, is often denoted by a dot.
[ABC... ]Is often used for the union of the character classes A,B,C,... and
[^ABC... ]For the negation of that union.
If s is a string, we have the pattern that matches only those texts that have s as a prefix.
match (P(s), text) --> true, text + #s if text starts with s --> false otherwiseA frequent abuse of notation is to call P(s) simply s. More important than individual patterns is to consider how patterns can be combined.
We can combine patterns in a sequence, each one starting where the previous one left off. This is often thought of as multiplication so notations like A*B or A B are often used. It is associative (but not commutative). If A fails then A*B fails. If A succeeds and
match (A, text) = true, text'then
match (A*B, text) = match (B, text')So if B fails A*B will fail but will still update the text pointer to the first character after the prefix on which A succeeds.
If n is a non-negative integer then A^n matches when a sequence of at least A's match, and A^-n matches when a sequence of at most A's match. The standard notations are
A* <--> A^0 A+ <--> A^1 A? <--> A^-1
!A succeeds only when A fails. The pointer is not updated. Note that !!! has the same effect as !.