When we consider a character in a string we have two pieces of information: what is the character's ASCII code, and what is the index of its position. In a word what and where. In our case that is 256 times #text-many bits and so we are dealing with two the power of that for the number of statements that we can say about the string. This can be a large number. However we can look at some useful families of statements.
Properties of characters
An array of 256 Boolean values determines a character-class. For example, the upper-case characters, the control-characters, and so on. Given a character-class and an index i we may ask whether the i-th character belongs to that class. So each class determines a pattern which succeeds on the longest prefix of the text whose characters belong to that class.
The largest character-class, that contains any character, is often denoted by a dot.
The notation
[ABC... ]
Is often used for the union of the character classes A,B,C,...
and
[^ABC... ]
For the negation of that union.
String Patterns
If s is a string, we have the pattern that matches only those texts that have s as a prefix.
match (P(s), text) --> true, text + #s if text starts with s
--> false otherwise
A frequent abuse of notation is to call P(s) simply s. More important
than individual patterns is to consider how patterns can be combined.
Sequence
We can combine patterns in a sequence, each one starting where the previous one left off. This is often thought of as multiplication so notations like A*B or A B are often used. It is associative (but not commutative). If A fails then A*B fails. If A succeeds and
match (A, text) = true, text'
then
match (A*B, text) = match (B, text')
So if B fails A*B will fail but will still update the text pointer to the first character
after the prefix on which A succeeds.
Repetition
If n is a non-negative integer then A^n matches when a sequence of at least A's match, and A^-n matches when a sequence of at most A's match. The standard notations are
A* <--> A^0
A+ <--> A^1
A? <--> A^-1
Negation
!A succeeds only when A fails. The pointer is not updated. Note that !!! has the same effect as !.