Grammars

It is worth giving a special name for patterns that do not match empty strings. I will call them lerts. When these are matched the text pointer is always updated to a further position.

Patterns can be combined and repeated, so that they form an algebra. With all algebra it pays to name things and use variables, to avoid complicated expressions and to aid comprehension. So we want to make definitions for patterns

      X = A B C ...
Such names are also called non-terminals. For example

     string_literal = dquote [^dquote]* dquote
If we consider assembly language code to implement patterns we can see that, with due care, combining patterns in a sequence can be made to correspond to concatenating their code. Repetition can be implemented with a loop whose body calls the code to be repeated as a subroutine. Non-terminals will be represented by labels. We can also have guarded recursion, meaning that the pattern expression to the left of the recursed non-terminal is a lert. This means that each time that its subroutine is entered the text pointer has moved further along, so infinite loops are avoided.

An array of 256 bytes will provide 8 character classes, one for each bit. With Boolean operations on these, represented by using bitmasks, we get more; perhaps a sufficiency for most purposes.

To proceed further we will need to introduce backtracking.