
In Chapter 9 we met the facilities in the `string` library for matching and capturing patterns. These, while
often convenient, are rather limited when it comes
to searching for alternatives. Furthermore, patterns are
treated there simply as strings, an approach that necessitates
the use of an escape character, `%`, and leads
to expressions that are often hard to read. The `lpeg` library, by contrast, provides a more powerful
and expressive notion of pattern. It treats patterns as
objects in their own right, which can be combined and manipulated
with an algebra that uses
the standard symbols for arithmetic.

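The whole purpose of a *pattern* is that it should
be matched against a string. An assignment

    x, y, ... = p:match (s, i)

where `p` is a pattern, `s` is a string and `i` is an optional starting
position (1 by default), attempts to match `p` against `s` starting at
position `i`. If the match fails, `x` is `nil`. If it succeeds, `x, y, ...`
receive the captures of the match or, when the pattern makes no captures,
`x` receives the position of the first character after the matched text.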

From here on we suppose that we are in the lexical environment
of `lpeg`. Here is a simple example.

    p = P "beau"

defines a pattern `p` that matches the string "beau", so that

    p:match "beauty" --> 5

The letter in the 5th position follows the matched prefix.
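The behaviour of `match` can be sketched a little further. The following uses standard Lua syntax rather than the shorthand of this chapter, and assumes that the library can be loaded with `require "lpeg"`. It illustrates that a match is anchored at its starting position, that a failed match gives `nil`, and that the optional second argument moves the starting position.

```lua
local lpeg = require "lpeg"   -- assumption: LPeg is installed as a module
local P = lpeg.P

local p = P "beau"

-- With no captures, a successful match returns the position
-- just after the matched text.
local after = p:match "beauty"           -- 5

-- Matching is anchored: "beau" does not start at position 1 here.
local missed = p:match "a beauty"        -- nil

-- The optional second argument says where matching starts.
local shifted = p:match ("a beauty", 3)  -- 7

print (after, missed, shifted)
```

Because the match is anchored, finding a pattern anywhere in a string needs a pattern that explicitly skips characters.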

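In the following, `A` and `B` denote
lpeg patterns.

    A + B

When this pattern is matched, `A` is tried first; if it fails, `B` is tried
at the same position. The choice is ordered.

    A - B

When this pattern is matched, it succeeds only if `B` fails at the current
position and `A` then matches there.

    -A

This pattern succeeds when `A` fails and fails when `A` succeeds. In either
case it consumes no input.

    A * B

This pattern is matched by first matching `A` and then matching `B`, starting
from the position where `A` finished.

    A^n

If `n` is non-negative, this pattern matches at least `n` repetitions of `A`.
If `n` is negative, it matches at most `-n` repetitions.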
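As a rough illustration of this algebra, here is a sketch in standard Lua syntax, assuming the library is loaded with `require "lpeg"`. The names `digit`, `lower`, `pet` and `nonzero` are invented for the example. It uses the range function `R`, which is described later in this chapter.

```lua
local lpeg = require "lpeg"   -- assumption: LPeg is installed as a module
local P, R = lpeg.P, lpeg.R

local digit = R "09"
local lower = R "az"

-- A + B: ordered choice.
local pet = P "cat" + P "dog"
local r_choice = pet:match "dog"              -- 4

-- A - B: match A only where B does not match.
local nonzero = digit - P "0"
local r_diff_ok   = nonzero:match "7"         -- 2
local r_diff_fail = nonzero:match "0"         -- nil

-- -A: succeeds, consuming nothing, only where A fails.
local r_not = (-digit):match "x"              -- 1

-- A * B: A followed by B.
local r_seq = (lower * digit):match "a1"      -- 3

-- A^n: at least n repetitions if n >= 0, at most -n if n < 0.
local r_atleast = (digit^2):match "12345"     -- 6, repetition is greedy
local r_toofew  = (digit^2):match "1"         -- nil
local r_atmost  = (digit^-2):match "12345"    -- 3

print (r_choice, r_diff_ok, r_diff_fail, r_not, r_seq)
print (r_atleast, r_toofew, r_atmost)
```

Each result is the position just after the matched text, so a pattern that consumes nothing returns the starting position itself.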

We have seen above how the function `P` converts
a string to the pattern that matches that string. In fact it can take
many other kinds of argument:

    P (true)   -- always matches
    P (false)  -- always fails
    P (n)      --[[ if n is a non-negative integer, matches any n
                    characters, so at least n characters must be left.
                    If n is negative, succeeds only if fewer than -n
                    characters are left. ]]
    P (s)      -- matches string s
    P (p)      -- is pattern p
    P (f)      --[[ f is a function of two variables. When this pattern
                    is matched against a string the function f is
                    called with first argument the string and second
                    the pointer position. The match succeeds if f
                    returns a number that is a further position in the
                    string, and the pointer is updated to it;
                    otherwise the match fails. ]]
    P (t)      --[[ when t is a table it is interpreted as a grammar.
                    The keys of the table denote nonterminal symbols.
                    The leading rule is given by t[1] if that is a
                    pattern and by t[ t[1] ] otherwise. ]]

The pattern

Here are some other pattern-creating functions.

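    R (r1, ... ) --[[ the arguments are 2-character strings
                      interpreted as ranges of characters. ]]

So, for example, `R "ad"` is equivalent to

    (P "a") + (P "b") + (P "c") + (P "d")

The function `S` takes a string and gives the pattern that matches any single character occurring in it:

    vowel = S "aeiouy"

The notation can be greatly simplified by using the fact that the arithmetic operations will coerce all their other arguments to patterns, using `P`. Consider, for a pattern `p`,

    (p + P (1))^1

We can read this as one or more repetitions of "`p` or, failing that, any single character". Thanks to the coercion it can be written `(p + 1)^1`, and wrapped in a function:

    somewhere = \ (p) => (p + 1)^1 end -- function

so that `somewhere (p)` is the pattern `(p + 1)^1` for any pattern `p`.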
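Ranges, sets and coercion can be tried out with a sketch like the following. It is written in standard Lua syntax and assumes the library is loaded with `require "lpeg"`; the names `range` and `spelled` are invented for the example.

```lua
local lpeg = require "lpeg"   -- assumption: LPeg is installed as a module
local P, R, S = lpeg.P, lpeg.R, lpeg.S

-- A range and the same thing spelled out as a choice.
local range   = R "ad"
local spelled = P "a" + P "b" + P "c" + P "d"

-- A set of characters.
local vowel = S "aeiouy"

-- The operators coerce plain strings and numbers with P,
-- so (p + P (1))^1 may be written (p + 1)^1.
local function somewhere (p)
  return (p + 1)^1
end

local r_range   = range:match "cab"     -- 2
local r_spelled = spelled:match "cab"   -- 2
local r_outside = range:match "e"       -- nil
local r_vowel   = vowel:match "yes"     -- 2

-- Both spellings consume the whole six-letter string here.
local r_long  = ((vowel + P (1))^1):match "rhythm"   -- 7
local r_short = somewhere (vowel):match "rhythm"     -- 7

print (r_range, r_spelled, r_outside, r_vowel, r_long, r_short)
```

Coercion only applies when at least one operand is already a pattern; an expression such as `"a" + "b"` involves no pattern at all and is ordinary Lua arithmetic on strings.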

Inside a table defining a grammar, the expression `V (k)` denotes the pattern defining the nonterminal
symbol `k`. This is not evaluated until the pattern
given by the grammar is matched. Grammars provide a way
of defining patterns recursively.
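A small recursive grammar may make this concrete. The sketch below, in standard Lua syntax and assuming the library is loaded with `require "lpeg"`, matches a balanced pair of parentheses with arbitrary nesting. The rule names `group` and `item` are invented for the example.

```lua
local lpeg = require "lpeg"   -- assumption: LPeg is installed as a module
local P, S, V = lpeg.P, lpeg.S, lpeg.V

-- t[1] names the leading rule; "group" and "item" are nonterminals.
-- V "group" inside item makes the definition recursive.
local balanced = P {
  "group",
  group = "(" * V "item"^0 * ")",
  item  = (1 - S "()") + V "group",
}

local r_nested = balanced:match "(a(b)c)"     -- 8
local r_prefix = balanced:match "((x))tail"   -- 6
local r_open   = balanced:match "(a(b"        -- nil

print (r_nested, r_prefix, r_open)
```

The second result shows that only a prefix of the string needs to match: the pattern stops after the closing parenthesis of the outermost group.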

When a pattern is matched against a string, the pointer
sweeps through the string, and one can demand of the pattern that
various pieces of information be picked up, processed and returned
as a *capture*. The two basic sorts of information
one might want correspond to the questions *where?* and *what?*. For the former the pattern

    Cp ( )

matches an empty string and returns the pointer value as a capture. For example

    do
      local P, Cp in lpeg
      s = "To lave the feet of the Zatcoon"
      somewhere = \ (x) => (x+1)^1 end
      p = somewhere (Cp ( ) * "Zatcoon")
      print (p:match (s)) --> 25
    end

For the latter the function

    C

decorates a pattern by making it capture the matching substring.

    p = 16 * C (7) * "?"
    print (p:match "The feet of the Zatcoon?") --> Zatcoon

There is actually a third basic kind of datum: something ad hoc that does not necessarily have any connection with the string. The pattern

    Cc (x)

matches an empty string and returns the value `x` as a capture.
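Constant captures are handy for attaching a value to whichever alternative matched. The following sketch uses standard Lua syntax and assumes the library is loaded with `require "lpeg"`; the names `answer` and `tagged` are invented for the example.

```lua
local lpeg = require "lpeg"   -- assumption: LPeg is installed as a module
local P, C, Cc, Cp = lpeg.P, lpeg.C, lpeg.Cc, lpeg.Cp

-- Each alternative consumes a word and attaches a constant to it.
local answer = P "yes" * Cc (true) + P "no" * Cc (false)

local r_yes = answer:match "yes"   -- true
local r_no  = answer:match "no"    -- false

-- The three basic captures side by side: where, what, and ad hoc.
-- P "Zat" * 4 is "Zat" followed by any four characters.
local tagged = Cp ( ) * C (P "Zat" * 4) * Cc "creature"
local pos, text, tag = tagged:match "Zatcoon"
-- pos is 1, text is "Zatcoon", tag is "creature"

print (r_yes, r_no, pos, text, tag)
```

Since a failed match gives `nil`, a capture of `false` can still be told apart from failure by comparing the result with `nil`.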

Then we have functions which simply process the captures of a pattern.
If `p` is a pattern and `f` is a function,
the expression `p / f` is the pattern modified so that its
captures are obtained as the values returned when the function is
applied to the original captures.

    p = (16 * C (7) * "?") / string.upper
    print (p:match "The feet of the Zatcoon?") --> ZATCOON

Alternatively

The function `Ct` replaces all the captures of a
pattern by a table containing those captures.

The function `Cs` replaces a pattern by substituting its nested
captures into the matching substrings of the string it is applied to.
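Both functions are easier to appreciate in use. The sketch below is in standard Lua syntax and assumes the library is loaded with `require "lpeg"`; the names `word`, `words` and `repaw` are invented for the example.

```lua
local lpeg = require "lpeg"   -- assumption: LPeg is installed as a module
local P, R, C, Ct, Cs = lpeg.P, lpeg.R, lpeg.C, lpeg.Ct, lpeg.Cs

-- Ct: collect every word of a space-separated line into one table.
local word  = C (R ("az", "AZ")^1)
local words = Ct (word * (" " * word)^0)
local t = words:match "the feet of the Zatcoon"
-- t is { "the", "feet", "of", "the", "Zatcoon" }

-- Cs: copy the matched text, replacing each "feet" by its capture value.
-- P "feet" / "paws" gives the string "paws" as the capture.
local repaw = Cs ((P "feet" / "paws" + 1)^0)
local s = repaw:match "The feet of the Zatcoon"
-- s is "The paws of the Zatcoon"

print (#t, t[5], s)
```

Characters matched by the plain `1` make no capture, so `Cs` copies them through unchanged.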

The function `Ca` creates an *accumulator
capture*. Its argument should produce at least one capture,
which becomes the initial value of the accumulator. Any subsequent function
captures are called with the accumulator as first argument, followed
by any other arguments provided, and the returned value becomes the
new value of the accumulator. The final value of the accumulator is the
single value returned as the capture when the resulting pattern matches.

    local R, Ca in lpeg
    local number = R "09"^1 / tonumber
    local add = \ (acc, x) => acc + x end
    local sum = Ca (number * ("," * number / add)^0)
    print (sum:match "1,4,9,16,25") --> 55
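Readers whose copy of LPeg has no `Ca` may be able to get the same sum from a fold capture. The sketch below assumes a release that provides `lpeg.Cf` and that the library is loaded with `require "lpeg"`; it is written in standard Lua syntax and is an alternative formulation, not the method used above.

```lua
local lpeg = require "lpeg"   -- assumption: LPeg is installed as a module
local R, Cf = lpeg.R, lpeg.Cf -- assumption: this release provides Cf

local number = R "09"^1 / tonumber
local function add (acc, x) return acc + x end

-- The first capture seeds the accumulator; add folds in each later one.
local sum = Cf (number * ("," * number)^0, add)
local total = sum:match "1,4,9,16,25"   -- 55

print (total)
```

Here the folding function is given once, as the second argument of `Cf`, instead of being attached to each item with `/`.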

For the full details see the Lpeg manual.
