Chapter 2
You sometimes read that programs are like cooking
recipes - that they consist of a list of instructions
which the computer carries out one after the other.
Unfortunately this is not really correct in general.
However, in Lua's case this is a useful analogy.
For a notational system to qualify as a programming language it
must provide a way of naming expressions. We call the names
identifiers or variables. The word variable is a bit unfortunate,
because the word may refer to something that does not vary at all.
The nomenclature comes from mathematics, which has a long history,
and whose notation and terminology have evolved to encompass all
sorts of quirks and oddities. To describe a programming language we
have to say how expressions are built up (syntax) out of names, what
they mean (denotational semantics), and what happens when the
computer runs a program (operational semantics). To do all this
properly one needs a manual. To learn the language, however, one
needs a piecemeal rather than an exhaustive approach, so that one can
practice until one is ready to absorb something more.
An identifier in Lua consists of letters of the alphabet
(lower case and upper case are considered different), underscores (_)
and digits (0-9), but may not start with a digit. Furthermore
it must be distinct from a reserved word. The reserved words are:
and break do else elseif
end false for function if
in local nil not or
repeat return then true until
while
RiscLua lets you use the abbreviation \ for function (think of a
lambda) and => for return. Note that in Basic your identifiers may
not contain a reserved word. Lua is more relaxed on this point. Your
identifiers must simply be different from reserved words.
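As a small illustration of these rules, here is a sketch that declares a few legal identifiers. All the names are invented for the example; none of them comes from Lua or RiscLua.

```lua
-- All of these names are hypothetical, chosen only to illustrate the rules.
local ending = "fin"     -- contains the reserved word "end", which is allowed
local format = "plain"   -- contains "for" and "or"
local _if    = true      -- an underscore followed by a reserved word
local total2 = 2         -- digits are fine after the first character
local Total2 = 3         -- a different identifier: case matters

-- These would be rejected by the interpreter:
--   local end = 1       -- "end" is itself a reserved word
--   local 2total = 1    -- an identifier may not start with a digit

print(total2 + Total2)   --> 5
```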
As well as reserved words there are identifiers that have
been assigned in Lua's base library, such as print. In theory you
can use these identifiers for your own variables, but then you will
overwrite the values in the base library that they denote, which is
probably not a good idea. They are:
assert collectgarbage dofile
error gcinfo getfenv
getmetatable ipairs loadfile
load loadstring next
pairs pcall print
rawequal rawget rawset
select setfenv setmetatable
tonumber tostring type
unpack xpcall _G
_PROMPT _PROMPT2 _VERSION
RiscLua adds to these:
newtry protect rnd
You should also not use the names of the standard libraries:
coroutine debug io
os package string
table
Standard Lua also has math. RiscLua adds to these:
block dbl swi
whose purpose is to accommodate Lua to the needs of RISC OS
and the ARM architecture, and also the library
lpeg
the parsing expression grammar library written by Roberto Ierusalimschy.
You might like to try:
x = print
print = 42
x ("Deepthought says ",print)
The word print is that of a built-in function, as we have seen, but
that does not stop us from redefining it in the second line as the
number 42. In the first line we have used the word x to save the
value of the print function, effectively renaming it. The point of
this rather absurd looking exercise is to ram home the distinction
between names and values. A value may have many names, arbitrarily
chosen, so the tradition of referring to a value by a name can be
confusing. Do we refer to the print function or to the x function?
Of course, if we use the former more people will understand which
function we are referring to. But it is all too easy to fall into
the trap of thinking that some values come with god-given, or rather
tradition-given, names.
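The same point can be made without destroying print. The following is only a sketch; the names say and shout are invented for the illustration.

```lua
-- say and shout are hypothetical names, used only for this illustration.
local say = print        -- a second name for the value that print names
local shout = say        -- and a third

print(say == print)      --> true
print(shout == say)      --> true

-- Whichever name we use, the same function is called.
shout("one value, three names")
```

Neither assignment copies the function; each simply attaches another name to the same value.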
We have written consecutive statements on separate lines.
We could just as well have written
x = print print = 42
for the first two lines. Note that the space between the two
words print is playing a syntactic role. It tells
the interpreter that it is dealing with two identifiers rather than
one. Multiple spaces can safely be replaced by a single space for
syntactic purposes (outside strings, of course). Many people are
uncomfortable with this sort of thing, but you can use semicolons
to separate or terminate statements
x = print; print = 42;
if you prefer. In this case you do not need the space following
the semicolon, but the text may be more readable with it in.
Two consecutive minus signs (--) introduce a comment in Lua, much as
REM does in Basic. The comment is terminated by the next newline.
This is convenient because we can write --> to denote the output
from a print statement.
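A minimal sketch of the convention, with an invented variable name, might look like this:

```lua
-- A comment may occupy a whole line,
local answer = 6 * 7     -- or follow a statement on the same line.
print(answer)            --> 42

-- Inside a string, two minus signs are ordinary characters.
print("--")              --> --
```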
In Basic, identifiers for integers end in % and those for
strings end in $. In Lua any identifier can stand for any type
of value. Lua is a dynamically typed language; that is to say,
variables do not have types, but values do. This contrasts with
statically typed languages, like C, in which identifiers are tied to
types. The advantage of static typing is that type checking can
be done by the compiler, so that, when the program is run, time is
not wasted on type checking. However, compiled languages have
their own disadvantages, particularly for scripting.
The function type, when applied to an expression,
returns a string describing the type of its value.
print(type(nil)) --> nil
print(type(true)) --> boolean
print(type(42)) --> number
print(type("42")) --> string
print(type(type)) --> function
print(type(lpeg.R("az","AZ"))) --> userdata
The possible types in Lua are:
nil boolean number string
userdata function thread table
which we will discuss later. In RiscLua the number type
refers to 32-bit integers. The userdata type is a catch-all that
the implementor can use to create what are in effect new types;
RiscLua uses it for 64-bit floating point numbers (doubles) and for
word-aligned arrays of four-byte words (blocks).
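Because types belong to values and not to variables, one identifier can stand for values of different types in turn. The sketch below uses an invented variable v and sticks to types available in any Lua, so it does not show the RiscLua userdata types.

```lua
local v = 42
print(type(v))           --> number
v = "forty-two"
print(type(v))           --> string
v = print
print(type(v))           --> function
v = {}                   -- an empty table
print(type(v))           --> table
v = coroutine.create(function() end)
print(type(v))           --> thread
```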
Note that in Lua, unlike Basic, functions are values.
Functions in Lua are not tied to unique names, as they
are in Basic. You can let an identifier stand for a function
just as you can let it stand for a number or a string.
We will meet plenty of occasions when it is convenient in
Lua to use functions without giving them a name; that is
not possible in Basic.
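Here is a small sketch of both ideas. The names twice, greet and hello are invented for the example, and the code is spelled with the full words function and return so that it should also run under standard Lua.

```lua
-- twice is a hypothetical helper: it applies f to x two times.
local twice = function(f, x)
  return f(f(x))
end

-- The first argument is a function that never receives a name.
print(twice(function(n) return n + 1 end, 5))   --> 7

-- An identifier can stand for a function just as for a string.
local greet = function(name) return "Hello, " .. name end
local hello = greet
print(hello("Lua"))                              --> Hello, Lua
```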
Basic and C are called first order languages because they
distinguish passive data (zeroth order values) like numbers or
strings from active data, i.e. functions (first order values) which
transform passive data to passive data. Lua, on the other hand, is a
higher order language, because functions can transform functions,
and have no privileged status. Composition of functions is an
example of a higher order function. We can write it in Lua as the
function:
\(f,g) => \(x) => f(g(x)) end end
Higher order languages have been around since the earliest days
of computing, but first order languages have always been more
widespread. If you have not encountered higher order languages
before you may well underestimate just how much more expressive
they can be.
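To see composition at work, here is a sketch that gives the function a name and applies it. The names compose, double and succ are assumptions made for the illustration, and the full words function and return are used so that the code should also run under standard Lua.

```lua
-- compose(f, g) returns a new function that applies g first, then f.
local compose = function(f, g)
  return function(x) return f(g(x)) end
end

local double = function(x) return 2 * x end
local succ   = function(x) return x + 1 end

print(compose(double, succ)(3))   --> 8
print(compose(succ, double)(3))   --> 7
```

The two results differ because the order of composition matters: the first doubles 3 + 1, the second adds 1 to twice 3.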
To describe a program as a list of instructions is too flat a
picture. A Lua program consists of a list of chunks. A chunk
consists of statements and chunks - that is, chunks can be enclosed
within chunks. This means that a program is not so much a list of
instructions as a tree of instructions. Basic's structure is rather
different: it consists of a main section which comes first and is
terminated by the word END, followed by function/procedure
definitions, initiated by the word DEF, which are not allowed to be
nested.
The notion of chunk is absolutely fundamental to Lua. Because
it is a notion that is not required in Basic, those who come to Lua
with only a background in Basic may perhaps not appreciate initially
just why it is so important. The reason is that it controls the
scope of variables and lets you exercise the important discipline of
information hiding, which is required for modularizing a program
into re-usable self-contained units.
In Basic a variable used in a function/procedure definition can
be declared LOCAL to it. Without this facility it would be
impossible to write reusable function/procedure definitions.
In Lua a variable can be declared local to a chunk.
The body of a function is a chunk. But in Lua the body of
any control structure (while, repeat, for, if) is also
a chunk. So is code written in a separate file, that can be loaded
into the program. You can make any collection of statements into a
chunk by surrounding them with the words do and end. There is a
speed advantage to using local rather than global variables in Lua,
and in Basic too. So the use of chunks in Lua is not just a
stylistic matter; it is more efficient.
In immediate mode each completed statement is treated as a
separate chunk.
This limits the usefulness of immediate mode, so we will pay more
attention from now on to programs written in a file.
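As a minimal sketch of a do ... end chunk, meant to be run from a file, consider the following. The names secret and visible are invented for the example.

```lua
do
  local secret = 1       -- visible only inside this chunk
  visible = secret + 1   -- a global, because it is not declared local
end

print(secret)            --> nil
print(visible)           --> 2
```

Outside the chunk the name secret refers to a global variable that has never been assigned, which is why its value is nil.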
Quite a lot of chunks are terminated by the keyword end. I find it
convenient sometimes to append a comment describing what sort of
chunk is being terminated.
if condition then . . . .
end -- if
while condition do . . . .
end -- while
for loopvars in iterator do . . . .
end -- for
\ ( parameters ) . . . .
end -- function
Loop variables are automatically local to a for-chunk.
Parameters are automatically local to a function-chunk.
Both these kinds of chunk will be explained in more detail later.
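In the meantime, here is a sketch that shows the locality. It uses the numeric form of the for loop and an invented function double, and it is written with the full words function and return so that it should also run under standard Lua.

```lua
local i = "untouched"

for i = 1, 3 do
  print(i)               --> 1, then 2, then 3
end
print(i)                 --> untouched

-- The parameter i is local to the function body.
local double = function(i)
  i = i * 2
  return i
end
print(double(10))        --> 20
print(i)                 --> untouched
```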
A global variable, one not declared local to a chunk, has global
scope; that is, its value can be accessed anywhere in the program
apart from where it is overridden by a local variable with the same
name. The principle of lexical scoping, which Lua adheres to, says
that the scope of a local variable extends from the statement after
that in which it is declared local to the end of the innermost chunk
in which the declaration occurs, including all the subchunks of that
chunk which follow the declaration. In terms of our picture of the
program as a tree, this means that the scope of a local variable is
all of the tree that is to the right of and higher than its
declaration. Of course, a local variable may be overridden by
another local variable of the same name whose declaration occurs in
its scope.
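The following sketch, with an invented variable x, traces these rules. Note in particular that a declaration's own right-hand side still sees the outer variable, because the new scope only begins at the next statement.

```lua
local x = 1
do
  local x = x + 1        -- the x on the right is still the outer x
  print(x)               --> 2
  if true then
    local x = "innermost"
    print(x)             --> innermost
  end
  print(x)               --> 2
end
print(x)                 --> 1
```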
We often need to count things: words, line numbers, etc.
To specify a counter, i.e. a function giving a new count value each time
it is called, we need to give the start value, and the
value by which the counter goes up each time. Here is a way to
do it if the start value is 0 and the increment is 1:
do                       -- start the chunk
  local count = 0
  counter = \()          -- function taking no arguments
    local n = count
    count = count + 1
    => n
  end                    -- function
end                      -- end the chunk
The variable count, being local to the chunk, is not accessible
outside it. But the function counter is not local to the chunk, and
so is accessible outside it. Each time it is called it produces a
value that is greater by one than the previous time. So we have used
the chunk to hide the count variable from the rest of the program.
It does not matter if the same name is used elsewhere in the program;
the interpreter will treat it separately without confusion.
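To watch this happen, here is the same counter again, followed by a few calls. It is only a sketch, written with the full words function and return so that it should also run under standard Lua.

```lua
do
  local count = 0
  counter = function()
    local n = count
    count = count + 1
    return n
  end
end

print(counter())         --> 0
print(counter())         --> 1
print(count)             --> nil

count = 100              -- a global that merely shares the name
print(counter())         --> 2
```

Assigning to the global count leaves the counter undisturbed, because the function refers to the hidden local variable.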
If we wanted counter to be local to some enclosing chunk, rather
than be a global variable, we would declare it local inside the
enclosing chunk and before the chunk shown above. Of course, we
would not stick the word local in front of its definition because
that would give it the wrong scope.
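A sketch of both the right and the wrong way follows. The name tick is invented for the second, deliberately mistaken version, and the code uses the full words function and return so that it should also run under standard Lua.

```lua
do                         -- the enclosing chunk
  local counter            -- declared local here, before the inner chunk
  do
    local count = 0
    counter = function()   -- no "local": this assigns the variable above
      local n = count
      count = count + 1
      return n
    end
  end
  print(counter())         --> 0
  print(counter())         --> 1

  do
    local count = 0
    local tick = function()  -- wrong scope: tick is local to this chunk
      count = count + 1
      return count
    end
  end
  print(tick)              --> nil
end
```

After the inner chunk ends, tick is out of scope, so the name refers to an unassigned global.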
A variable that appears in a chunk which is neither a global variable,
nor declared as local within it (and so has been declared local in some
enclosing chunk) is called an upvalue for the chunk. In the body of
counter the variable n is local, and the variable count is an upvalue.
Alternatively we can define a counter factory, a function that
returns counters.
make_counter = \ (start,increment)
  local count = start
  => \ ()
    local n = count
    count = count + increment
    => n
  end -- function
end -- function
odds = make_counter (1,2)
by10 = make_counter (0,10)
Each of the two counters odds and by10 acquires its own private copy
of the upvalue count. So we see that an upvalue is not to be thought
of as a single value, which is stored in a single place. Each call to
make_counter will bring into being a separate closure consisting, in
this case, of an inaccessible count variable and a counter function
to update it.
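Interleaving calls to the two counters shows that they do not interfere with each other. This is a sketch only: the factory is repeated with the full words function and return so that it should also run under standard Lua, and the names are declared local to keep the example self-contained.

```lua
local make_counter = function(start, increment)
  local count = start
  return function()
    local n = count
    count = count + increment
    return n
  end
end

local odds = make_counter(1, 2)
local by10 = make_counter(0, 10)

print(odds())            --> 1
print(odds())            --> 3
print(by10())            --> 0
print(odds())            --> 5
print(by10())            --> 10
```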