Chapter 2

what are programs?

You sometimes read that programs are like cooking recipes - that they consist of a list of instructions which the computer carries out one after the other. Unfortunately this is not really correct in general. However, in Lua's case this is a useful analogy.

For a notational system to qualify as a programming language it must provide a way of naming expressions. We call the names identifiers or variables. The word variable is a bit unfortunate, because the word may refer to something that does not vary at all. The nomenclature comes from mathematics, which has a long history, and whose notation and terminology have evolved to encompass all sorts of quirks and oddities. To describe a programming language we have to say how expressions are built up (syntax) out of names, what they mean (denotational semantics), and what happens when the computer runs a program (operational semantics). To do all this properly one needs a manual. To learn the language, however, one needs a piecemeal rather than an exhaustive approach, so that one can practice until one is ready to absorb something more.

identifiers

An identifier in Lua consists of letters of the alphabet (lower case and upper case are considered different), underscores (_) and digits (0-9), but may not start with a digit. Furthermore it must be distinct from a reserved word. The reserved words are:

and        break      do       else        elseif
end        false      for      function    if
in         local      nil      not         or
repeat     return     then     true        until
while

RiscLua lets you use the abbreviation \ for function (think of a lambda) and => for return. Note that in Basic your identifiers may not contain a reserved word. Lua is more relaxed on this point: your identifiers must simply be different from reserved words.
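To see these rules in action, here is a small sketch. It assumes a hypothetical helper, is_valid, which is not part of Lua: it merely asks the interpreter whether a piece of text compiles. It uses the standard spelling of function and return rather than the RiscLua abbreviations, so it should behave the same under any Lua interpreter.

```lua
-- Hypothetical helper, not part of Lua: reports whether a piece of text
-- compiles as Lua source. Lua 5.1 calls the compiler loadstring; later
-- versions use load for strings as well.
local compile = loadstring or load
is_valid = function(source)
  return compile(source) ~= nil
end

print(is_valid("total = 1"))     --> true
print(is_valid("Total = 1"))     --> true   (a different identifier from total)
print(is_valid("_2nd_try = 1"))  --> true   (underscore first, then a digit)
print(is_valid("endpoint = 1"))  --> true   (contains "end" but is not equal to it)
print(is_valid("End = 1"))       --> true   (reserved words are lower case)
print(is_valid("end = 1"))       --> false  (reserved word)
print(is_valid("2nd_try = 1"))   --> false  (starts with a digit)
```

The text is only compiled, never run, so none of the names inside the strings is actually assigned.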

As well as reserved words there are identifiers that have been assigned in Lua's base library, such as print. In theory you can use these identifiers for your own variables, but then you will overwrite the values in the base library that they denote, which is probably not a good idea. They are:

   assert          collectgarbage      dofile
   error           gcinfo              getfenv
   getmetatable    ipairs              loadfile
   load            loadstring          next
   pairs           pcall               print
   rawequal        rawget              rawset
   select          setfenv             setmetatable
   tonumber        tostring            type
   unpack          xpcall              _G
   _PROMPT         _PROMPT2            _VERSION

RiscLua adds to these:

   newtry          protect             rnd

You should also not use the names of the standard libraries:

   coroutine      debug               io
   os             package             string
   table

Standard Lua also has math. RiscLua adds to these:

   block          dbl                 swi

whose purpose is to accommodate Lua to the needs of RISC OS and the ARM architecture, and also the library

                      lpeg

the parsing expression grammar library written by Roberto Ierusalimschy.

You might like to try:

  x = print
  print = 42
  x ("Deepthought says ",print)

The word print is that of a built-in function, as we have seen, but that does not stop us from redefining it in the second line as the number 42. In the first line we have used the word x to save the value of the print function, effectively renaming it. The point of this rather absurd looking exercise is to ram home the distinction between names and values. A value may have many names, arbitrarily chosen, so the tradition of referring to a value by a name can be confusing. Do we refer to the print function or to the x function? Of course, if we use the former more people will understand which function we are referring to. But it is all too easy to fall into the trap of thinking that some values come with god-given, or rather tradition-given, names.
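A slightly fuller sketch of the same idea may help. The names say and also_say are arbitrary choices made for this illustration, as are the globals same_value and still_works. Unlike the exercise above, this version hands the function back to its customary name at the end, so that later examples can go on using print.

```lua
-- Keep two further names for the built-in function before reusing the name print.
local say = print
local also_say = say

-- All three names currently denote one and the same function value.
same_value = (say == print) and (also_say == print)

print = 42                               -- the name print now denotes a number
still_works = (type(say) == "function")  -- the function itself is untouched
say("print is now", print)

print = say                              -- give the value its customary name back
```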

We have written consecutive statements on separate lines. We could just as well have written

  x = print  print = 42

for the first two lines. Note that the space between the first print and the second is playing a syntactic role: it tells the interpreter that it is dealing with two identifiers rather than one. Multiple spaces can safely be replaced by a single space for syntactic purposes (outside strings, of course). If you are uncomfortable with this sort of thing, you can use semicolons to separate or terminate statements

   x = print;  print = 42;

if you prefer. In this case you do not need the space following the semicolon, but the text may be more readable with it in.

comments

Two consecutive minus signs ( -- ) introduce a comment in Lua, much as REM does in Basic. The comment is terminated by the next newline. This is convenient because we can write --> to denote the output from a print statement.
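A minimal illustration, in which the variable name answer is chosen just for this example:

```lua
-- A whole-line comment: the interpreter ignores everything after the two minus signs.
answer = 6 * 7     -- a comment may also follow a statement on the same line
print(answer)      --> 42
```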

types

In Basic, identifiers for integers end in % and those for strings end in $. In Lua any identifier can stand for any type of value. Lua is a dynamically typed language; that is to say, variables do not have types, but values do. This contrasts with statically typed languages, like C, in which identifiers are tied to types. The advantage of static typing is that type checking can be done by the compiler, so that, when the program is run, time is not wasted on type checking. However, statically typed, compiled languages have their own disadvantages, particularly for scripting.

The function type, when applied to a value, returns a string describing the type of that value.

   print(type(nil))               --> nil
   print(type(true))              --> boolean
   print(type(42))                --> number
   print(type("42"))              --> string
   print(type(type))              --> function
   print(type(lpeg.R("az","AZ"))) --> userdata

The possible types in Lua are:

      nil        boolean      number        string
      userdata   function     thread        table

which we will discuss later. In RiscLua the number type refers to 32-bit integers. The userdata type is a catch-all that the implementor can use to create what are in effect new types; RiscLua uses it for 64-bit floating point numbers (doubles) and for word-aligned arrays of four-byte words (blocks).
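Because types belong to values rather than to variables, one and the same identifier can denote values of different types at different times. The following sketch records what type reports after each assignment; the names v and seen are chosen just for this illustration.

```lua
seen = {}                        -- collects the type names in order
local v = 42
seen[1] = type(v)                -- "number"
v = "forty-two"
seen[2] = type(v)                -- "string"
v = print
seen[3] = type(v)                -- "function"
v = nil
seen[4] = type(v)                -- "nil"

print(table.concat(seen, " "))   --> number string function nil
```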

Note that in Lua, unlike Basic, functions are values. Functions in Lua are not tied to unique names, as they are in Basic. You can let an identifier stand for a function just as you can let it stand for a number or a string. We will meet plenty of occasions when it is convenient in Lua to use functions without giving them a name; that is not possible in Basic. Basic and C are called first order languages because they distinguish passive data (zeroth order values) like numbers or strings from active data, i.e. functions (first order values) which transform passive data to passive data. Lua, on the other hand, is a higher order language, because functions can transform functions, and have no privileged status. Composition of functions is an example of a higher order function. We can write it in Lua as the function:

  \(f,g) => \(x) => f(g(x)) end end
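For readers without RiscLua to hand, here is the same function with function and return spelled out, followed by a small usage sketch. The names compose, double and increment are chosen for this illustration only.

```lua
-- compose(f, g) returns a new function that applies g first, then f.
compose = function(f, g)
  return function(x)
    return f(g(x))
  end
end

double = function(x) return 2 * x end
increment = function(x) return x + 1 end

double_after_increment = compose(double, increment)
increment_after_double = compose(increment, double)

print(double_after_increment(5))  --> 12
print(increment_after_double(5))  --> 11
```

Note that compose both takes functions as arguments and returns a function as its result, and that the order of the arguments matters.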

Higher order languages have been around since the earliest days of computing, but first order languages have always been more widespread. If you have not encountered higher order languages before you may well underestimate just how much more expressive they can be.

chunks

To describe a program as a list of instructions is too flat an oversimplification. A Lua program consists of a list of chunks. A chunk consists of statements and chunks - that is, chunks can be enclosed within chunks. This means that a program is not so much a list of instructions as a tree of instructions.

By contrast, Basic's structure is rather different - it consists of a main section which comes first and is terminated by the word END, followed by function/procedure definitions, initiated by the word DEF, which are not allowed to be nested.

[Figure: scope diagram]

The notion of chunk is absolutely fundamental to Lua. Because it is a notion that is not required in Basic, those who come to Lua with only a background in Basic may perhaps not appreciate initially just why it is so important. The reason is that it controls the scope of variables and lets you exercise the important discipline of information hiding, which is required for modularizing a program into re-usable self-contained units.

In Basic a variable used in a function/procedure definition can be declared LOCAL to it. Without this facility it would be impossible to write reusable function/procedure definitions. In Lua a variable can be declared local to a chunk. The body of a function is a chunk. But in Lua the body of any control structure (while, repeat, for, if) is also a chunk. So is code written in a separate file, that can be loaded into the program. You can make any collection of statements into a chunk by surrounding them with the words do and end. There is a speed advantage to using local rather than global variables in Lua, and in Basic too. So the use of chunks in Lua is not just a stylistic matter; it is more efficient. In immediate mode each completed statement is treated as a separate chunk. This limits the usefulness of immediate mode, so we will pay more attention from now on to programs written in a file.
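A minimal sketch of a do ... end chunk hiding a local variable follows. The names total and step are chosen for this illustration, and we assume that no global called step has been assigned elsewhere.

```lua
total = 0                 -- global: visible everywhere
do
  local step = 5          -- local to this do ... end chunk
  total = total + step
end
print(total)              --> 5
print(step)               --> nil
```

Outside the chunk the name step refers to a global variable, which has never been assigned and so has the value nil.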

Quite a lot of chunks are terminated by the keyword end. I find it convenient sometimes to append a comment describing what sort of chunk is being terminated.

 if condition  then     . . . .     end -- if 

 while  condition   do     . . . .     end -- while 

 for loopvars  in  iterator    do     . . . .     end -- for 

 \ ( parameters )     . . . .       end -- function 

Loop variables are automatically local to a for-chunk. Parameters are automatically local to a function-chunk. Both these kinds of chunk will be explained in more detail later.
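As a foretaste, here is a small sketch showing that neither a loop variable nor a parameter disturbs a global variable of the same name. It uses the numeric form of the for loop and the standard spelling of function and return; the names i, x, sum and show are chosen for this illustration.

```lua
i = "outer"
sum = 0
for i = 1, 3 do           -- this i is a new variable, local to the loop body
  sum = sum + i
end
print(sum)                --> 6
print(i)                  --> outer

x = "global x"
show = function(x)        -- the parameter x is local to the function body
  return x
end
print(show("argument"))   --> argument
print(x)                  --> global x
```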

lexical scoping

A global variable, one not declared local to a chunk, has global scope; that is, its value can be accessed anywhere in the program apart from where it is overridden by a local variable with the same name. The principle of lexical scoping, which Lua adheres to, says that the scope of a local variable extends from the statement after that in which it is declared local to the end of the innermost chunk in which the declaration occurs, including all the subchunks of that chunk which follow the declaration. In terms of our picture of the program as a tree, this means that the scope of a local variable is all of the tree that is to the right of and higher than its declaration. Of course, a local variable may be overridden by another local variable of the same name whose declaration occurs in its scope.
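The following sketch traces which variable the name x denotes at various points. The table trace is just a device for this illustration, recording each value in turn. Notice in particular the statement local x = x .. "+inner": the x on the right-hand side is still the outer one, because the scope of the new variable only begins with the following statement.

```lua
x = "global"
trace = {}

do
  local x = "outer"
  trace[#trace + 1] = x          -- "outer"
  do
    trace[#trace + 1] = x        -- still "outer": a subchunk following the declaration
    local x = x .. "+inner"      -- the new x comes into scope after this statement
    trace[#trace + 1] = x        -- "outer+inner"
  end
  trace[#trace + 1] = x          -- "outer" again: the inner x has gone
end
trace[#trace + 1] = x            -- "global"

print(table.concat(trace, ", ")) --> outer, outer, outer+inner, outer, global
```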

example: A counter factory

We often need to count things: words, line numbers, etc. To specify a counter, i.e. a function giving a new count value each time it is called, we need to give the start value, and the value by which the counter goes up each time. Here is a way to do it if the start value is 0 and the increment is 1:

  do  -- start the chunk
  local count = 0
  counter = \() -- function taking no arguments
            local n = count
            count = count + 1
            => n
            end -- function
  end -- end the chunk

The variable count, being local to the chunk, is not accessible outside it. But the function counter is not local to the chunk, and so is accessible outside it. Each time it is called it produces a value that is greater by one than the previous time. So we have used the chunk to hide the count variable from the rest of the program. It does not matter if the same name is used elsewhere in the program; the interpreter will treat it separately without confusion.
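To try this out without RiscLua, here is the same counter with function and return spelled out, followed by a few calls. We assume that no global variable called count has been assigned elsewhere.

```lua
do  -- start the chunk
  local count = 0
  counter = function()  -- function taking no arguments
    local n = count
    count = count + 1
    return n
  end -- function
end -- end the chunk

print(counter())  --> 0
print(counter())  --> 1
print(counter())  --> 2
print(count)      --> nil
```

The last line shows that, outside the chunk, the name count does not reach the hidden variable; it refers to an unassigned global instead.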

If we wanted counter to be local to some enclosing chunk, rather than be a global variable, we would declare it local inside the enclosing chunk and before the chunk shown above. Of course, we would not stick the word local in front of its definition because that would give it the wrong scope.
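A sketch of that arrangement follows, again with function and return spelled out. The outer do ... end stands in for whatever the enclosing chunk happens to be, and the globals first and second are there only so that we can inspect the results from outside.

```lua
do                          -- stands for some enclosing chunk
  local counter             -- declared first, so it is in scope for the rest of this chunk
  do
    local count = 0
    counter = function()    -- plain assignment to the local declared above
      local n = count
      count = count + 1
      return n
    end
  end
  first = counter()         -- 0
  second = counter()        -- 1
end

print(first)   --> 0
print(second)  --> 1
-- Out here the name counter no longer refers to the local variable above.
```

Had we written local counter = function() ... inside the inner chunk instead, the scope of counter would have ended at the inner end, and the two calls after it would have failed.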

A variable that appears in a chunk and is neither a global variable nor declared local within that chunk (and so has been declared local in some enclosing chunk) is called an upvalue for the chunk. In the body of counter the variable n is local, and the variable count is an upvalue.

Alternatively we can define a counter factory as a function whose values are counters.

  make_counter = \ (start,increment)
                 local count = start
                  => \ ()
                      local n = count
                      count = count + increment
                      => n
                     end -- function
               end -- function
  odds = make_counter (1,2)
  by10 = make_counter (0,10)

Each of the two counters odds and by10 acquires its own private copy of the upvalue count. So we see that an upvalue is not to be thought of as a single value, which is stored in a single place. Each call to make_counter will bring into being a separate closure consisting, in this case, of an inaccessible count variable and a counter function to update it.
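We can check that the two counters really are independent by interleaving calls to them. Here is the factory once more with function and return spelled out, so that it runs under any Lua interpreter.

```lua
make_counter = function(start, increment)
  local count = start            -- a fresh variable for every call of make_counter
  return function()
    local n = count
    count = count + increment
    return n
  end
end

odds = make_counter(1, 2)
by10 = make_counter(0, 10)

print(odds())  --> 1
print(by10())  --> 0
print(odds())  --> 3
print(odds())  --> 5
print(by10())  --> 10
```

Calling odds has no effect on the values produced by by10, and vice versa.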
