Tables

Tables are the only datatype in Lua. They combine many ideas that are separate, or do not exist, in other programming languages. For that reason a variety of other names are often used:

  • tables
  • look-up tables
  • arrays
  • lists
  • stacks
  • dictionaries
  • environments
  • objects
An empty table is denoted by { }. It simply consists of a pointer into the heap. Wherever you see a pair of braces in a Lua program it means that a new pointer into the heap has been allocated, and, if the table is not empty, data has been stored to which it is pointing.

Note that each use of braces creates a different pointer.

     local x = { }
     local y = { }
     print (x == y)  --> false
Each item in a nonempty table consists of two things: a non-nil index (also called a key) and a non-nil value. If t is a table then t [ i ] denotes the value of t at the index which is the value of i, or nil if no such index exists. One may think of a table as a kind of function of its indices. Indeed, there is a space-time duality lurking here: functions take up a fixed amount of storage but take a variable amount of time to return a value, whereas tables take up a variable amount of storage but a fixed amount of time to return a value. Functions are applied to arguments using round brackets, tables are applied to a single index using square brackets. Any non-nil value can be used as an index to a table. A statement

       t [ i ] = x
either updates t [ i ] to the value x, if x is non-nil, or deletes the index given by the value of i if x is nil, or creates the index given by the value of i and assigns to it the value of x. Table-values are the only updateable things in Lua. Strings are not updateable. There is nothing to stop a table being one of its own indices, or being one of its own values.

        t [ "I contain myself" ] = t
        t [ t ] = "Well why not?"
Because Lua is designed for embedding in other people's software it has to be secure. In particular, there can be no peeking and poking of arbitrary memory addresses, such as C allows with its pointer arithmetic. Instead one uses tables, which can express any kind of datastructure that one could cook up with structs and pointers in C. For example, to insert a new value x into a linked list a:

   local list_insert = \ (a, x)
         if not a.next then
            a.next = { val = x }
         else
            a = { next = a.next, val = x }
         end -- if
       end -- function
It should be explained that a.next is syntactic sugar for a [ "next" ] and that { val = x } is syntactic sugar for { [ "val" ] = x }. But in fact, translating C-like datastructures in this way often turns out to be less efficient than more direct usage of tables. The not keyword returns true if its argument is nil or the Boolean value false, and false otherwise.

It is clear that making the same calculation twice is a waste of computational effort if you have the storage capacity to keep track of the results. Memoization is the technique of associating to a function a table to keep track of its values and using that table instead of the function whenever possible.

local memoize = \ (f)
     local t = { }
     => \ (x)
        local a = t [x]  -- look up answer in t
        if a then => a end -- if
        a = f (x)     -- it easn't there so we have to calculate it
        t [x] = a     -- update t with new item
        => a
       end -- function
   end -- function
Memoization can give dramatic increases in speed for recursively defined functions.

local fib
fib  = \ (n)
        if n < 2 then => n end -- if
        => fib ( n - 1 ) + fib ( n - 2 )
       end -- function
This is a very inefficient algorithm. Replace fib by memoize (fib) and you will see a huge improvement in performance.

A table is called a list if its indices form a consecutive run of integers starting with 1 (not 0 - this is a contentious point for some people). Evidently every table can be said to have a list-like part, which is empty if if the table does not have 1 for an index. If t is a table the expression #t gives the size of its list-like part. The statement

  t [ 1 + #t ] = x 
pushes x to t's list-like part. The advantage of lists is that we have a natural ordering of their indices. The structure

    for i, x in ipairs (t) do
       ......
    end -- for
iterates over the list-like part of t, with i running over its indices in order and x having the value t [ i ]. By contrast the structure

    for i, x in pairs (t) do
       ......
    end -- for
iterates over all the keys i of t, but in no predeterminable order.

The notation

     { x, y, .... }
is short hand for

    { [1] = x, [2] = y, .... }

A set or bag is a table all of whose values are the Boolean value true. Thinking of tables as functions, this means that the table is the characteristic function of the set. Boolean operations on sets are very easy to define.

   meet = \ (A, B)
          local X = { }
          for a, _ in pairs (A) do
             X [ a ] = B [a]
          end -- for
          => X
          end -- function

An environment is essentially a table whose indices are all strings that satisfy the requirements of being a Lua variable name: they are composed of letters digits and underscores but do not start with a digit. For such tables Lua allows the dot notation we saw above.

       t.foo   <==>  t [ "foo" ]
When the Lua interpreter starts up it creates a table _G of the initial environment. Note that _G._G is just _G. Every chunk is equipped with a local variable _ENV. The global chunk's _ENV is initially set to _G. When the interpreter encounters a non-local variable it is looked up in _ENV, so that _ENV is taken as the current environment. However, we can set _ENV to whatever we want.

     local T = { }
     do
       local _ENV = T
       x = "hello"
     end
     print (x)   --> nil    x is not in the current environment
     print (T.x) --> hello  but it is in T
Note that we cannot use print inside the do ... end because it is not a local variable and is not in the environment T, which is empty. But this works

     local T = { }
     local print = print
     do
       local _ENV = T
       x = "hello"
       print (x)    -->  hello
     end
because the do ... end chunk is a subchunk of the chunk in which the local variable print has been declared. The print on the right hand side (the global variable) no longer exists within the do ... end chunk after its environment has been set to T.

We often want to get local variables out of tables and RiscLua has extra syntax to facilitate this.

  local x, y, ... in T   <==>  local t = T
                               local x, y, ...  = t.x, t.y, ...
so we could have written

    local print in _G
instead of

    local print = print
Local variables, being compiled simply to stack offsets, are much quicker to look up than non-locals which have to be searched for in an environment. So it is good practice to use local variables wherever possible.