expressions. Not all s-expressions are legal Lisp forms any more than all sequences of characters are legal s-expressions. For instance, both (foo 1 2) and ("foo" 1 2) are s-expressions, but only the former can be a Lisp form since a list that starts with a string has no meaning as a Lisp form.
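The difference between the two levels can be sketched at the REPL. The helper TRY-EVAL below is a hypothetical name introduced only for this illustration, and + stands in for foo so that the legal form has something to call. The sketch also assumes the implementation signals an error for a list headed by a string, which is the behavior I'd expect from current implementations but is worth checking in yours.

```lisp
;; TRY-EVAL is a hypothetical helper, not part of the language.
;; It separates the two steps: the reader turns text into an
;; s-expression, then the evaluator tries to treat it as a form.
(defun try-eval (text)
  "Read one s-expression from TEXT, then try to evaluate it as a Lisp form."
  (let ((sexp (read-from-string text)))
    (handler-case (eval sexp)
      (error () :not-a-lisp-form))))

;; Both texts are read without complaint; each becomes a three-element list.
(print (read-from-string "(+ 1 2)"))
(print (read-from-string "(\"+\" 1 2)"))

;; Only the evaluator cares what the first element is.
(print (try-eval "(+ 1 2)"))       ; a list headed by a symbol naming a function
(print (try-eval "(\"+\" 1 2)"))   ; a list headed by a string
```

The reader accepts both inputs; only the second call to TRY-EVAL falls through to the error handler.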
This split of the black box has a couple of consequences. One is that you can use s-expressions, as you saw in Chapter 3, as an externalizable data format for data other than source code, using READ to read it and PRINT to print it.[39] The other consequence is that since the semantics of the language are defined in terms of trees of objects rather than strings of characters, it's easier to generate code within the language than it would be if you had to generate code as text. Generating code completely from scratch is only marginally easier: building up lists vs. building up strings is about the same amount of work. The real win, however, is that you can generate code by manipulating existing data. This is the basis for Lisp's macros, which I'll discuss in much more detail in future chapters. For now I'll focus on the two levels of syntax defined by Common Lisp: the syntax of s-expressions understood by the reader and the syntax of Lisp forms understood by the evaluator.
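Both consequences can be illustrated with a minimal sketch. The record and the variable names are made up for this example, and I use the string-based relatives of PRINT and READ (PRIN1-TO-STRING and READ-FROM-STRING) so the round trip doesn't need a file.

```lisp
;; A hypothetical record, stored as plain data rather than as code.
(defparameter *record* '(:title "Home" :rating 9))

;; Write it out as text the reader can parse, then read it back.
(defparameter *text* (prin1-to-string *record*))
(defparameter *copy* (read-from-string *text*))
(print (equal *record* *copy*))   ; the copy has the same structure

;; Code is also just a tree of objects, so existing code can be
;; transformed with ordinary list operations.
(defparameter *sum* '(+ 2 3 4))
(defparameter *product* (cons '* (rest *sum*)))  ; swap the operator
(print (eval *sum*))       ; → 9
(print (eval *product*))   ; → 24
```

Building *product* from *sum* never touches a string, which is the property macros rely on.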
The basic elements of s-expressions are lists and atoms. Lists are delimited by parentheses; atoms are everything else.
And that's pretty much it. Since lists are syntactically so trivial, the only remaining syntactic rules you need to know are those governing the form of different kinds of atoms. In this section I'll describe the rules for the most commonly used kinds of atoms: numbers, strings, and names. After that, I'll cover how s-expressions composed of these elements can be evaluated as Lisp forms.
Numbers are fairly straightforward: any sequence of digits, possibly prefaced with a sign (+ or -), containing a decimal point (.) or a solidus (/), or ending with an exponent marker, is read as a number. For example:
123 ; the integer one hundred twenty-three
3/7 ; the ratio three-sevenths
1.0 ; the floating-point number one in default precision
1.0e0 ; another way to write the same floating-point number
1.0d0 ; the floating-point number one in "double" precision
1.0e-4 ; the floating-point equivalent to one-ten-thousandth
+42 ; the integer forty-two
-42 ; the integer negative forty-two
-1/4 ; the ratio negative one-quarter
-2/8 ; another way to write negative one-quarter
246/2 ; another way to write the integer one hundred twenty-three
These different forms represent different kinds of numbers: integers, ratios, and floating point. Lisp also supports complex numbers, which have their own notation and which I'll discuss in Chapter 10.
As some of these examples suggest, you can notate the same number in many ways. But regardless of how you write them, all rationals (integers and ratios) are represented internally in "simplified" form. In other words, the objects that represent -2/8 or 246/2 aren't distinct from the objects that represent -1/4 and 123. Similarly, 1.0 and 1.0e0 are just different ways of writing the same number. On the other hand, 1.0, 1.0d0, and 1 can all denote different objects because the different floating-point representations and integers are different types. We'll save the details about the characteristics of different kinds of numbers for Chapter 10.
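String literals, as you saw in the previous chapter, are enclosed in double quotes. Within a string a backslash (\) escapes the next character, causing it to be included in the string regardless of what it is. The only two characters that must be escaped within a string are the double quote and the backslash itself. Some example string literals are as follows: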
"foo" ; the string containing the characters f, o, and o.
"fo\o" ; the same string
"fo\\o" ; the string containing the characters f, o, \, and o.
"fo\"o" ; the string containing the characters f, o, ", and o.
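One way to confirm how the reader treats each escape is to look at the resulting strings with LENGTH and CHAR. This is only a small check of the rules above, not anything specific to a particular implementation.

```lisp
;; A backslash before an ordinary character just yields that character.
(print (string= "foo" "fo\o"))   ; → T
(print (length "fo\o"))          ; → 3

;; An escaped backslash contributes one backslash character.
(print (length "fo\\o"))         ; → 4
(print (char "fo\\o" 2))         ; → #\\

;; An escaped double quote contributes one double-quote character.
(print (length "fo\"o"))         ; → 4
(print (char "fo\"o" 2))         ; → #\"
```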
Names used in Lisp programs, such as FORMAT, hello-world, and *db*, are represented by objects called symbols.
Two important characteristics of the way the reader translates names to symbol objects have to do with how it treats the case of letters in names and how it ensures that the same name is always read as the same symbol. While reading names, the reader converts all unescaped characters in a name to their uppercase equivalents. Thus, the reader will read foo, Foo, and FOO as the same symbol: FOO. However, \f\o\o and |foo| will both be read as foo, which is a different object than the symbol FOO. This is why when you define a function at the REPL and it prints the name of the function, it's been converted to uppercase. Standard style, these days, is to write code in all lowercase and let the reader change names to uppercase.[42]
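The case conversion is easy to observe with SYMBOL-NAME and EQ. The sketch assumes the standard readtable, whose case setting is to upcase unescaped characters.

```lisp
;; Unescaped names are upcased, so these all denote the same symbol.
(print (eq 'foo 'Foo))            ; → T
(print (eq 'foo 'FOO))            ; → T
(print (symbol-name 'foo))        ; → "FOO"

;; Escaped characters keep their case.
(print (symbol-name '|foo|))      ; → "foo"
(print (eq '\f\o\o '|foo|))       ; → T

;; The lowercase-named symbol is a different object from FOO.
(print (eq 'foo '|foo|))          ; → NIL
```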
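To ensure that the same textual name is always read as the same symbol, the reader interns symbols: after it has read the name and converted it to all uppercase, the reader looks in a table called a package for an existing symbol with the same name. If it can't find one, it creates a new symbol and adds it to the table. Otherwise, it returns the symbol already in the table.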
Because names can contain many more characters in Lisp than they can in Algol-derived languages, certain naming conventions are distinct to Lisp, such as the use of hyphenated names like hello-world. Another important convention is that global variables are given names that start and end with *. Similarly, constants are given names starting and ending in +. And some programmers will name particularly low-level functions with names that start with % or even %%. The names defined in the language standard use only the alphabetic characters (A-Z) plus *, +, -, /, 1, 2, <, =, >, and &.
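As an illustration of these conventions only, here is a small sketch. Every name in it is hypothetical, chosen to show the shape of the names rather than to suggest any real API.

```lisp
;; Global variable: name starts and ends with *.
(defvar *retry-count* 0)

;; Constant: name starts and ends with +.
(defconstant +max-retries+ 3)

;; Low-level helper: name starts with %, signalling that callers
;; should normally go through RETRY-ALLOWED-P instead.
(defun %bump-retry-count ()
  (incf *retry-count*))

;; Ordinary function: hyphenated, all lowercase in the source.
(defun retry-allowed-p ()
  "Count one more attempt and report whether it is within the limit."
  (<= (%bump-retry-count) +max-retries+))
```

None of these markers is enforced by the language; *, +, and % are ordinary name characters, and the conventions exist for human readers.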
The syntax for lists, numbers, strings, and symbols can describe a good percentage of Lisp programs. Other rules describe notations for literal vectors, individual characters, and arrays, which I'll cover when I talk about the associated data types in Chapters 10 and 11. For now the key thing to understand is how you can combine numbers, strings, and symbols with parentheses-delimited lists to build s-expressions representing arbitrary trees