This chapter gives a quick introduction to programming in the Scheme language. It will also mention Sizzle limitations and features where appropriate.
Scheme uses fully parenthesized prefix notation, that means that all procedure calls are enclosed in parentheses. The first element in a procedure call is the procedure which will be invoked and the other elements are the arguments which will be passed to the procedure. Note that the procedure will be evaluated in the same way as the rest parameters.
zzz: (+ 1 2) 3 zzz: ((lambda (x) x) 1) 1
All arguments in Scheme are passed by value, e.g evaluated before the
procedure is called. Special forms like if, define
etc. do not follow that rule, they evaluate their arguments only when
needed. There is no syntactic difference between a procedure call and
the application of a special form, so you need to know the names of all
special forms in order to tell how some given code will evaluate.
Variables are defined using the special form define and can be
modified lateron by using set!, which is another special form.
zzz: (define a 1) zzz: a 1 zzz: (set! a 2) zzz: a 2
Functions are either defined by defining a variable with the value of a
lambda expression or using a special form of define, which is
intended for defining procedures more easily.
zzz: (define identity (lambda (x) x)) zzz: (identity 2) 2 zzz: (exit) $ sizzle zzz: (define (identity x) x) zzz: (identity 3) 3
When you define a variable more than once in an environment, the former definition will be silently overwritten.
The Sizzle core contains a garbage collector, so that the programmer is not responsible to free objects which are not needed anymore. Internally, the garbage collector is implemented as a conservative collector which marks all objects as being used which are reachable through global variables (known to the collector) or are reachable by scanning the C stack.
At the moment, the garbage collector is a very simple, recursive mark-and-sweep collector, without generations or incremental behaviour. It has been sufficient for my needs, but maybe I will switch to a more sophisticated algorithm later when a need for it apears.
Sizzle has a lot of useful data types already built in, and provides a lot of procedures for manipulating values of various types. For detailed description of the available data manipulation functions please refer to the Programming Reference chapter in this manual.
The Scheme standard defines the so-called numeric tower, a hierarchy of numeric data types where every data type includes the values of the data types lower in the hierarchy and which is defined as follows:
number
complex
real
rational
integer
Furthermore, Scheme makes a difference between exact and inexact numbers. Exact numbers are always precise whereas inexact numbers are only an approximation. The same differentiation is made for operations on numbers. For example, the addition of two exact numbers always produces an exact result, but on the contrary, the division of exact numbers may be inexact.
The read syntax for numbers is as follows: the number may starts with an optional number of exactness and radix , followed by a number of digits which must fit into the radix scheme which may be given with a radix prefix. Then a decimal point and another sequence of numbers may follow, in decimal or scientific notation may follow. A list of the valid exactness and radix prefices is given in the following table.
#e
#i
#b
#o
#d
#x
Integers are exact numbers with the same range as the C long data
type.
Examples of integer constants:
42 -23 #xdead #o755 #b100001010 #d42
Floats are floating point values, corresponding to double values
in C. Real number constants can start with a radix and exactness prefix,
but only the radix prefix #d is allowed. When using the exactness
prefix #e (exact number), the number must have a fraction of
zero. Read syntax examples:
3.1459 -10.1e2 #i1.0 #e1.0
Complex and rational numbers are not supported by Sizzle. Maybe in the future they will. Also bignums (arbitrary precision numbers), as known from other Scheme systems, are not (yet) supported.
Booleans are the truth values true and false, in Scheme written like
this: #t means true and #f means false.
In Scheme, #f is the only value that is considered false in
conditionals, neither 0 nor the empty list '() counts as
false.
Characters are denotated in the following form: first, a hash mark (#), then a backslash and then the literal character. Examples:
#\j
#\
Scheme provides another notation for non-printable characters. A space
character can also be written like this #\space and a newline
like #\newline. Sizzle understands three additional symbolic names for characters. This table shows all of them:
#\space
#\newline
#\return
#\tab
#\bell
Strings are sequences of characters. String constants are enclosed in
double quotes and can contain all characters from the ASCII set. The
double quote character " and the backslash \ have to be
quoted with a preceding backslash. Some special non-printable characters can be included in string constants by denoting special backslash-character sequences. The following sequences are defined:
\n
\r
\t
\v
\b
\a
Strings can be mutable and immutable. Immutable are those which were
literally entered from the input source. They can not be used with
destructive procedures like string-set! or string-fill!.
Strings returned by functions like string-copy or string
are mutable and can be used with such strings.
Symbols are values with some special properties. They are written by simply writing their name, without any quotes or special markup characters. The main difference between symbols and strings is that symbols which have the same textual representation are in fact the same object. Additionally, symbols can have values associated to them, then they act as variables.
When reading symbols, the Sizzle reader considers all characters which do not explicitly end the symbol as part of the symbol. These ending characters are `(', `)', whitespace, `;' and `"'. It is even possible to include these characters into a symbol name by preceeding it with a backslash. Thus, these all are valid symbol names:
Hello i-am-a-symbol strange\ symbol
Of course, it is not too clever to use symbols with enclosed whitespace or parentheses in program source code. Their use can degrade readability of programs considerably.
Keywords are similar to symbols in that they are represented externally by a character sequence and that keywords with the same external representation are indeed the same objects internally. The difference between keywords and symbols is that symbols are normally bound to locations containing values and that these values are retrieved when a symbols is evaluated, whereas keywords evaluate to themselves. Thus it is not necessary to quote keywords on input.
Keywords are written like symbols, but are either prefixed with a colon
(:) or the sequence hashmark--colon (#:). When comparing,
only the keyword name is significant, the prefix is ignored.
zzz: #:keyword #:keyword zzz: :keyword #:keyword zzz: keyword error: unbound variable: keyword
In the above example you can see how keywords evaluate to themselves and
that evaluating an unknown symbol signals an error. Note also that
keywords are always printed in the #:--notation, regardless how
they were entered.
Cons cells are cells which can hold two other values. The first of the value is called the `car' of the cell and the second is called `cdr' (for historic reasons). In source files and on the read-eval-print prompt you can create cons cell by using the dotted pair syntax.
'(foo . bar) '(1 . 2)
Another way to create cons cells is using the constructor function cons:
zzz: (cons 1 2) (1 . 2)
Lists, the main data structure in Sizzle are also made out of cons cells, but they can be typed in a more convenient way than
(1 . (2 . (3 . '())))
by simply writing
(1 2 3)
Both result in the same value.
Vectors are similar to arrays in other programming languages. You can store a fixed count of values in a vector and can access them in constant time, as opposed to the time for accessing elements of a list, which is linear to the list length. The difference to ordinary arrays is that the elements of a vector can have different types.
Vectors are written like lists, but before the opening parentheses you have to put a hash mark.
#(1 2 3 4 5 6 7)
#("Hello" #\ "World" \#!)
#(#(1 2) #())
The last example shows that vectors may contain vectors and that vectors may be empty.
Vectors can be mutable and immutable. Immutable are those which were
literally entered from the input source. They can not be used with
destructive procedures like vector-set! or vector-fill.
Vectors returned by functions like make-vector or vector
are mutable and can be used with such strings.
Note: Currently, homogenous numeric vectors are only available if support for them has been explicitly enabled on configuration time for the package. See section Building Sizzle for how to enable the support if your installed version lacks it and you need it.
Normal Scheme vectors are heterogenous datatypes. You can store any value in them, just as you can with lists. Sizzle provides another kind of vectors, so-called homogenous numeric datatypes, as defined in SRFI-4. These vectors are special, because it is only possible to store numbers of a particular data type in there. There are 10 different types of those vectors, each for another numeric data type: signed 8-bit values, unsigned 8-bit values, signed 16-bit values, unsigned 16-bit values, signed 32-bit values, unsigned 32-bit values, signed 64-bit values, unsigned 64-bit values, 32-bit floating point values and 64-bit floating point values. Only values of their type can be stored into these vectors, so they provide additional type safety. They are also more space-efficient, because, for example, vectors of signed 8-bit quantities need only to reserve 8 bits for each elements, whereas normal vectors reserve at least 32 bit for eacht element, up to 160 bits.
Each of the data types has its own type prefix, which is used in all procedures dealing with that type. The following table defines the prefixes.
Homogenous numeric vectors have a read (and print) syntax similar to vectors, but between the hash mark and the opening parentheses the prefix for the vector type must be inserted:
#u8(1 2 3 4 5 6 7) #s16(-42 23 2 0) #f32(1.2 3.14159265)
Similar to normal vectors, literally denoted homogenous vectors are
read-only and can not be modified using primitive procedures like
u8vector-set!. Use procedures like make-f64vector to
create mutable vectors.
Hash tables (aka associative arrays) are very useful data structures, when a non-trivial programming task is at hand. Hash tables provide a mapping from one object to another and are similar to association lists, but they are significantly faster when they grow large.
In Sizzle, hash tables can be written literally with a #{
prefix, followed by key/value pairs separated by the arrow symbol
(=>), like this, terminated by a closing brace (}):
zzz: #{Joe => "cool" Jim => "lame"}
#{Jim => "lame" Joe => "cool"}
This is also the print syntax for hash tables.
Another possibility is to use the procedure make-hash which
constructs a hash table from its arguments. All arguments must be pairs
which will be taken as key/value pairs and inserted in the new hash
table.
zzz: (make-hash '(Joe . "cool") '(Jim . "lame"))
#{Jim => "lame" Joe => "cool"}
Note that the order of the key/value pairs depends on the size of the
data structure internally used to store the values and may differ for
the same input, when this size is not the same. The size is either
chosen internally, depending on the number of pairs, or can be provided
as an optional first argument to make-hash:
zzz: (make-hash '(Joe . "cool") '(Jim . "lame"))
#{Jim => "lame" Joe => "cool"}
zzz: (make-hash 32 '(Joe . "cool") '(Jim . "lame"))
#{Joe => "cool" Jim => "lame"}
The best choice for the size argument is a prime number not too near to a power of 2 (so says Knuth).
There are three kinds of procedures in Sizzle: Builtin procedures, builtin syntactic forms and user-defined procedures (lambda expressions, or closures).
The difference between builtin functions and syntactic forms is that funtions evaluate their arguments before theu are called and arguments to syntactic forms are only evaluated when needed.
Builtin functions and syntactic forms do not have any read syntax (that would not make sense) and they print in the following way:
zzz: set! #<macro set!> zzz: append #<primitive-procedure append>
Closures print as a pair of the lambda expression they stand for and the address of the environment the expression is closed over.
zzz: (define (f n) n) zzz: f #<closure #<procedure f (n)> 0x401af072>
Regular expressions are created using the primitive procedure
make-regexp. The resulting regular expression object can then be
used in calls to regexp-exec in order to match the regular
expression against strings. The format of the regular expression passed to make-regexp is the same as for the Posix regular expressions. For more information, type
man regex
Regular expressions are not part of any Scheme standard, but the procedures implemented in Sizzle are compatible to those used by Guile.
zzz: (define rx (make-regexp "foo|bar"))
zzz: (regexp-exec rx "a-foo-baz")
#("a-foo" (2 . 5))
zzz: (regexp-exec rx "bar-o-matic")
#("bar" (0 . 3))
Due to limitations in the interface of Posix regular expressions, regular expressions in Sizzle may not contain null bytes. Pattern strings are truncated at the first null byte on compilation.
In Scheme, it is possible to return more than one value from a
procedure. This is done by using the primitive procedure values
as the last expression in your procedure. Handling those values when
returned from a procedure is a bit strange, since the only standard form
defined for that purpose is call-with-values. The following
examples may (hopefully) clear up the whole mess.
zzz: (values 1 2 3) 1 2 3 zzz: (call-with-values (lambda () (values 1 2)) (lambda (x y) (+ x y))) 3
When multiple values are returned to the read-eval-print loop, they are
printed one after the other, every value on a new line. The second
example shows the usage of call-with-values, where the first
lambda expression is evaluated (and returns multiple values) and the
values returned from the first procedure are passed to the second
procedure as normal arguments.
Some values which appear in the interpreter are different from all
others and cannot be assigned to one of the data types. These values
are referred to as Special Values in this manual. Examples of
these special values are the end-of-file object which indicates
an end-of-file condition, or the unspecified object used inside
of the interpreter.
Sizzle supports a weird data type for constants. They can represent any
Scheme object, but have the additional property that references to
constants are replaced by their value when they are evaluated in a
procedure body. The result is faster execution, because the necessary
lookup step for variables is eliminated. Constants are created with the
syntactic form define-constant and can be used by variable, once
they are defined.
Notice that due to the substitution feature, it is not possible to redefine a constant. But would it be a constant then?
All input and output in Scheme is done using ports, which can be
regarded as data sources and sinks. Ports are characterized by the fact
that they provide procedures for reading and writing data, telling and
setting file pointer positions (if they are built on top of files) and
various others. Ports are either created explicitly by calls to
procedures like open-input-file, or implicitely when using
with-output-to-string and friends. They have a print syntax, but
no read syntax.
zzz: (current-input-port) #<rlport:open=1:read=1:write=0> zzz: (current-output-port) #<fdport:open=1:read=0:write=1:fd=1>
The first port in this example is a readline port, which supports command line editing, parentheses matching etc., the second is a normal file descriptor port connected to the standard output file descriptor.
Note that readline ports are only available if you have installed the
`sizzleopt' package and used the readline module with
(use-modules (readline readlie)).
At the moment, Sizzle supports so-called fdports, which use an underlying file descriptor, fports which use an underlying `stdio' file structure, string ports which use an internal character buffer for input and output, and rlports (readline ports) which read from the terminal via the `readline' library. The latter provide fancy command line editing.
Errors and Exceptions are distinct data types in Sizzle. Whenever an
error occurs (such as when you pass the wrong number of arguments to a
procedure), an error object is created and then propagated to the top
level, which may be either the read-eval-print loop or a procedure
evaluating a file. There the error is displayed, using the information
in the error object. In an error object, an error message is stored,
and if the error comes from the reader module, a file name and a line
and column number is also included. Error objects can not explicitly
created, but they are created whenever you use the error
procedure to signal an error.
Exceptions are a bit different. When raised, they also create objects
containing an error message, but they also contain a so-called
exception tag, which may be any atom. Exceptions can be raised from
the Scheme code by calling the procedure throw. Using the
exception tag for a particular exception or the catch-all exception tag
#t, the user can catch exceptions with the procedure
catch.
The rule of thumb is that exceptions are raised when an error situation
allows continued evaluation, and errors are signalled on errors which
are too serious to continue. An examples for exceptions is the
file-not-found exception, which is raised when a non-existent
file is opened; a quite common situation. Errors are signalled on
syntax errors, invalid procedure application or typing errors.
Right now, the difference between exceptions and errors in the Sizzle core is a bit half-baked. Not all continuable error situations throw exceptions, a lot of them unnecessarily signal errors. This will be changed in the future. Currently, support for errors is slowly phased out and all error--signalling places in the code are changed to throw various types of exceptions instead.
Sizzle only supports continuations usable as escape procedures. That means that one cannot export a continuation from the defining context by returning it from a procedure or storing it into a global variable, and then try to invoke it.
Go to the first, previous, next, last section, table of contents.