Go to the first, previous, next, last section, table of contents.


Programming Scheme

This chapter gives a quick introduction to programming in the Scheme language. It will also mention Sizzle limitations and features where appropriate.

General

Scheme uses fully parenthesized prefix notation, that means that all procedure calls are enclosed in parentheses. The first element in a procedure call is the procedure which will be invoked and the other elements are the arguments which will be passed to the procedure. Note that the procedure will be evaluated in the same way as the rest parameters.

zzz: (+ 1 2)
3
zzz: ((lambda (x) x) 1)
1

All arguments in Scheme are passed by value, e.g evaluated before the procedure is called. Special forms like if, define etc. do not follow that rule, they evaluate their arguments only when needed. There is no syntactic difference between a procedure call and the application of a special form, so you need to know the names of all special forms in order to tell how some given code will evaluate.

Variables are defined using the special form define and can be modified lateron by using set!, which is another special form.

zzz: (define a 1)
zzz: a
1
zzz: (set! a 2)
zzz: a
2

Functions are either defined by defining a variable with the value of a lambda expression or using a special form of define, which is intended for defining procedures more easily.

zzz: (define identity (lambda (x) x))
zzz: (identity 2)
2
zzz: (exit)
$ sizzle
zzz: (define (identity x) x)
zzz: (identity 3)
3

When you define a variable more than once in an environment, the former definition will be silently overwritten.

Storage Model

The Sizzle core contains a garbage collector, so that the programmer is not responsible to free objects which are not needed anymore. Internally, the garbage collector is implemented as a conservative collector which marks all objects as being used which are reachable through global variables (known to the collector) or are reachable by scanning the C stack.

At the moment, the garbage collector is a very simple, recursive mark-and-sweep collector, without generations or incremental behaviour. It has been sufficient for my needs, but maybe I will switch to a more sophisticated algorithm later when a need for it apears.

Data Types

Sizzle has a lot of useful data types already built in, and provides a lot of procedures for manipulating values of various types. For detailed description of the available data manipulation functions please refer to the Programming Reference chapter in this manual.

Numbers

The Scheme standard defines the so-called numeric tower, a hierarchy of numeric data types where every data type includes the values of the data types lower in the hierarchy and which is defined as follows:

number
All numbers belong to the data type number.
complex
Complex numbers consist of a real and an imaginary part.
real
Real numbers are all rational and irrational numbers.
rational
Rational numbers consist of a numerator and a denominator.
integer
Integers are all numbers without a fraction component.

Furthermore, Scheme makes a difference between exact and inexact numbers. Exact numbers are always precise whereas inexact numbers are only an approximation. The same differentiation is made for operations on numbers. For example, the addition of two exact numbers always produces an exact result, but on the contrary, the division of exact numbers may be inexact.

The read syntax for numbers is as follows: the number may starts with an optional number of exactness and radix , followed by a number of digits which must fit into the radix scheme which may be given with a radix prefix. Then a decimal point and another sequence of numbers may follow, in decimal or scientific notation may follow. A list of the valid exactness and radix prefices is given in the following table.

#e
Number is exact
#i
Number is inexact
#b
Binary number, radix 2
#o
Octal number, radix 8
#d
Decimal number, radix 10
#x
Hexadecimal number, radix 16

Integers are exact numbers with the same range as the C long data type.

Examples of integer constants:

42
-23
#xdead
#o755
#b100001010
#d42

Floats are floating point values, corresponding to double values in C. Real number constants can start with a radix and exactness prefix, but only the radix prefix #d is allowed. When using the exactness prefix #e (exact number), the number must have a fraction of zero. Read syntax examples:

3.1459
-10.1e2
#i1.0
#e1.0

Complex and rational numbers are not supported by Sizzle. Maybe in the future they will. Also bignums (arbitrary precision numbers), as known from other Scheme systems, are not (yet) supported.

Booleans

Booleans are the truth values true and false, in Scheme written like this: #t means true and #f means false.

In Scheme, #f is the only value that is considered false in conditionals, neither 0 nor the empty list '() counts as false.

Characters

Characters are denotated in the following form: first, a hash mark (#), then a backslash and then the literal character. Examples:

#\j
The character `j'
#\
The space character

Scheme provides another notation for non-printable characters. A space character can also be written like this #\space and a newline like #\newline. Sizzle understands three additional symbolic names for characters. This table shows all of them:

#\space
The space character
#\newline
The newline character
#\return
The carriage return character
#\tab
The tab character.
#\bell
The bell character.

Strings

Strings are sequences of characters. String constants are enclosed in double quotes and can contain all characters from the ASCII set. The double quote character " and the backslash \ have to be quoted with a preceding backslash. Some special non-printable characters can be included in string constants by denoting special backslash-character sequences. The following sequences are defined:

\n
Newline
\r
Carriage return
\t
Tab character
\v
Vertical tab
\b
Backspace
\a
Bell character

Strings can be mutable and immutable. Immutable are those which were literally entered from the input source. They can not be used with destructive procedures like string-set! or string-fill!. Strings returned by functions like string-copy or string are mutable and can be used with such strings.

Symbols

Symbols are values with some special properties. They are written by simply writing their name, without any quotes or special markup characters. The main difference between symbols and strings is that symbols which have the same textual representation are in fact the same object. Additionally, symbols can have values associated to them, then they act as variables.

When reading symbols, the Sizzle reader considers all characters which do not explicitly end the symbol as part of the symbol. These ending characters are `(', `)', whitespace, `;' and `"'. It is even possible to include these characters into a symbol name by preceeding it with a backslash. Thus, these all are valid symbol names:

Hello
i-am-a-symbol
strange\ symbol

Of course, it is not too clever to use symbols with enclosed whitespace or parentheses in program source code. Their use can degrade readability of programs considerably.

Keywords

Keywords are similar to symbols in that they are represented externally by a character sequence and that keywords with the same external representation are indeed the same objects internally. The difference between keywords and symbols is that symbols are normally bound to locations containing values and that these values are retrieved when a symbols is evaluated, whereas keywords evaluate to themselves. Thus it is not necessary to quote keywords on input.

Keywords are written like symbols, but are either prefixed with a colon (:) or the sequence hashmark--colon (#:). When comparing, only the keyword name is significant, the prefix is ignored.

zzz: #:keyword
#:keyword
zzz: :keyword
#:keyword
zzz: keyword
error: unbound variable: keyword

In the above example you can see how keywords evaluate to themselves and that evaluating an unknown symbol signals an error. Note also that keywords are always printed in the #:--notation, regardless how they were entered.

Cons Cells

Cons cells are cells which can hold two other values. The first of the value is called the `car' of the cell and the second is called `cdr' (for historic reasons). In source files and on the read-eval-print prompt you can create cons cell by using the dotted pair syntax.

'(foo . bar)
'(1 . 2)

Another way to create cons cells is using the constructor function cons:

zzz: (cons 1 2)
(1 . 2)

Lists, the main data structure in Sizzle are also made out of cons cells, but they can be typed in a more convenient way than

(1 . (2 . (3 . '())))

by simply writing

(1 2 3)

Both result in the same value.

Vectors

Vectors are similar to arrays in other programming languages. You can store a fixed count of values in a vector and can access them in constant time, as opposed to the time for accessing elements of a list, which is linear to the list length. The difference to ordinary arrays is that the elements of a vector can have different types.

Vectors are written like lists, but before the opening parentheses you have to put a hash mark.

#(1 2 3 4 5 6 7)
#("Hello" #\  "World" \#!)
#(#(1 2) #())

The last example shows that vectors may contain vectors and that vectors may be empty.

Vectors can be mutable and immutable. Immutable are those which were literally entered from the input source. They can not be used with destructive procedures like vector-set! or vector-fill. Vectors returned by functions like make-vector or vector are mutable and can be used with such strings.

Homogenous numeric vectors

Note: Currently, homogenous numeric vectors are only available if support for them has been explicitly enabled on configuration time for the package. See section Building Sizzle for how to enable the support if your installed version lacks it and you need it.

Normal Scheme vectors are heterogenous datatypes. You can store any value in them, just as you can with lists. Sizzle provides another kind of vectors, so-called homogenous numeric datatypes, as defined in SRFI-4. These vectors are special, because it is only possible to store numbers of a particular data type in there. There are 10 different types of those vectors, each for another numeric data type: signed 8-bit values, unsigned 8-bit values, signed 16-bit values, unsigned 16-bit values, signed 32-bit values, unsigned 32-bit values, signed 64-bit values, unsigned 64-bit values, 32-bit floating point values and 64-bit floating point values. Only values of their type can be stored into these vectors, so they provide additional type safety. They are also more space-efficient, because, for example, vectors of signed 8-bit quantities need only to reserve 8 bits for each elements, whereas normal vectors reserve at least 32 bit for eacht element, up to 160 bits.

Each of the data types has its own type prefix, which is used in all procedures dealing with that type. The following table defines the prefixes.

s8
signed 8-bit values
u8
unsigned 8-bit values
s16
signed 16-bit values
u16
unsigned 16-bit values
s32
signed 32-bit values
u32
unsigned 32-bit values
s64
signed 64-bit values
u64
unsigned 64-bit values
f32
32-bit floating point values
f64
64-bit floating point values

Homogenous numeric vectors have a read (and print) syntax similar to vectors, but between the hash mark and the opening parentheses the prefix for the vector type must be inserted:

#u8(1 2 3 4 5 6 7)
#s16(-42 23 2 0)
#f32(1.2 3.14159265)

Similar to normal vectors, literally denoted homogenous vectors are read-only and can not be modified using primitive procedures like u8vector-set!. Use procedures like make-f64vector to create mutable vectors.

Hash tables

Hash tables (aka associative arrays) are very useful data structures, when a non-trivial programming task is at hand. Hash tables provide a mapping from one object to another and are similar to association lists, but they are significantly faster when they grow large.

In Sizzle, hash tables can be written literally with a #{ prefix, followed by key/value pairs separated by the arrow symbol (=>), like this, terminated by a closing brace (}):

zzz: #{Joe => "cool" Jim => "lame"}
#{Jim => "lame" Joe => "cool"}

This is also the print syntax for hash tables.

Another possibility is to use the procedure make-hash which constructs a hash table from its arguments. All arguments must be pairs which will be taken as key/value pairs and inserted in the new hash table.

zzz: (make-hash '(Joe . "cool") '(Jim . "lame"))
#{Jim => "lame" Joe => "cool"}

Note that the order of the key/value pairs depends on the size of the data structure internally used to store the values and may differ for the same input, when this size is not the same. The size is either chosen internally, depending on the number of pairs, or can be provided as an optional first argument to make-hash:

zzz: (make-hash '(Joe . "cool") '(Jim . "lame"))
#{Jim => "lame" Joe => "cool"}
zzz: (make-hash 32 '(Joe . "cool") '(Jim . "lame"))
#{Joe => "cool" Jim => "lame"}

The best choice for the size argument is a prime number not too near to a power of 2 (so says Knuth).

Procedures

There are three kinds of procedures in Sizzle: Builtin procedures, builtin syntactic forms and user-defined procedures (lambda expressions, or closures).

The difference between builtin functions and syntactic forms is that funtions evaluate their arguments before theu are called and arguments to syntactic forms are only evaluated when needed.

Builtin functions and syntactic forms do not have any read syntax (that would not make sense) and they print in the following way:

zzz: set!
#<macro set!>
zzz: append
#<primitive-procedure append>

Closures print as a pair of the lambda expression they stand for and the address of the environment the expression is closed over.

zzz: (define (f n) n)
zzz: f
#<closure #<procedure f (n)> 0x401af072>

Regular Expressions

Regular expressions are created using the primitive procedure make-regexp. The resulting regular expression object can then be used in calls to regexp-exec in order to match the regular expression against strings. The format of the regular expression passed to make-regexp is the same as for the Posix regular expressions. For more information, type

man regex

Regular expressions are not part of any Scheme standard, but the procedures implemented in Sizzle are compatible to those used by Guile.

zzz: (define rx (make-regexp "foo|bar"))
zzz: (regexp-exec rx "a-foo-baz")
#("a-foo" (2 . 5))
zzz: (regexp-exec rx "bar-o-matic")
#("bar" (0 . 3))

Due to limitations in the interface of Posix regular expressions, regular expressions in Sizzle may not contain null bytes. Pattern strings are truncated at the first null byte on compilation.

Multiple Values

In Scheme, it is possible to return more than one value from a procedure. This is done by using the primitive procedure values as the last expression in your procedure. Handling those values when returned from a procedure is a bit strange, since the only standard form defined for that purpose is call-with-values. The following examples may (hopefully) clear up the whole mess.

zzz: (values 1 2 3)
1
2
3
zzz: (call-with-values (lambda () (values 1 2)) (lambda (x y) (+ x y)))
3

When multiple values are returned to the read-eval-print loop, they are printed one after the other, every value on a new line. The second example shows the usage of call-with-values, where the first lambda expression is evaluated (and returns multiple values) and the values returned from the first procedure are passed to the second procedure as normal arguments.

Special values

Some values which appear in the interpreter are different from all others and cannot be assigned to one of the data types. These values are referred to as Special Values in this manual. Examples of these special values are the end-of-file object which indicates an end-of-file condition, or the unspecified object used inside of the interpreter.

Constants

Sizzle supports a weird data type for constants. They can represent any Scheme object, but have the additional property that references to constants are replaced by their value when they are evaluated in a procedure body. The result is faster execution, because the necessary lookup step for variables is eliminated. Constants are created with the syntactic form define-constant and can be used by variable, once they are defined.

Notice that due to the substitution feature, it is not possible to redefine a constant. But would it be a constant then?

Ports

All input and output in Scheme is done using ports, which can be regarded as data sources and sinks. Ports are characterized by the fact that they provide procedures for reading and writing data, telling and setting file pointer positions (if they are built on top of files) and various others. Ports are either created explicitly by calls to procedures like open-input-file, or implicitely when using with-output-to-string and friends. They have a print syntax, but no read syntax.

zzz: (current-input-port)  
#<rlport:open=1:read=1:write=0>
zzz: (current-output-port) 
#<fdport:open=1:read=0:write=1:fd=1>

The first port in this example is a readline port, which supports command line editing, parentheses matching etc., the second is a normal file descriptor port connected to the standard output file descriptor.

Note that readline ports are only available if you have installed the `sizzleopt' package and used the readline module with (use-modules (readline readlie)).

At the moment, Sizzle supports so-called fdports, which use an underlying file descriptor, fports which use an underlying `stdio' file structure, string ports which use an internal character buffer for input and output, and rlports (readline ports) which read from the terminal via the `readline' library. The latter provide fancy command line editing.

Errors and Exceptions

Errors and Exceptions are distinct data types in Sizzle. Whenever an error occurs (such as when you pass the wrong number of arguments to a procedure), an error object is created and then propagated to the top level, which may be either the read-eval-print loop or a procedure evaluating a file. There the error is displayed, using the information in the error object. In an error object, an error message is stored, and if the error comes from the reader module, a file name and a line and column number is also included. Error objects can not explicitly created, but they are created whenever you use the error procedure to signal an error.

Exceptions are a bit different. When raised, they also create objects containing an error message, but they also contain a so-called exception tag, which may be any atom. Exceptions can be raised from the Scheme code by calling the procedure throw. Using the exception tag for a particular exception or the catch-all exception tag #t, the user can catch exceptions with the procedure catch.

The rule of thumb is that exceptions are raised when an error situation allows continued evaluation, and errors are signalled on errors which are too serious to continue. An examples for exceptions is the file-not-found exception, which is raised when a non-existent file is opened; a quite common situation. Errors are signalled on syntax errors, invalid procedure application or typing errors.

Right now, the difference between exceptions and errors in the Sizzle core is a bit half-baked. Not all continuable error situations throw exceptions, a lot of them unnecessarily signal errors. This will be changed in the future. Currently, support for errors is slowly phased out and all error--signalling places in the code are changed to throw various types of exceptions instead.

Continuations

Sizzle only supports continuations usable as escape procedures. That means that one cannot export a continuation from the defining context by returning it from a procedure or storing it into a global variable, and then try to invoke it.


Go to the first, previous, next, last section, table of contents.