Welcome to the SOF Programming Language documentation!

This documentation details the Stack with Objects and Functions programming language, an experimental stack-based reverse-polish-notation functional programming language created by kleines Filmröllchen.

Source code

While the README is comprehensive on basic concepts and a good starting point for interested people (like you), the docs shall provide the most thorough information on SOF, including a full language description/specification and a from-scratch tutorial (no programming knowledge required).

This documentation, like all of SOF, is a Work In Progress (WIP). As much as possible, non-implemented features are marked as such. It is licensed under GNU FDL 1.3.

Structure

The documentation is mainly split into three parts:

The Guide provides a tutorial and a guide-like introduction to SOF. Read this if you want to start using the language or are just interested in a high-level explanation targeted at programmers.
The Library documentation provides API documentation for all modules in the standard library. Read this if you want to use standard library functionality.
The Reference provides a specification for the SOF language. Read this if you’re interested in more details, or want to write an implementation.

License

Copyright (C) 2019-2025 kleines Filmröllchen. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled "GNU Free Documentation License".

SOF Language tutorial and guide

This section is a tutorial and a more user-focused guide of SOF's features. It's recommended that you have some experience programming in common paradigms.

Basics

SOF is an interpreted language. Use the installation steps described in the Readme to install SOF and launch the REPL interpreter. Note that the results of programs you type in might not be visible; they are probably placed on the stack. This is different from many REPL interpreters (such as Python's or Java's JShell), which print the result of the last expression whether you wrote a println or not. I recommend typing along with this tutorial and modifying the examples in your own creative ways to learn more about how things are done in SOF. All example code shows the input and output of the interpreter, where >>> is a user input line, ... is an input continuation line and !!! is the first line of an error message.

Let's get the Hello World out of the way:

>>> "Hello, world!" writeln
Hello, world!

This introduces both string literals (double quotes, escape sequences coming soon!) as well as the most basic I/O command: writeln, which takes a string and prints it to standard output, ending the line. Most languages call this command println, a leftover term from when such a command would actually instruct a printer to print the text on paper. But I'm getting off track here.

You may have noticed something weird here. Don't believe me? Let's try some arithmetic:

>>> 3 12 + writeln
15
>>> 26 18 * writeln
468
>>> 378 9 / writeln
42
>>> 1 2 + 3 + writeln
6

You can see that the first line clearly computes 3 + 12, but why is the operator (+, addition) after the numbers it operates on (3 and 12)? The reason is that the stack-based nature of SOF causes it to use postfix notation: all the operations come after the operands they operate on. Most famously, the document description language PostScript by Adobe uses this notation, and - surprise, surprise - it works off a stack as well.

Each operation or function takes a specific number of arguments that come before it. As we saw, writeln only takes one argument: the thing to be printed, while + takes two arguments: the two numbers to be added. The same goes for the other arithmetic operations, including subtraction (-), which is not shown here.

With this new knowledge, take a look at the last line. Where is the second operand to the second + instruction? The 3 is one of them, but the other seems to be missing. Except, of course, it isn't.

Any operation that occurs in SOF only ever has the Stack to work with. This means that not only do literals place their values onto the stack, but operations place their result back onto the stack. When the interpreter sees the first +, it retrieves the top two elements on the stack, which in this case happen to be the numbers 1 and 2 put there by the literals, and computes their sum. The result, 3, is placed back onto the stack, which can be imagined as shortening the code to 3 3 +. The second + doesn't even know where its numbers come from - they may be from a user input function, they may be literals, they may be the result of an operation or they may be a duplication of another value (which, in this case, would look exactly the same). As long as they are both numbers, the + sums them and places that value onto the stack, ready for use in the next computation (which happens to be writeln).
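The evaluation order described above can be modelled outside of SOF. The following is a minimal Python sketch, not the actual SOF interpreter: the function name run is made up for illustration, only integer literals, the four arithmetic operators and writeln are handled, and / is assumed to behave as integer division for these inputs.

```python
def run(source):
    """Evaluate a whitespace-separated postfix program and return the printed lines."""
    stack = []
    printed = []
    binary_ops = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
        "/": lambda a, b: a // b,  # assumption: integer division
    }
    for token in source.split():
        if token in binary_ops:
            right = stack.pop()  # the topmost element is the right operand
            left = stack.pop()
            stack.append(binary_ops[token](left, right))  # result goes back on the stack
        elif token == "writeln":
            printed.append(str(stack.pop()))
        else:
            stack.append(int(token))  # everything else is treated as an integer literal
    return printed


print(run("1 2 + 3 + writeln"))
```

After the first + has run, the model's stack holds a single 3, so the second + sees exactly the same situation as if the program had been 3 3 +.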

The Stack

You can't understand SOF without understanding the stack. On the flip side, once you do, SOF should feel much more intuitive.

What is a stack?

This section will briefly explain the notion of a stack as used in computer science. Feel free to skip it if you know about stacks and the LIFO principle.

A stack, in computer science, is just like a stack in the real world. Imagine a stack of books:

----------
|  book  |
----------
 |  book  |
 ----------
|  book  |
----------
|  book  |
----------

These books are heavy - you cannot lift more than one at a time. And because they are placed on top of each other, you can only access the topmost one. For these reasons, you can only do one of two basic things: put another book onto the stack or remove one book from the stack, making the book below that the new topmost book. (Technically, you could also count looking at the topmost book as a basic operation). These operations are called push and pop (and peek), respectively.

The same thing goes for stacks in computer science, but now the books are stored in electronic memory and the books are data: in the case of SOF, these are numbers, strings, commands, code etc. You will often see a stack being referred to as a LIFO queue, which stands for "last in - first out", i.e. the last element that went into the "queue" (on top of the stack) is the first element that will be retrieved back with a pop operation.
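For readers who know Python, the three basic operations can be tried out with an ordinary list, whose end plays the role of the top of the stack. This is only an illustration of the LIFO principle; the variable names are invented.

```python
books = []                # an empty stack
books.append("book A")    # push
books.append("book B")    # push
books.append("book C")    # push

top = books[-1]               # peek: look at the topmost book without removing it
removed = books.pop()         # pop: removes "book C", the last one pushed
next_removed = books.pop()    # pop: "book B" was the new topmost book

print(top, removed, next_removed, books)
```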

Advanced stacking

Here are some cool things to do with a stack:

>>> "world" dup writeln writeln
world
world

The dup operator, short for duplicate, creates an identical copy of the topmost element on the stack and places it on top.

>>> 6 7 8 pop writeln
7

The pop operator does exactly what it says: it discards the topmost value from the stack. In this case, this makes the 7 the topmost element of the stack, which is then printed.

>>> 4 5 swap writeln
4

The swap operator swaps the topmost two elements of the stack.
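The effect of these three operators on the stack can be sketched in Python, again using a list whose last element is the top of the stack. The function names mirror the SOF operators, but this is a model, not SOF's implementation.

```python
def dup(stack):
    stack.append(stack[-1])  # copy the topmost element

def pop(stack):
    stack.pop()              # discard the topmost element

def swap(stack):
    stack[-1], stack[-2] = stack[-2], stack[-1]  # exchange the top two elements


words = ["world"]
dup(words)      # two copies of "world", ready for two writeln calls

numbers = [6, 7, 8]
pop(numbers)    # 8 is gone, 7 is now on top

pair = [4, 5]
swap(pair)      # 4 is now on top, so writeln would print 4

print(words, numbers, pair)
```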

Names and Errors

Let's get into more advanced topics. Variables, branching and of course the programmer's favorite: Errors.

Naming things

Right now, we are only using the stack for storing things. We can duplicate, remove and operate on these values, but if we need lower values that were put on the stack previously, we are going to run into some issues quickly. For this reason, we can use the mighty def operator:

>>> 3 x def
>>> # do some stuff here
>>> x . writeln
3

def is short for "define". In this case it defines that the name x should represent the number value 3. Nothing fancy happens, but in the background, SOF created what is essentially a variable named "x" and gave it the value of the number 3. We can now go off and do something else (also note the use of basic # comments) and come back later to retrieve x's value. This is done by giving the "variable"'s name followed by a single dot (.). That little dot, the Call operator, is incredibly powerful (and powers, like, everything in SOF at all), but let's not get ahead of ourselves. Here, it is just used to retrieve the value associated with a name. Also, from now on, we will use the proper SOF terminology and call this simple x an Identifier. It doesn't do anything on its own, it is just a piece of data that can be used to identify (hence the name) variables and other named things.
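One way to picture this is a Python sketch in which an Identifier is plain data on the stack and a dictionary holds the defined names. The names Identifier, sof_def and sof_call are hypothetical, and the sketch only covers calling an Identifier, not the other things the Call operator can do.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Identifier:
    name: str  # an identifier is just data; it does nothing on its own

def sof_def(stack, names):
    identifier = stack.pop()   # the name was pushed last
    value = stack.pop()        # the value lies below it
    names[identifier.name] = value

def sof_call(stack, names):
    identifier = stack.pop()
    stack.append(names[identifier.name])  # look the name up and push its value


stack, names = [], {}

# 3 x def
stack.append(3)
stack.append(Identifier("x"))
sof_def(stack, names)

# x .
stack.append(Identifier("x"))
sof_call(stack, names)

print(stack, names)
```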

Making decisions

You are probably already waiting for conditional execution and all that Turing-complete stuff. Here it is:

>>> { "That number is small" writeln } 4 1000 < { "That number is large" writeln } ifelse         
That number is small
>>> { "That number is small" writeln } 2000 1000 < { "That number is large" writeln } ifelse 
That number is large

There are a bunch of things here that need discussion. The braces delimit code blocks, which are a way of grouping instructions (in this case, two simple output writes) to be executed later. More on them in a bit. The < is a basic comparison operator, a less than operator that needs two numbers and returns a boolean (true/false) according to the comparison result. Finally, the ifelse is an operation that takes in two executable things, in this case the code blocks, and executes one of them based on the boolean result that lies in between them: if it is true (the number is smaller than 1000), it executes the first block; if it is false (the number is 1000 or greater), it executes the second block. A simple if would omit the alternative block and just execute the first block if the condition was true:

>>> { "Primary school completed!" writeln } 2 1 > if
Primary school completed!
>>> { "Primary school completed!" writeln } 2 4 > if
>>>
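The stack layout that ifelse and if expect can be modelled in Python, with zero-argument functions standing in for code blocks. This is a sketch under the assumption that the operands are popped in reverse order of appearance; sof_ifelse, sof_if and classify are made-up names.

```python
def sof_ifelse(stack):
    else_block = stack.pop()   # pushed last, so it is on top
    condition = stack.pop()    # the boolean lies in between the blocks
    then_block = stack.pop()   # pushed first
    if condition:
        then_block()
    else:
        else_block()

def sof_if(stack):
    condition = stack.pop()
    then_block = stack.pop()
    if condition:
        then_block()


output = []

def classify(number):
    stack = []
    stack.append(lambda: output.append("That number is small"))
    stack.append(number < 1000)
    stack.append(lambda: output.append("That number is large"))
    sof_ifelse(stack)

classify(4)
classify(2000)

# the block runs only for a true condition; nothing happens for 2 4 >
for left, right in ((2, 1), (2, 4)):
    stack = [lambda: output.append("Primary school completed!"), left > right]
    sof_if(stack)

print(output)
```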

Errors

Up until now, we have only written simple programs that do not crash, because they conform to SOF's syntax and other rules. But let's say you screwed up while typing a string and forgot the closing quote:

>>> "a string
... 
!!! Syntax Error in line 1 at index 1:
 "a string
  ^
    No closing '"' for string literal.

The first thing that will happen when you try this is that a line with three dots will appear. This is because SOF found the error, but many errors can be corrected on the next line, so it gives you another chance. This error, however, is not resolvable: Ending the continuation line with another press of the enter key will make SOF scream at you. But in a good way.

First, there is the !!! bit, which signals an error. Then, the name of the Error is given. The section about errors in the reference has information on all errors, but the most common ones are Syntax, your code is sh*t, and Type, your data is sh*t. After the error type comes the information on where the error occurred (possibly incorrect) as well as the segment of code where the error occurred combined with a pointer to the exact character (mostly correct, no guarantees). This helps you find the place of mishap. Finally, there is some additional information on what went wrong: in this case, no closing quote for the string literal was found, which is exactly the error. Note that as with every language, the actual mistake might be somewhere else and go undetected at first because the code up to that point is legal SOF. For example, you might be passing a wrong parameter to a function, which will only be detected when some operation tries to act on that parameter and finds it to be of a wrong type.

It's a bird! It's a plane! No wait - It's a function!

Functions are essentially code blocks on steroids. They can't do anything more or less, they are just safer and more convenient. To create a function, this pattern is generally used:

# create the function
{ 3 + return } 1 function addThree globaldef
# and call it
1 addThree : writeln # 4

The important bit is the primitive token (keyword) function. It takes in a piece of executable SOF code, a "Callable", as well as an argument count, which is 1 in this case. The code for the function here is a simple code block, the only way you can specify a Callable literally. Code block data behaves like any other data, it lives on the stack and can be assigned a name. Its superpower is delayed execution, that is, the SOF instructions you place inside it are not executed immediately, they are stored, to be run later. In this case, unlike the if and ifelse commands you saw earlier, we don't even use the stored code directly, instead, we use it as the logic of a newly created function. But the function itself is also just a Callable, some data, now sitting on the stack. To use it repeatedly from anywhere, we must def it like any other variable and give it a name. We use the globaldef in this case, which will always def globally (duh) (Python programmers: compare this to the global keyword). This pattern is SOF convention and allows you to define functions globally even inside other functions and classes.

We can now call the function, like we called the variables beforehand. But behold! This time around, we don't simply retrieve the Identifier's "value" with a single . operator. That would be the function itself again, which we want to call! For this reason, the double-call operator : exists. It simply executes two calls, the first time retrieving the function, the second time executing it. It is both more performant and more compact than . ., but identical in function.

Now, what happens when we call the function? If you know any programming language, you know that functions (or methods, procedures, lambdas or whatever they're called) can receive one or more arguments. SOF, of course, is no different. If you know hardware/assembly-level programming, this may sound familiar: Arguments to an SOF function are expected to be on the stack. The function definition specifies the number of arguments, and SOF places a "stack protector" under all of a function's arguments. This means that a function's body cannot change anything that is on the stack, except for its arguments, of course. Also, anything that is on the stack when the function exits will be deleted, up to this "stack protector". Think of the stack protector as a specially-marked return address in assembler. The function cannot mess with the caller's stack and cannot clobber it. You can absolutely be sure how many elements from the stack are consumed, according to the function's argument count.

The code block looks familiar, but it now uses the new-fangled operation return. This will break out of the function body instantly and return the topmost value on the stack from the function. Alternatively, you can use the return:nothing PT (primitive token), which will end the function without returning anything. Returning nothing is also what happens when the function's end is reached without a return. The return value is placed onto the stack, obviously. In the example we use it as an argument to writeln.
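The calling convention described in this section can be approximated in Python. This is a simplified model, not how SOF is implemented: the stack protector is represented by handing the body a separate list that contains only its arguments, and Return, NOTHING, call_function, add_three and noisy are invented names.

```python
class Return(Exception):
    """Models the return operation: leave the body at once with a value."""
    def __init__(self, value):
        super().__init__()
        self.value = value

NOTHING = object()  # marks "the function returned nothing"

def call_function(stack, body, arg_count):
    if arg_count > len(stack):
        raise IndexError("not enough arguments on the stack")
    split = len(stack) - arg_count
    local = stack[split:]   # everything above the "stack protector"
    del stack[split:]       # exactly arg_count elements are consumed
    try:
        body(local)         # the body only ever sees its own arguments
        result = NOTHING    # end reached without return
    except Return as signal:
        result = signal.value
    # whatever is left in local is discarded here
    if result is not NOTHING:
        stack.append(result)

def add_three(local):
    # models { 3 + return }
    local.append(3)
    right = local.pop()
    left = local.pop()
    local.append(left + right)
    raise Return(local.pop())

def noisy(local):
    local.append("junk")    # left behind, and no return


caller = ["untouched", 1]
call_function(caller, add_three, 1)

other = ["untouched", 1]
call_function(other, noisy, 1)

print(caller, other)
```

In both calls the element below the argument survives unchanged, which is the guarantee the stack protector is meant to give.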

How to write readable SOF code

This page will outline the conventions, idioms and common practices used when writing SOF code. Following this guide leads to good, idiomatic SOF code and APIs that other developers can use with ease. Most sections are in no particular order.

Whitespace use

As SOF is very whitespace insensitive, good code uses whitespace to logically structure the otherwise pretty one-dimensional series of tokens.

Token separation

Tokens on one line are separated with one single space. An exception is made when you want to align groups of tokens in multiple sequential lines: then, use of multiple spaces between tokens for alignment is encouraged.

Example: Defining a number of variables in sequence: Don't do this:

3 x def
"string" msg def
2 4 * eight def

Do this:

3        x     def
"string" msg   def
2 4 *    eight def

Line breaks and token grouping

Each line should contain one single action, or a logical unit of actions, which might require multiple tokens. In general, PTs should appear on the same line as their arguments, except when these arguments are already on the stack or require a lot of steps to be prepared.

For PTs that take code blocks, such as if, the last closing brace and the PT itself should be on the same line, except when line length would be a problem.

Code blocks should be split up into lines, where the opening and closing brace are on their own line, mimicking the "braces on next line" code style found in C-like languages. Exceptions are when the code block is very short (1-2 tokens) or contains only a single logical action (such as a function call with many parameters).

Example: Don't do this:

# define function
{ pop 15 + return } 2 function someFunction globaldef
# take user input, process it, store it, store a modified version, print one of two messages
input convert:int : true someFunction : dup x def 3 + y def { "large" } { "small" } y . 33 < ifelse

Do this:

# define function
{
    # discard first argument
    pop
    # compute something
    15 + return
} 2 function someFunction globaldef

# take user input
input convert:int :
# process it
true someFunction : dup
# store it
x def
# store a modified version
3 + y def
# print one of two messages
{ "large" } { "small" } y . 33 < ifelse

Line indentation

Indenting one level should only be done inside code blocks; this also includes functions and methods. The braces of the code blocks themselves should be on the original indentation level, mimicking the "braces on next line" code style found in C-like languages. Whole-line comments are indented as code would be.

Example: Don't do this:

3 x def
    {
# a comment
    3 someComputation :
 2 someComputation :
      4 writeln
 } 3 4 < if

Do this:

3 x def
{
    # a comment
    3 someComputation :
    2 someComputation :
    4 writeln
} 3 4 < if

Methods are an exception to the indentation rule: They, together with the constructor, should be indented one level further than the surrounding code. The constructor-defining calls themselves ( 3 constructor <classname> globaldef <classname> : ) should be on the same indentation level as the surrounding code.

Naming conventions

General naming in SOF is done with CamelCase. All names except for constructors (classes) should start with a lowercase letter; constructor names start with an uppercase letter. Examples: fooFunc, connectToWebservice, doCoolComputation, myVariable, vector1, MyClass, Circle, FileCommunicator. Names in general should be self-explanatory and human-readable; avoid abbreviations and name collisions. (There is no significant speed benefit to shorter names, as all names are identified through some sort of hash.)

Special naming conventions are used for functions that provide similar functionality (there is, of course, no function overloading in SOF): functions of this sort should be of the form fname'args:variant. fname is the general name of the function collection. args is either the number of arguments or the arguments' types, separated by additional ' characters. variant is either the variation on the base functionality or the return type.

Examples: The functions random:01, random:int and random:float all provide randomness, but :01 returns values between 0 and 1, :int returns integers in a range and :float returns floats in a range. Similarly, the function collection convert:<type> includes a lot of functions that convert to a specific type from other supported types. The writef'<argc> functions all provide formatted standard output writing, but with a different number of formatting arguments specified by the argc.

Cross-module naming

Names in modules can use an underscore to separate pseudo-namespaces (identical to the module name) from actual names. If the function names are reasonably generic and the number of exports in a module is small, this can be omitted.

For example, all the list functions in the standard library module list have a list_ prefix, such as list_elem.

The SOF standard library

This is the official collection of library functions and classes provided by the SOF system.

Methods on built-in types

When performing a field call with certain identifiers, certain methods can be retrieved. This allows invoking those methods on primitive types, for example:

# sine of 3.2
3.2 sin ;
# retrieve first element of list
0 [ 1 2 ] idx ;

Defined methods are described separately for each type in the sub-sections.

Files in the standard library

math: Usual mathematical operations.
op: Built-in operations as callables.
io: (Not implemented) File input/output.
fp: (Not implemented) Helpers and tools for functional programming.

`list`

`idx`: Index into a list

Arguments < index: Integer < list: List

Return value < element: the value at index index in the list

This function implements normal indexing into a list. Any element can be retrieved from a list by means of its index. Indices are zero-based as in most programming languages. This means that the first element in a list is referred to by an index of zero. Using a negative index will retrieve elements from the end of the list, i.e. an index of -1 refers to the last element of the list (regardless of the list's size), -2 to the second to last element and so on. This is very useful for list-size-independent indexing from the end.

An index that would reach past the limits of the list throws an IndexError. The idx function will always throw for empty lists. The indexing and throwing behavior is inherited by all list functions that take indices, unless noted otherwise.
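The indexing rules can be summarized with a small Python model. It assumes that a negative index reaching past the start of the list throws just like a positive index past the end; the function below is an illustration that takes its arguments in the documented order, not the library's source.

```python
def idx(index, lst):
    # valid indices run from -len(lst) (first element) to len(lst) - 1 (last element)
    if not -len(lst) <= index < len(lst):
        raise IndexError(f"index {index} out of range for list of length {len(lst)}")
    return lst[index]


print(idx(0, [1, 2]))       # first element
print(idx(-1, [1, 2, 3]))   # last element, whatever the list's size
```

For an empty list the valid range is empty, so every index throws.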

`length`: Length of a list

Arguments < list: List

Return value < List length: Integer

Returns the number of elements in the list, always nonnegative. Returns zero for the empty list.

`head`: First element of a list

Arguments < list: List

Return value < First element in the list: Any value

This function returns the first value of the list. It throws IndexError if the list is empty.

`tail`: List tail

Arguments < list: List

Return value < The list's tail: List

Returns a new list that has the first element of the old list removed. Together with head, it can be used to split a list into its first element and its remainder.

Returns an empty list for the empty list.
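As a rough Python model of the head and tail behavior described above (an illustration, not the library's implementation):

```python
def head(lst):
    if not lst:
        raise IndexError("head of empty list")
    return lst[0]

def tail(lst):
    return lst[1:]  # a new list; slicing an empty list yields an empty list


values = [1, 2, 3]
first, rest = head(values), tail(values)
print(first, rest, tail([]))
```

Because tail builds a new list, the original list is left unchanged, and putting first back in front of rest reproduces it.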

`reverse`: Reverse a list

Arguments < list: List

Return value < The reverse of the list: List

Returns a new list which has all the elements of the old list, but in reversed order. For example, the last element is now the first and the second element is now the second-to-last.

Returns the empty list for the empty list.

`split`: Split up a list

Arguments < index: Integer < list: List

Return value < A two-element list with the first and second portion of the list, in that order.

This function splits up a list into two halves. The first half contains all elements up to the index (including the element at the index), and the second half contains all elements after the index. The two halves are returned as a list containing two elements. It is a more efficient combination of the take and after functions.

`take`: First n elements of a list

Arguments < n: Integer < list: List

Return value < List of length n: List

Returns a new list that contains the elements of the given list up to the given index, exclusive. For positive n not exceeding the list's length, this means that the length of the new list is equal to n. Returns the empty list for n=0, and returns the entire list if n is greater than or equal to the list's length.

`after`: Elements after an index

Arguments < n: Integer < list: List

Return value < List with elements after n: List

Inverse of take; returns the elements that take dropped from the list. For positive n, this is equivalent to dropping the first n elements from the list; for negative n, it is equivalent to taking |n| elements from the end of the list (possibly the entire list if |n| > length(list)). For n=0, the entire list is returned. For n greater than the list's length, the empty list is returned.
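The cases listed for take and after happen to coincide with Python's slice semantics, which makes for a compact model. The behavior of take for negative n is not spelled out above; the sketch assumes it is whatever makes take and after complement each other.

```python
def take(n, lst):
    return lst[:n]   # elements up to index n, exclusive

def after(n, lst):
    return lst[n:]   # the elements that take(n, lst) left out


letters = ["a", "b", "c", "d"]
print(take(2, letters), after(2, letters))
print(after(-1, letters), after(-10, letters))
```

With this model, take(n, lst) followed by after(n, lst) always adds up to the original list.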

`first`: First element of the list

Arguments < list: List

Return value < element: Any value

Returns the first element in the list, equivalent to 0 list idx. This is intended for use with tuple-like lists.

`second`: Second element of a list

Arguments < list: List

Return value < second element: Any value

Returns the second element of the list, similar to first, and equivalent to 1 list idx.

`push`: Add to list

Arguments < element: Any value < list: List

Return value < new list: List

Appends the given element to the end of the list.

`math`

`abs`: Absolute value

Arguments < a: Number

Return value < |a|: Number

Returns the absolute value of the given number.

`sin`: Sine

Arguments < a: Float

Return value < sin(a): Float

Returns the mathematical sine of the input. The input angle is treated as radians.

`cos`: Cosine

Arguments < a: Float

Return value < cos(a): Float

Returns the mathematical cosine of the input. The input angle is treated as radians.

`tan`: Tangent

Arguments < a: Float

Return value < tan(a): Float

Returns the mathematical tangent of the input. The input angle is treated as radians.

`exp`: Exponent

Arguments < a: Float

Return value < e^a: Float

Returns e (Euler's number, approximately 2.71828) to the power of a. This is the most accurate power function.

`ln`: Natural logarithm

Arguments < a: Float

Return value < ln(a): Float

Returns the natural logarithm of a. This is the most accurate logarithm function.

`log`: Logarithm

Arguments < n: Float < a: Float

Return value < log_n(a): Float

Returns the logarithm with the base of n of a. This is mathematically equivalent to ln(a) / ln(n).
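The stated equivalence can be checked numerically in Python. The helper name log_base is made up and takes its arguments in the documented order (base first); floating-point results are compared with a tolerance rather than for exact equality.

```python
import math

def log_base(n, a):
    # logarithm of a to the base n, via natural logarithms
    return math.log(a) / math.log(n)


print(log_base(2.0, 8.0))     # close to 3
print(log_base(10.0, 1000.0)) # close to 3
```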

`hypot`: Hypotenuse

Arguments < a: Float < b: Float

Return value < hypot(a, b): Float

Returns the length of the hypotenuse of a right triangle whose other two sides (adjacent and opposite) have the lengths a and b. This is the value sqrt(a*a + b*b) as calculated by Pythagoras' formula, but it avoids overflows and imprecisions caused by large intermediary values.
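Why a dedicated function is worthwhile can be seen with Python's own math.hypot, used here as a stand-in for the SOF function (whether SOF's implementation behaves identically is an assumption). The naive formula overflows once a*a no longer fits into a floating-point number, even though the final result would.

```python
import math

def naive_hypot(a, b):
    return math.sqrt(a * a + b * b)  # a * a can overflow to infinity


big = 1e200
print(naive_hypot(3.0, 4.0), math.hypot(3.0, 4.0))  # both are fine for small inputs
print(naive_hypot(big, big))                        # intermediate value overflowed
print(math.hypot(big, big))                         # stays finite
```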

SOF Language Specification

[intro]

This section specifies the Stack with Objects and Functions programming language.

[definition]

Definitions

[definition.rfc2119]

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

Most terms are defined as they are introduced.

[definition.interpreter]

An SOF interpreter is any program that takes SOF source files as input and executes them according to this language specification, producing identical semantics to any other interpreter. Alternatively, an interpreter MAY also produce another program in source code or machine executable form which produces identical semantics to the SOF source file inputs; what is more commonly called a compiler. This document describes a set of behaviors that are allowed and MUST be followed in an ideal interpreter. In practice, some programs that deviate slightly (most commonly due to bugs) from this specification are also called interpreters.

[definition.tool]

An SOF tool is any other program that deals with SOF source files. Their behaviors are not prescribed in this specification, and they MAY intentionally go beyond or against the specification to serve a certain purpose. For instance, tools may decide to introspect source code comments for a variety of purposes, while comments must be ignored by interpreters.

[source-file]

Source Files

An SOF source file is a program or part of a program in the SOF programming language. Source files are plain text files using UTF-8 character encoding. Newline sequences consist of an optional carriage return followed by a line feed.

[source-file.extension]

Source files use the standard extension .sof.

[source-file.syntax]

Source files adhere to the following Extended Backus-Naur form specification:

(* A program is a series of tokens and comments, where tokens MUST be separated by whitespace. *)
SofProgram = [Token] { [Comments] ?Whitespace? [Comments] Token } { [Comments] ?Whitespace? } ;

Token = "def" | "globaldef" | "dexport" | "use" | "export"
      | "dup" | "pop" | "swap" | "rot" | "over"
      | "write" | "writeln" | "input" | "inputln"
      | "if" | "ifelse" | "while" | "dowhile" | "switch"
      | "function" | "constructor"
      | "+" | "-" | "*" | "/" | "%" | "<<" | ">>" | "cat"
      | "and" | "or" | "xor" | "not"
      | "<" | "<=" | ">" | ">=" | "=" | "/="
      | "." | ":" | "," | ";" | "nativecall"
      | "[" | "]" | "|"
      | "describe" | "describes" | "assert"
      | Number | String | Boolean
      | Identifier | CodeBlock ;

Identifier = ?Unicode Letter? { ?Unicode Letter? | DecimalDigits | "_" | "'" | ":" } ;
(* A code block recursively contains SOF code, i.e. an SofProgram. *)
CodeBlock = "{" SofProgram "}" ;

(* Literals *)
String = '"' { ?any character except "? | '\"' } '"' ;
Boolean = "true" | "false" | "True" | "False" ;
Number = [ "+" | "-" ] ( Integer | Decimal ) ;
Integer = "0" ( "h" | "x" ) HexDigits { HexDigits }
        | [ "0d" ] DecimalDigits { DecimalDigits }
        | "0o" OctalDigits { OctalDigits }
        | "0b" BinaryDigits { BinaryDigits } ;
Decimal = DecimalDigits { DecimalDigits } "." DecimalDigits { DecimalDigits }
          [ ("e" | "E") ( "+" | "-" ) DecimalDigits { DecimalDigits } ] ;
BinaryDigits = "0" | "1" ;
OctalDigits = BinaryDigits | "2" | "3" | "4" | "5" | "6" | "7" ;
DecimalDigits = OctalDigits | "8" | "9" ;
HexDigits = DecimalDigits | "a" | "b" | "c" | "d" | "e" | "f" | "A" | "B" | "C" | "D" | "E" | "F" ;

(* Comments are ignored. *)
Comments = Comment { Comment } ;
Comment = ( "#" { ?any? } ?Unicode line break/newline? )
        | ( "#*" { ?any? } "*#" ) ;

[source-file.syntax.error]

Any violation of this syntax by a program MUST raise a SyntaxError when given as input to an interpreter.

SofProgram is the syntax specification for an entire SOF source file. The source file consists of two types of syntactical constructs: Comments and Tokens.
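As an illustration of the Integer production above, here is a minimal Python sketch of how a tool might recognize integer literals. The function name parse_integer_literal is hypothetical, and the sketch assumes that octal literals contain only octal digits; it is not taken from any SOF implementation.

```python
import re

# One alternative per Integer form in the grammar: 0h/0x hex, 0o octal,
# 0b binary, and decimal with an optional 0d prefix.
_INTEGER = re.compile(
    r"(?P<sign>[+-]?)(?:"
    r"0[hx](?P<hex>[0-9a-fA-F]+)"
    r"|0o(?P<oct>[0-7]+)"
    r"|0b(?P<bin>[01]+)"
    r"|(?:0d)?(?P<dec>[0-9]+)"
    r")"
)


def parse_integer_literal(text):
    """Return the value of an SOF-style integer literal (hypothetical helper)."""
    match = _INTEGER.fullmatch(text)
    if match is None:
        # A real interpreter would raise its own SyntaxError kind here.
        raise ValueError(f"not an integer literal: {text!r}")
    for group, base in (("hex", 16), ("oct", 8), ("bin", 2), ("dec", 10)):
        digits = match.group(group)
        if digits is not None:
            value = int(digits, base)
            return -value if match.group("sign") == "-" else value
```

For example, parse_integer_literal("0x1F") yields 31, while "0o8" is rejected because 8 is not an octal digit.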

[source-file.comments]

Comments are purely for the benefit of the programmer and MUST NOT have meaning to an SOF interpreter.

Note

Tokens are the core of the SOF program. The tokens are ordered in a linear sequence. The only exception is the code block token: a code block recursively nests another sequence of tokens. The other major distinction between token types is between Literal Tokens, which behave and look like the literals in other programming languages, and Primitive Tokens (also known as keywords), which execute program logic. The phrase Primitive Token is used to distinguish atomic keywords from the non-atomic { and } keywords.

[state]

Program state

For semantic purposes, the program state of a running SOF program consists mainly of two things:

[state.stack]

A stack of values visible to the program. The stack is a LIFO (last in, first out) structure that can contain any kind of value. Of these values, there are types that the user can place and read via the use of certain tokens, and there are the "hidden types" that are used to make the program execute correctly. Hidden types are specified less precisely, and the user is not generally allowed to interact with them.

[state.call-stack]

A stack of token lists that are being executed, with associated information about where these lists come from and what should happen after they finish executing.

[execution]

Executing

An SOF program consists of a list of tokens. Executing the program consists of running the action of each token in the order it appears in the list of tokens.

Each token may have an effect on the SOF program environment, modifying it to a new state. Tokens may change the execution state of the program, by modifying the token list stack. For instance:

add a token list to be executed next
stop executing the current token list or any others that are being executed
modify the current token list
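The interplay between the value stack and the stack of token lists can be sketched in Python as follows. This is a toy model under stated assumptions, not the reference interpreter: tokens are plain Python values, a nested Python list stands in for a code block, only + and the call operator . are handled, and the name run is hypothetical.

```python
def run(tokens):
    """Execute a toy token list and return the final value stack."""
    stack = []                    # values visible to the program
    call_stack = [[tokens, 0]]    # frames: [token list, next position]
    while call_stack:
        frame = call_stack[-1]
        token_list, position = frame
        if position == len(token_list):
            call_stack.pop()      # this token list has finished executing
            continue
        frame[1] = position + 1
        token = token_list[position]
        if token == "+":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif token == ".":
            # Calling a code block adds a token list to be executed next.
            call_stack.append([stack.pop(), 0])
        else:
            stack.append(token)   # literals and code blocks are pushed as values
    return stack
```

Running run([1, [2, "+"], ".", 3, "+"]) pushes 1, calls the block to add 2, then adds 3, leaving [6] on the stack. The program ends once the last frame has been popped, which mirrors the exit condition for an empty token list stack.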

[execution.errors]

Tokens may also produce errors, which can then cause further changes to the SOF program environment.

[execution.exit]

The SOF program exits when:

the last token is executed, and the token list stack is empty.
an uncaught error occurs.
the program is aborted by any other, lower-level means, such as a system call requesting process termination (exit())

[defined-behavior]

Defined Behavior

[defined-behavior.intro]

By default, any behavior in SOF is considered defined. This means that the interpreter MUST NOT deviate from the behavior prescribed in the specification. Some parts of SOF execution are unspecified and leave multiple implementation avenues for interpreters. In these cases, any behavior the interpreter chooses to exhibit MUST be limited to this unspecified part. In particular, any behavior exhibited in unspecified sections MUST NOT propagate to the remainder of the program execution. This means that any further execution MUST (return to and) follow defined SOF execution behavior.

[defined-behavior.nativecall]

The exception to defined behavior is formed by native functions, invoked through the nativecall operator. As specified, native functions may be supplied by the user and are therefore not part of the interpreter and its guarantees of SOF semantics. The specification for nativecall defines what set of behaviors are allowed for native functions. By virtue of being defined in another programming language, possibly with vastly more flexible behavior than SOF and complex semantics, possibly with far-reaching access to the SOF execution state, the interpreter cannot be expected to verify the correct behavior of all native functions, including catching any incorrect behavior as soon as it happens.

Therefore, if a native function does not uphold the allowed set of behaviors for native functions as laid out in the specification for nativecall, then from the moment this function is invoked through nativecall, the interpreter MAY exhibit any possible behavior, including any behavior normally forbidden by this specification. The interpreter MAY never return to defined behavior, as opposed to situations where behavior is locally unspecified. This is called Undefined Behavior. Undefined Behavior MUST NOT result from any action other than the one described.

¹ RFC 3629: UTF-8, a transformation format of ISO 10646. https://www.rfc-editor.org/rfc/rfc3629.html

[error]

Errors

[error.intro]

When a problem is encountered with the user-supplied SOF programs or their state during execution, the interpreter MUST throw an error.

Note

Currently, errors cannot be caught. A feature for catching errors may be added later via the except token.

Some errors may never occur in certain interpreters, as there are no strict conditions causing them. Interpreters MAY leave such errors unimplemented.

[error.termination]

The action of throwing an error terminates the program.

[error.order]

When, during execution, any operation fulfils multiple error conditions, and therefore many different errors may be thrown, the exact error thrown (first) is unspecified.

[error.ui]

Interpreters MUST inform the user of the fact that an error has occurred. They SHOULD provide further information, such as (but not limited to):

Where the error occurred in the program. This includes (if applicable) the current token that was executed when the error occurred, the call stack of functions and other callables that were in progress at the time, and the state of the main stack itself.
The exact condition which caused the error. For example, for an ArithmeticError, the interpreter SHOULD report what operation was invoked with which exact operands.
Resources to learn about the error and how to prevent it, such as documentation links.

[error.kinds]

Several kinds of errors exist. The following listing contains all the errors defined by and used in the reference. Implementations MAY add additional errors as subtypes of the predefined ones. The conditions under which errors are thrown are defined throughout the reference in the context of specific operations and descriptions of SOF execution.

SyntaxError
TypeError
NameError
ArithmeticError
StackAccessError
StackSizeError
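One way an implementation written in Python might model these error kinds is as an exception hierarchy. The Sof prefix, the common base class SofError, and the subtype DivisionByZeroError are all hypothetical; the prefix only avoids clashes with Python's built-in exceptions of the same names.

```python
class SofError(Exception):
    """Hypothetical common base for all errors thrown by an interpreter."""


class SofSyntaxError(SofError):
    pass


class SofTypeError(SofError):
    pass


class SofNameError(SofError):
    pass


class SofArithmeticError(SofError):
    pass


class SofStackAccessError(SofError):
    pass


class SofStackSizeError(SofError):
    pass


class DivisionByZeroError(SofArithmeticError):
    """Example of an implementation-specific subtype of a predefined error."""
```

Because an additional error is a subtype of a predefined one, code that handles the predefined kind also handles the more specific one.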

[type]

Types

Any value that is contained on the stack and visible to the SOF program has one of a few types. Types form a hierarchy, with certain types being considered subtypes of others. This means that any operation allowed on the parent type is also allowed on the subtype, with possibly diverging behavior.

[type.error]

Any operation may specify the operand types on which it is valid. Supplying any operation with an operand of a different type causes a TypeError.

[type.hidden]

Some types are considered hidden. This means the user cannot fully interact with them.

[type.value]

Value

Value or Any is used to refer to any type, or the union of all types.

[type.callable]

Callable

Subtype of: Value

Any type that can be operated on with the call operator (.).

[type.identifier]

Identifier

Subtype of: Callable

A string-like type for textual identifiers used in name binding.

[type.identifier.call]

Invoking a call on an identifier performs name lookup.

Note

Identifiers provide almost none of the functionality of strings. They are not intended to be used as an alternative string type.

[type.primitive]

Primitive

Subtype of: Callable

Any type that can be specified with a simple literal.

[type.primitive.call]

Invoking a call on a primitive performs the identity operation, i.e. it returns the primitive’s value. Boolean is an exception to this.

[type.number]

Number

Subtype of: Primitive

Any numeric type. Arithmetic operations only operate on numbers.

[type.integer]

Integer

Subtype of: Number

Integral signed number type. MUST be represented in two’s-complement. All operations perform wrapping arithmetic by default, discarding any carries. Minimum representable range must be $[ -2^{63} ; 2^{63}-1 ]$, i.e. 64 bits.
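Wrapping arithmetic on the minimum 64-bit range can be sketched in Python, whose integers are unbounded, by reducing every result back into the two's-complement range. The helper name wrap_integer and the fixed width of 64 bits are assumptions made for this sketch; an interpreter may use a wider representation.

```python
INT_BITS = 64  # assumed width; the text only requires at least 64 bits


def wrap_integer(value):
    """Reduce an unbounded integer to the signed two's-complement range."""
    value &= (1 << INT_BITS) - 1          # discard carries beyond the width
    if value >= 1 << (INT_BITS - 1):      # sign bit set: value is negative
        value -= 1 << INT_BITS
    return value
```

For instance, adding 1 to the largest representable value wraps around to the smallest one: wrap_integer(2**63 - 1 + 1) equals -2**63.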

[type.decimal]

Decimal

Subtype of: Number

Real number type. The precision required should either match DEC64¹, or IEEE 754-2019² double-precision floating-point. Higher precision is allowed.

[type.decimal.nan]

A NaN value of unspecified representation MUST be available. Infinite values may be available and may have specific behavior in certain operations in interpreters that support them. Otherwise, infinities have the behavior of NaN.

[type.boolean]

Boolean

Subtype of: Primitive

Truth value, either true or false.

[type.boolean.call]

When called, it takes two elements from the stack and returns the lower one if it is true, or the higher one if it is false.
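A minimal Python sketch of this selection behavior, with the stack modelled as a list whose last element is the top. The function name call_boolean is hypothetical.

```python
def call_boolean(stack, flag):
    """Replace the two topmost elements with the one selected by flag."""
    higher = stack.pop()   # topmost element
    lower = stack.pop()    # element below it
    stack.append(lower if flag else higher)
```

With the stack ["yes", "no"], calling true leaves "yes" and calling false leaves "no".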

[type.string]

String

Subtype of: Primitive

A string is a sequence of Unicode code points, represented in UTF-8. Its maximum length may be arbitrarily large.

[type.codeblock]

CodeBlock

Subtype of: Callable

A code block is a list of tokens, created by enclosing these tokens in a pair of braces { and }.

[type.codeblock.call]

When called, it executes the list of tokens.

[type.function]

Function

Subtype of: Callable

A function is a combination of a list of tokens to be executed and a non-negative integer number of arguments.

[type.function.call]

When a function is called, it retrieves a number of values from the stack equal to the number of arguments. Then, it places a Function Nametable on the stack. Then, it places the arguments back on the stack, in the same order.
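The stack manipulation at the start of a function call can be sketched in Python as follows. The stack is a list whose last element is the top, an empty dict stands in for the Function Nametable, and the name begin_function_call is hypothetical.

```python
def begin_function_call(stack, argument_count):
    """Insert a fresh nametable below the topmost argument_count values."""
    split = len(stack) - argument_count
    if split < 0:
        raise IndexError("not enough arguments on the stack")
    arguments = stack[split:]    # retrieved in their original order
    del stack[split:]
    nametable = {}               # stand-in for the Function Nametable
    stack.append(nametable)
    stack.extend(arguments)      # placed back in the same order
    return nametable
```

Starting from [9, 1, 2] with two arguments, the stack becomes [9, {}, 1, 2]: the nametable now separates the function's arguments from the caller's values.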

[type.object]

Object

Subtype of: Value

An object is a key-value store that the user can freely modify. When a method is called on the object, the object is used as a nametable, allowing easy modification via normal definition operations.

[type.constructor]

Constructor

Subtype of: Callable

A constructor is a special kind of function responsible for object creation.

[type.constructor.call]

When called, it creates a new object and is able to initialize that object.

¹ Douglas Crockford: DEC64. https://www.crockford.com/dec64.html

² "IEEE Standard for Floating-Point Arithmetic," in IEEE Std 754-2019 (Revision of IEEE 754-2008), pp. 1-84, 22 July 2019, doi: 10.1109/IEEESTD.2019.8766229.

Primitive Tokens

Every primitive stack operation is called a primitive token. It is listed with its arguments, the stack-lowest argument first, and a description of its return value. If there is no return value section, the operation places nothing on the stack. Get familiar with this argument and return type shorthand, as it is used throughout the documentation.

Operation special cases for identifiers

In order to make in-place modification of defined values easier, it's possible to combine any of the binary operations +, -, *, /, %, <<, >>, <, >, <=, >=, =, /=, and, or, xor with an identifier as the first (right, lower) argument. This will retrieve the value from the identifier through a .-like call, perform the operation on that value as the right argument instead, and then store the result back into the identifier like normal def. The result does not remain on the stack. This is equivalent to the +=, -= etc. operator found in many programming languages.

Miscellaneous tokens

`input` (string input function)

Return value < the token read from stdin: String

Reads one word, i.e. everything in the standard input up to the first (Unicode) whitespace character, without trailing or leading whitespace characters.

`inputln` (line string input function)

Return value < the line read from stdin without line terminator(s): String

Reads one line (any combination of line separators ends one line) from standard input.

`write` (output function)

Arguments < output: String

Writes the argument to standard output.

`writeln` (output function w/ line break)

Arguments < output: String

Writes the argument to standard output and terminates the line.

Arithmetic and Logic PTs

`+` (add operator)

Arguments < left: Number < right: Number

Return value < Mathematically: left + right: Number

Computes the sum of the two arguments. The result is an Integer if both arguments are Integers and a Decimal if any argument is a Decimal. Throws TypeError if any of the arguments has a non-number type.

`-` (subtract operator)

Arguments < left: Number < right: Number

Return value < Mathematically: left - right: Number

Computes the difference between the two arguments. The result is an Integer if both arguments are Integers and a Decimal if any argument is a Decimal. Throws TypeError if any of the arguments has a non-number type.

`*` (multiply operator)

Arguments < left: Number < right: Number

Return value < Mathematically: left · right: Number

Computes the product of the two arguments. The result is an Integer if both arguments are Integers and a Decimal if any argument is a Decimal. Throws TypeError if any of the arguments has a non-number type.

`/` (divide operator)

Arguments < left: Number < right: Number

Return value < Mathematically: left ÷ right: Number

Computes the result of the first argument divided by the second argument. The result is an Integer (division with remainder) if both arguments are Integers and a Decimal (division) if any argument is a Decimal. Throws TypeError if any of the arguments has a non-number type. Throws ArithmeticError if the right argument is zero.
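The type and error rules of the divide operator can be sketched in Python. Python's built-in TypeError and ArithmeticError stand in for the SOF errors of the same names, and the function names are hypothetical. The text does not state how integer division rounds for negative operands; this sketch uses Python's floor division, which is an assumption.

```python
def is_number(value):
    """True for Integer-like and Decimal-like values, but not for Booleans."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def sof_divide(left, right):
    if not (is_number(left) and is_number(right)):
        raise TypeError("/ expects two Numbers")
    if right == 0:
        raise ArithmeticError("division by zero")
    if isinstance(left, int) and isinstance(right, int):
        return left // right   # Integer result (rounding direction assumed)
    return left / right        # Decimal result if any argument is a Decimal
```

Dividing 7 by 2 therefore gives the Integer 3, while dividing 7 by 2.0 gives the Decimal 3.5.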

`%` (modulus operator)

Arguments < left: Number < right: Number

Return value < Mathematically: left mod right: Number

Computes the result of the first argument modulus by the second argument. First, any Decimals are converted to Integers. Then, the remainder of the integer division of the two arguments is computed and returned. Throws TypeError if any of the arguments has a non-number type. Throws ArithmeticError if the right argument is zero.

`<<`, `>>` (logical bit shift operators)

Arguments < base: Number < amount: Number

Return value < base (<< or >>) amount: Integer

Computes the logical bit shift, that is, the base (first argument) shifted left (<<) or right (>>) by amount bits. Because this is a logical shift, it does not sign-extend the base. If this operation receives Decimals as arguments, it truncates them to Integers.
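A Python sketch of the logical shifts on a 64-bit two's-complement Integer. The width of 64 bits, the helper names, and the restriction to non-negative shift amounts are assumptions of this sketch; the text does not say what happens for negative or very large amounts.

```python
WIDTH = 64
MASK = (1 << WIDTH) - 1


def to_signed(bits):
    """Reinterpret an unsigned WIDTH-bit pattern as a signed integer."""
    return bits - (1 << WIDTH) if bits & (1 << (WIDTH - 1)) else bits


def shift_left(base, amount):
    # int() truncates Decimal-like arguments towards zero.
    return to_signed((int(base) << int(amount)) & MASK)


def shift_right(base, amount):
    # Masking first makes the shift logical: zeros enter from the left,
    # so the sign bit is not extended.
    return to_signed((int(base) & MASK) >> int(amount))
```

Shifting -1 right by one bit gives 2**63 - 1 rather than -1, which is the visible difference between a logical and an arithmetic shift.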

`<`, `>`, `>=`, `<=` (comparison operators)

Arguments < left: Number < right: Number

Return value < Result of the comparison: Boolean

Compares the two arguments, always in the form left <comp> right. The operators are less than, greater than, less than or equal, greater than or equal, respectively. This operation throws a TypeError if any of the arguments is not a Number.

`=`, `/=` (equality operators)

Arguments < left < right

Return value < Whether the values are equal/unequal: Boolean

Checks whether the two arguments are equal or not equal, respectively. Two arguments are compared using the following algorithm:

If both arguments are Numbers: Check whether their numeric value is equal. Integers are converted to Decimals if at least one of the arguments is a Decimal.
If both arguments are Booleans: Check whether they represent the same truth value.
If both arguments are Strings: Check if every single one of their characters matches in order.
If both arguments are Objects: Check if their nametables contain the same value for each key and whether they contain the same list of keys. The values are checked with this same algorithm.
If both arguments are any other builtin value: Return false. This applies most importantly to CodeBlocks and Functions, as there is no simple way of determining their equality.

If the arguments aren't of the same type, upcasting is done, where Booleans upcast to Numbers and all other types upcast to Strings. This means that, for example, "2" 2 = holds true. For stricter equality, check the types first.
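The comparison algorithm can be sketched in Python, with dicts standing in for Objects and their nametables. Several details are assumptions because the text leaves them open: that true upcasts to 1 and false to 0, and that the string form of a value matches Python's str(). The name sof_equals is hypothetical.

```python
def is_number(value):
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def sof_equals(left, right):
    if is_number(left) and is_number(right):
        return left == right                      # numeric comparison
    if isinstance(left, bool) and isinstance(right, bool):
        return left == right
    if isinstance(left, str) and isinstance(right, str):
        return left == right
    if isinstance(left, dict) and isinstance(right, dict):
        # Same keys, and every value equal under this same algorithm.
        return left.keys() == right.keys() and all(
            sof_equals(left[key], right[key]) for key in left
        )
    if type(left) is type(right):
        return False                              # e.g. code blocks, functions
    # Mixed types: Booleans upcast to Numbers (assumed 1 and 0) ...
    if isinstance(left, bool) and is_number(right):
        return int(left) == right
    if is_number(left) and isinstance(right, bool):
        return left == int(right)
    # ... and everything else upcasts to Strings (assumed str() form).
    return str(left) == str(right)
```

Under these assumptions sof_equals("2", 2) is true, matching the example in the text, while two functions never compare equal.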

`and`, `or`, `xor` (binary logic operators)

Arguments < left < right

Return value < result of the operation: Boolean

Combines the two operands according to their Boolean value. The algorithm for finding the Boolean value is exactly the same as the one convert:bool uses.

`not` (negation operator)

Arguments < arg

Return value < result of the negation: Boolean

Negates arg's value; if it was true, it is now false, if it was false, it is now true. If the argument is not a Boolean, its truthiness value is determined according to convert:bool.

Control flow

`dowhile` (loop with at least one iteration)

Identical to while, but will execute the body callable before checking the condition, which results in at least one call to the body.

`if` (conditional execution operator)

Arguments < to execute: Callable < condition: Boolean

Executes the callable if condition is true.

`ifelse` (conditional execution operator with alternative)

Arguments < to execute: Callable < condition: Boolean < to execute otherwise: Callable

Executes the first callable if condition is true. Otherwise, executes the second callable.

`switch` (multi-ifelse conditional execution operator)

Arguments < "switch::" (Identifier) < ( case body: Callable < case condition: Callable ) * any number of times < default body: Callable

Compact alternative to nested ifelse's. The behavior of this is as follows:

The default body, the element last on the stack, is stored for later use. Then, the entire stack is traversed two elements at a time. If the first element is the identifier "switch::", the beginning/end of the switch has been reached; this special identifier serves as a sort of label to delineate the statement from the other, likely important stuff on the stack. As no case has been executed yet, the default body is executed.

If, however, the first element is a Callable, it is executed and the algorithm expects a Boolean value to be situated on top of the stack afterward. If this Boolean is true, the second element, the corresponding case body, is executed. Otherwise, the search continues.
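The traversal described above can be sketched in Python. Conditions and bodies are modelled as zero-argument Python functions, the stack is a list whose last element is the top, and the name run_switch is hypothetical. The text does not say what happens to the remaining cases after one has matched; this sketch assumes they are discarded down to and including the marker.

```python
SWITCH_MARKER = "switch::"


def run_switch(stack):
    default_body = stack.pop()           # last element on the stack
    while True:
        first = stack.pop()
        if first == SWITCH_MARKER:
            return default_body()        # no case matched
        body = stack.pop()               # the corresponding case body
        if first():                      # condition must yield a Boolean
            # Assumption: drop the unchecked cases and the marker.
            while stack.pop() != SWITCH_MARKER:
                pass
            return body()
```

Because the traversal starts at the top of the stack, the case written last (closest to the default body) is checked first in this model.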

`while` (loop function)

Arguments < body: Callable < condition: Callable

Before every iteration of the body, executes the condition callable, which should place a Boolean value onto the stack. If the Boolean value is true, the body is executed once and the cycle repeats; if it is false, the loop ends.
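The difference between while and dowhile can be sketched in Python, with body and condition modelled as zero-argument functions and the condition returning its Boolean directly instead of placing it on a stack. Both function names are hypothetical.

```python
def run_while(body, condition):
    """Check the condition before every execution of the body."""
    while condition():
        body()


def run_dowhile(body, condition):
    """Execute the body once, then behave like run_while."""
    body()
    while condition():
        body()
```

With a condition that is false from the start, run_while never executes the body, whereas run_dowhile executes it exactly once.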

Stack & Naming

`def` (definition operator)

Arguments < value: Value < name: Identifier

Modifies the LNT by setting the key-value pair name: value. This means that now the value of the identifier name is "defined" to be the value value, hence the name. Will overwrite any existing binding to name.

`dup`

Arguments < elmt: Value

Return value < elmt < elmt

Duplicates the topmost element on the stack.

`globaldef` (global definition operator)

The same as def, but always defines into the GNT.
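How def and globaldef differ can be sketched with a list of nametables in Python. The sketch assumes that LNT and GNT refer to the innermost (local) and outermost (global) nametable, and it uses the innermost-first lookup described for calling identifiers; the class and method names are hypothetical.

```python
class Nametables:
    def __init__(self):
        self.scopes = [{}]               # scopes[0] is the global nametable

    def enter_scope(self):
        self.scopes.append({})

    def leave_scope(self):
        self.scopes.pop()

    def define(self, name, value):       # models def
        self.scopes[-1][name] = value

    def global_define(self, name, value):  # models globaldef
        self.scopes[0][name] = value

    def lookup(self, name):
        for scope in reversed(self.scopes):  # innermost scope first
            if name in scope:
                return scope[name]
        raise NameError(name)
```

A value defined with define disappears when its scope is left, while one defined with global_define stays visible afterwards.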

`pop` (stack remove operator)

Arguments < any

Removes the topmost element from the stack and discards it.

`swap` (stack exchange operator)

Arguments < x < y

Return value < y < x

Exchanges the position of the two top-most elements on the stack.

Functional

`.` (call operator)

Arguments < 0 or more < tocall: Callable

Return value < any value or none

Calls the topmost element of the stack. Each type exhibits its own behavior when called, but the most basic are:

Most primitives return themselves.
Calling an identifier looks up that identifier's value in the innermost scope that has the identifier defined or throws a NameError if that fails.
Calling a namespace looks up the next stack element, an identifier, in that namespace. This process is recursive, although currently namespaces cannot be nested.
Calling a function executes it and consumes the specified number of arguments from the stack. The return value of the function is placed on the stack. An exception to this is when the function arguments are "blocked off" with a currying operator (see below). In that case, the function is not called; instead, it and the curried arguments are replaced by the curried function.
Calling a code block executes it. There are neither arguments nor a return value.

`:` (doublecall/function invoke operator)

Arguments < 0 or more < tocall: Callable

Return value < any value or none

Calls the topmost element of the stack twice, i.e. after the first call, the now topmost element is immediately called again. This PT is therefore a shortcut that is exactly equivalent to (and faster than) two consecutive call operators (. .). It is intended for convenient use of named functions, i.e. functions defined into a namespace. The first call retrieves the function itself onto the stack, and the second call executes it. The normal way you will see functions used is therefore with :, which also makes it easy to distinguish a function call from a variable lookup.

`|` (currying marker)

This is a pseudo-value on the stack that does nothing and is invisible to almost all operations. It is used to limit the number of arguments a function receives. If a function, while retrieving arguments from the stack for calling, encounters a currying marker before all necessary arguments are found, the function is not called; instead, a curried function is created that has the arguments found so far "pre-stored". The curried function can be called later and takes the number of remaining arguments. These appear on the stack below the curried arguments, so that the actual argument order inside the function doesn't change.
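Argument retrieval with a currying marker can be sketched in Python. A function is modelled as a Python callable plus an argument count, a curried function is pushed as a (remaining count, callable) tuple, and the names CURRY_MARKER and call_function are hypothetical.

```python
CURRY_MARKER = object()


def call_function(stack, argument_count, func):
    collected = []                        # arguments found so far, top first
    while len(collected) < argument_count:
        value = stack.pop()
        if value is CURRY_MARKER:
            stored = collected[::-1]      # back into stack order
            remaining = argument_count - len(stored)

            def curried(*earlier, _stored=stored):
                # Remaining arguments sit below the stored ones, so they
                # come first and the original argument order is kept.
                return func(*earlier, *_stored)

            stack.append((remaining, curried))
            return
        collected.append(value)
    stack.append(func(*collected[::-1]))
```

With a two-argument subtraction and a stack holding the marker followed by 2, the call produces a curried function of one argument; supplying 10 later computes 10 - 2.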

`function` (function definition operator)

Arguments < code: CodeBlock < argcount: Integer

Creates a function with argcount arguments. Usually, it is then def'd or globaldef'd with a name.

`nativecall` (native function invocation operator)

Arguments < any number of arguments < native function identifier: String

Return value < any

Calls a function defined natively in the interpreter. Learn more.

`return` (function return operator)

Arguments < any

Saves the topmost value on the stack as the current function’s return value, then returns from the function immediately, exiting all non-function scopes (like code blocks) and clearing the stack down to the function’s scope.

`return:0` (value-less function return operator)

Returns from the current function just like return, but does not return any value (and does not consume anything from the stack).

This includes PTs that create or handle complex types.

`[` (list marker)

This is a pseudo-value on the stack that does nothing and is invisible to almost all operations. It is used as a delimiter for the start of a list when the list creator ] is used.

`]` (list creator)

Arguments < [ < any number of elements (< ])

Return value < literal: List

Creates a new list from a number of literal values. This operation traverses the stack downwards until it hits the list start marker [; this means that the list creator operation is the only one that actually handles the start marker and doesn't just ignore it. All the elements in between are used as the initial values of the list, and they are ordered such that the lowest values are the first in the list. For example, the code [ 1 2 3 ] creates a list with the Integer elements 1, 2, and 3, in that exact order. This entire literal list creation system is therefore very intuitive while still respecting SOF's orthogonality.
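The downward traversal can be sketched in Python, with the stack as a list whose last element is the top. LIST_MARKER and create_list are hypothetical names standing in for [ and ].

```python
LIST_MARKER = object()


def create_list(stack):
    elements = []
    while True:
        value = stack.pop()          # traverse the stack downwards
        if value is LIST_MARKER:
            break                    # the marker is consumed as well
        elements.append(value)
    elements.reverse()               # lowest stack value becomes first
    stack.append(elements)
```

Applied to a stack holding the marker followed by 1, 2 and 3, this leaves the single list [1, 2, 3] in place of the marker and the elements.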

`,` (fieldcall operator)

Arguments < object: Object < field: Callable

Return value < object < value

Executes a call on the object's nametable with the given field as a Callable to execute. The field is usually an identifier identifying a field value on the object in question. It can however be any sort of callable data, including callables that do not interact with their nametable. The object is left on the stack below the return value, making it easily available for further processing.

`;` (methodcall operator)

Arguments < object: Object < 0 or more < tocall: Callable

Return value < object < any value or none

Calls the topmost element of the stack twice, like :. Additionally, remembers the object before the calls occur and places it back onto the stack below the return value of the second call. Furthermore, the function call actually happens with the object nametable as the function nametable, so the function can use object attributes as variables and modify the object with def's. It is also possible to add new attributes to an object; for this reason, if you need named temporaries you should use an inner dummy function.

This operator is intended to be used with method-like named functions, functions that expect an object to operate on. SOF technically has no bound functions (you can emulate them by attaching function values to object attributes, but using those is a lot more cumbersome), so all functions that act like methods are free functions expecting some object to operate on. The big advantage is that as long as you pass an object that behaves like the one the function expects, the function will operate just fine. The method call operation will leave the program with the result of the operation (possibly none), and the original object below it, which allows for it to be used further.

`constructor` (constructor creation operator)

Arguments < code: CodeBlock < args: Integer

Return value < constructor: Constructor

Creates a new object creation template by turning the code block into a constructor function. Inside the constructor's code block, defs can be used to initialize fields on the object's nametable. A new object can be created by executing the constructor; this is the earliest time that the code body is executed. Just like other functions, the constructor can obtain any number of arguments when called.

Modules

`dexport` (definition+export operator)

Arguments < value: Value < name: Identifier

name dexport is syntactic sugar for name globaldef name export. This operator simply binds the value to name in the GNT and also exports it.

`export`

Arguments < name: Identifier

Exports the value bound to name in the LNT. Exporting is the method of making data visible to other SOF modules that import this module. Only exported names, not all names in the GNT, will be available to the module after import.

`use` (import module)

Arguments < module name: String

This PT is part of the module system, documented here. It executes modules and imports their exported definitions into the global namespace.

Builtin functions

These functions are always available to the user, and part of the prelude file in the standard library. The difference to other modules is that the prelude file is executed as if its text was in the file itself, so normal module mechanisms don't apply unless you explicitly "prelude" use.

`convert:bool` (Not implemented)

Arguments < toConvert

Return value: converted: Boolean

Converts the argument to a Boolean. If the value is not already a Boolean, it uses the "truthiness" of the argument, which is almost always true. When the argument is 0 or 0.0, it is false.
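Read literally, this rule can be sketched in Python as follows. The name convert_bool is hypothetical, and treating every value other than the two zeros (including the empty string) as true is an assumption based on the wording above.

```python
def convert_bool(value):
    if isinstance(value, bool):
        return value                 # Booleans are returned unchanged
    if isinstance(value, (int, float)) and value == 0:
        return False                 # 0 and 0.0 are the only false values
    return True
```

The same function can serve as the truthiness test for the logic operators, which refer to convert:bool for non-Boolean arguments.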

`convert:decimal`

Arguments < toConvert

Return value: converted: Decimal

Converts the argument to a Decimal. The argument can either be a string containing a valid SOF Decimal literal (plus any leading/trailing whitespace), an Integer, or a Decimal already. The result is the corresponding Decimal value; the function fails with a TypeError if conversion fails, e.g. because of a wrong number format or an unsupported origin type.

`convert:int`

Arguments < toConvert

Return value: converted: Integer

Converts the argument to an Integer. The argument can either be a string containing a valid SOF Integer literal (plus any leading/trailing whitespace), an Integer already, or a Decimal to be rounded. The result is the corresponding Integer value; the function fails with a TypeError if conversion fails, e.g. because of a wrong number format or an unsupported origin type.

`convert:string`

Arguments < toConvert

Return value: converted: String

Converts the argument to its string representation. This is the same process used by the output methods. The argument can be of any type, as any SOF type has a string representation, but the result might not be beautiful.

`convert:callable`

Arguments < toConvert

Return value: converted: Callable

Converts the argument to its callable equivalent. This has the following result:

"Real" callables are unchanged. This affects functions, code blocks and identifiers.
Primitives are converted to a Church encoding version of themselves. (Not implemented) This means:
- Natural numbers n >= 0 are converted to a callable that when called with another callable f, will call f n times. If f returns a value and receives an argument, this is exactly equivalent to the notion of Church numerals.
- Booleans are converted to a callable that when called with two arguments, will return the first (stack-lowest) argument if it is true, otherwise, it will return the second argument. This also means that ca cb cond if (ca, cb Callables, cond Boolean) is equivalent to ca cb cond convert:callable : .
- Other Integers x are converted to a two-element list [a, b] where a, b ∈ ℕ are Church numerals as described above and x = a - b.
- Decimals x are first converted to the most accurate rational representation. Then, a two-element list [k, a] is created where k is a Church numeral (integer), a is a Church-encoded natural number and x = k / ( 1 + a ).

The conversion fails on Strings and other more complex types and throws a TypeError.
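The encodings listed above can be illustrated with a Python sketch. All function names are hypothetical, tuples stand in for the two-element lists, and the pairs hold plain Python integers where the text would hold Church numerals. For Decimals, Python's Fraction is used as the exact rational form of a float, which is one possible reading of "most accurate rational representation".

```python
from fractions import Fraction


def church_numeral(n):
    """Callable that applies f exactly n times, threading a value x through."""
    def numeral(f, x):
        for _ in range(n):
            x = f(x)
        return x
    return numeral


def church_boolean(flag):
    """Callable that selects the first argument if flag is true."""
    return lambda first, second: first if flag else second


def split_integer(x):
    """Naturals (a, b) with x == a - b."""
    return (max(x, 0), max(-x, 0))


def split_decimal(x):
    """Pair (k, a) with x == k / (1 + a), where a is a natural number."""
    ratio = Fraction(x)
    return (ratio.numerator, ratio.denominator - 1)
```

For example, church_numeral(3) applied to an increment function and 0 yields 3, and split_decimal(0.75) yields (3, 3) because 0.75 = 3 / (1 + 3).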

`random:01`

Return value: random number: Decimal

Generates a pseudo-random number between 0 (inclusive) and 1 (exclusive), optimally using a system-provided RNG (such as /dev/urandom on Linux). THIS PSEUDO-RANDOM NUMBER GENERATOR IS NOT GUARANTEED TO BE CRYPTOGRAPHICALLY SAFE.

`random:int`

Arguments < start: Integer < end: Integer

Return value: random number: Integer

Generates a pseudo-random number between start and end, inclusive. Uses random:01 as the initial source of randomness (and, therefore, is NOT CRYPTOGRAPHICALLY SAFE).
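One common way to derive an inclusive integer range from a value in [0, 1) is to scale and floor it. The Python sketch below shows that technique; the text does not say how the standard library actually performs the mapping, so this is an assumption. The name random_int and the random_01 parameter are hypothetical, with Python's random.random standing in for random:01.

```python
import math
import random


def random_int(start, end, random_01=random.random):
    """Map a value from [0, 1) onto the integers start..end, inclusive."""
    span = end - start + 1
    return start + math.floor(random_01() * span)
```

Passing a fixed source makes the mapping easy to check: a source returning 0.0 yields start, and a source returning a value just below 1 yields end.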

`random:decimal`

Arguments < start: Decimal < end: Decimal

Return value: random number: Decimal

Floating-point variant of random:int with equivalent behavior. Returns any floating point number between start and end, inclusive.

`fmt'x`

Arguments < format: String < x format arguments

Return value: formatted: String

Formats a string with a number of format arguments. fmt'x is just a placeholder name; the actual functions are called fmt'0 through fmt'9 with 0-9 arguments, respectively. The exact format specifier syntax is not well documented and can be found in the relevant native implementations. It is similar to Java's format string syntax, though, and some tests exist for it.

`pair`: Create a tuple

Arguments < a: Any value < b: Any value

Return value < [ a, b ] : List

Creates a two-element list from the two arguments. Main function for creating tuple-like lists (short lists of known length) and returning two values.

The SOF Module System

SOF's module system is intended to be simple, but flexible and practical. It is very reminiscent of Python's module system.

Modules, files, and folders

Each SOF source code file is a separate module. Folders are not special; they just serve to group modules and avoid naming conflicts. There are no special module names such as __init__ or __main__ in Python; all files ending in .sof are equally accessible modules.

Modules are named hierarchically with familiar dot syntax. Modules starting with a dot . are relative modules, and modules starting with any other character are absolute modules.

Relative modules import relative to the file location. Single dots (except the leading dot) are used to import one directory lower, i.e. the name between this dot and the one before it is considered a directory in which to look for the module. Double dots .. are used to import a directory higher (cf. directory navigation in all major operating systems). The highest directory possible is the directory of the base module of the program if the relative import chain originates from the base module, or the directory of the libraries if the relative import chain originates from a library module that was imported absolutely. This distinction prevents nonsensical and dangerous "upwards" imports while allowing for useful features like sibling folder importing.

Absolute modules are imported from the library directory. This is a runtime-constant directory which will later be accessible with command-line arguments and/or environment variables. It usually sits in a directory related to the SOF executable itself. The library directory contains not only the SOF standard library modules but also any modules added manually by the user or by package managers. Modules imported absolutely can themselves import relatively, which again allows for submodule structures even in the libraries. Within an absolute module specification, single dots can also be used to import from sub-directories of the library directory.

The module name, i.e. the name after the final dot, never contains the .sof ending. This allows for alternative endings and for special file formats that the module system treats specially, like .soflib.

Each naming segment in a module specification, which represents either a folder or the final file, can contain any character except the two slash characters (used by operating systems for directory structure) and, of course, dots.

Given this detailed description, the method of resolving modules is unambiguous and straightforward. Modules are always treated with UTF-8 encoding, just as all SOF files are.
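The resolution rules above can be sketched in Python. This is a minimal sketch and not the reference implementation: the function resolve_module, its parameters, and the reading that every empty segment between two dots means "one directory up" are assumptions made for illustration. The caller is assumed to pass the correct ceiling directory (the base module's directory or the library directory).

```python
from pathlib import PurePosixPath


def resolve_module(spec, importing_dir, ceiling_dir, library_dir):
    """Map a module specification to a .sof file path (hypothetical helper)."""
    if spec.startswith("."):
        # Relative module: start at the directory of the importing file.
        current = PurePosixPath(importing_dir)
        ceiling = PurePosixPath(ceiling_dir)
        segments = spec[1:].split(".")
    else:
        # Absolute module: always start at the library directory.
        current = ceiling = PurePosixPath(library_dir)
        segments = spec.split(".")

    *folders, name = segments
    if not name:
        raise ValueError(f"module specification {spec!r} has no module name")

    for folder in folders:
        if folder == "":
            # An empty segment comes from a double dot: go one directory up,
            # but never above the ceiling directory.
            if current == ceiling:
                raise ValueError(f"{spec!r} leaves the allowed directory tree")
            current = current.parent
        else:
            current = current / folder

    # The module name never carries the .sof ending, so it is added here.
    return current / (name + ".sof")


print(resolve_module(".foo.bar", "/proj/src", "/proj", "/lib"))
print(resolve_module("..util", "/proj/src", "/proj", "/lib"))
print(resolve_module("math.trig", "/proj/src", "/proj", "/lib"))
```

Under these assumptions, the three calls resolve to /proj/src/foo/bar.sof, /proj/util.sof and /lib/math/trig.sof, while a specification such as ..x imported from the ceiling directory itself is rejected.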

Names

Like C, SOF has no namespaces, so care needs to be taken when naming functions and other exports of a module. Because exports overwrite all GNT entries of the same name upon import, duplicate definitions are technically allowed (though the interpreter might issue a warning). The convention, as outlined in the programming conventions, is to use underscores for separating pseudo-namespaces where necessary.

The `use`, `export` and `dexport` primitive tokens

The use primitive token imports a module. The module specification, whose behavior is explained in detail above, is given as a string. The SOF module system imports the specified module, which may come from the internal cache if it was already imported. Then, all of the bindings defined by export or dexport are imported into the importing file's global namespace. This means that you don't have to worry about cluttering global namespaces with unnecessary names: only the names you export in a module are visible to users of that module.

Note that use is, of course, recursive. SOF code that is executed as part of a module import can use other modules without any different rules or exceptions. The only impossible module connection is a circular import of any sort. The reason is the same as in Python: importing a module always involves executing the entire source code of the imported module. However, given the huge ecosystem of Python libraries, it is clear that this is not a serious limitation and that circular dependencies can be reworked into strictly hierarchical dependencies.
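The import behavior described in this section (caching, export-only visibility, recursion, and the ban on circular imports) can be modeled with a small Python sketch. Everything here is hypothetical: ModuleSystem, CircularImportError and the idea of representing a module's source code as a Python function are stand-ins chosen for illustration, not the way the interpreter is written.

```python
class CircularImportError(Exception):
    """Raised when a module is imported while it is still being executed."""


class ModuleSystem:
    def __init__(self, sources):
        self.sources = sources  # module name -> function standing in for source code
        self.cache = {}         # module name -> bindings exported by that module
        self.loading = []       # modules whose code is currently being executed

    def use(self, name, importer_gnt):
        if name in self.loading:
            raise CircularImportError(" -> ".join(self.loading + [name]))
        if name not in self.cache:
            self.loading.append(name)
            try:
                gnt, exports = {}, {}  # every module gets its own global nametable
                self.sources[name](self, gnt, exports)
                self.cache[name] = exports
            finally:
                self.loading.pop()
        # Only exported bindings reach the importer's global nametable.
        importer_gnt.update(self.cache[name])


execution_log = []


def helper_module(system, gnt, exports):
    execution_log.append("helper")
    gnt["secret"] = 42                    # defined, but never exported
    exports["double"] = lambda x: 2 * x


def app_module(system, gnt, exports):
    system.use("helper", gnt)             # imports may be nested
    exports["quadruple"] = lambda x: gnt["double"](gnt["double"](x))


system = ModuleSystem({"helper": helper_module, "app": app_module})
main_gnt = {}
system.use("app", main_gnt)
print(sorted(main_gnt))  # prints ['quadruple']
```

In this model, importing helper a second time is served from the cache, so its code runs only once, and two modules that use each other fail with CircularImportError.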

Running functions in other modules

Exported functions receive special treatment. Technically this is a rule about all functions, but it only becomes relevant for cross-module functions. All functions store the global nametable at the time of definition, i.e. the global nametable of their module or file. As each module gets its own global nametable, functions in different modules refer to different GNTs, while functions in the same module refer to the same GNT. When a function is run, the global nametable is temporarily replaced with the function's global nametable at definition time (if necessary) and restored afterwards. This means that a function can access global values of its module as one would expect. To preserve orthogonality, the exchange of global nametables can be thought of as a stack of global nametables at the very bottom of the real stack. The actual global nametable is just the top of this sub-stack, and global nametables are pushed to and popped from the sub-stack on function entry and exit.
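The following Python sketch models the sub-stack of global nametables. The classes Function and Interpreter and their attributes are assumptions made for illustration; nametables are represented as plain dicts.

```python
class Function:
    def __init__(self, body, defining_gnt):
        self.body = body
        # The global nametable of the module in which the function was defined.
        self.defining_gnt = defining_gnt


class Interpreter:
    def __init__(self, base_gnt):
        # Sub-stack of global nametables; the last entry is the active GNT.
        self.gnt_stack = [base_gnt]

    @property
    def gnt(self):
        return self.gnt_stack[-1]

    def call(self, function, *args):
        # Function entry: the function's own GNT becomes the active one.
        self.gnt_stack.append(function.defining_gnt)
        try:
            return function.body(self, *args)
        finally:
            # Function exit: the caller's GNT is restored.
            self.gnt_stack.pop()


# A function defined in a "library" module that reads a global of that module.
library_gnt = {"factor": 10}
scale = Function(lambda interp, x: x * interp.gnt["factor"], library_gnt)

# The calling module has a global of the same name with a different value.
main_gnt = {"factor": 2}
interpreter = Interpreter(main_gnt)
result = interpreter.call(scale, 4)
print(result)  # prints 40: the library's factor is used, not the caller's
```

After the call returns, interpreter.gnt is main_gnt again, which mirrors the restore step described above.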

How does SOF actually work?

This page describes how SOF works internally while staying language-independent, so as to accommodate other implementations of SOF compilers and interpreters. Nevertheless, examples from the reference Java implementation are given, as it is currently the only existing implementation of SOF.

Data Structures

SOF is a pure stack-based language. That means: All data always resides on the Stack, a linear unit of memory cells that contain data. Although we know a stack as being only LIFO and having one single visible element (the top, head or first element of the stack), in practice the SOF stack should be a Deque. If it wasn't, one would need two independent stacks for many operations (but both of them could be real, pure stacks).

But what is a Deque? This term, pronounced "deck", is short for "double-ended queue" and describes a data structure that allows access at both ends. The top of the deque is accessed with peek, pop and push, while the bottom of the deque is accessed with peekLast, popLast and pushLast. Java provides not only a Deque in its Collections Framework, but also many specialized implementations, such as the currently used ConcurrentLinkedDeque, which is a thread-safe doubly linked list implementation.
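As a small illustration of access at both ends, here is a sketch using Python's collections.deque. Treating the right end as the top of the stack and placing a "GNT" placeholder string at the bottom are choices made for this example only.

```python
from collections import deque

stack = deque()
stack.append("GNT")   # bottom of the stack
stack.append(1)       # push
stack.append(2)       # push; 2 is now the top

top = stack[-1]       # peek at the top without removing it
bottom = stack[0]     # peek at the bottom without removing it

popped = stack.pop()  # pop removes and returns the top element
print(top, bottom, popped, list(stack))
```

A pure stack would only offer the operations on the top element; the access to stack[0] is what requires a deque.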

As already stated, all data lives on the stack. But how does this allow for named variables, namespaces and function calling? The answer is the second most important data structure of SOF, the Nametable.

A Nametable is simply a collection of key-value mappings (Map in Java and JavaScript, dict in Python) that maps identifiers to any SOF data. Pretty simple, but this powers all of SOF. All definitions made by def are simply entries in nametables, and the Call operator . simply accesses entries in nametables.

The Global Nametable

The global nametable (GNT) is always the lowest element on the stack; when it is missing, something serious has gone wrong. The SOF programmer can never inspect, modify or remove the GNT, but it is being used all the time:

All definitions made on a global level enter the GNT
All imported NNTs (see below) are placed in the GNT

The GNT will be discussed in further detail with its use cases.

Scoping, the Call and Def operators

A Scope is created whenever a function starts. The scope is signaled by a special NT on the stack, called a function delimiter (FD). FDs hold special information on where to return execution when the function ends and what the return value is. Also, the FD cannot be taken off the stack by the program in any other way than returning from executing the current code block or function.

As FDs are NTs, there can be many NTs on the stack at any point in the program. To figure out which NT is to be used for the call operator . with an Identifier, the following simple rule is applied: Walk down the stack from top (last) to bottom (first). Whenever a Nametable is encountered, determine whether it contains the identifier that . wants to call. If so, retrieve this identifier's value from this Nametable. If not, continue the search all the way to the NNT/GNT. If nothing is found, throw a NameError. This ensures that definitions made in a "more local" NT take precedence over and hide those in a "more global" NT.
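The lookup rule can be sketched in a few lines of Python. This is a minimal model under stated assumptions: the stack is a list with the bottom at index 0, Nametable is a dict subclass used only as a marker type, and SofNameError is a hypothetical stand-in for SOF's NameError.

```python
class Nametable(dict):
    """Marker type so nametables can be told apart from ordinary stack values."""


class SofNameError(Exception):
    """Stand-in for the NameError that SOF throws on a failed lookup."""


def lookup(stack, identifier):
    # Walk from the top of the stack (end of the list) down to the bottom.
    for element in reversed(stack):
        if isinstance(element, Nametable) and identifier in element:
            return element[identifier]
    raise SofNameError(identifier)


gnt = Nametable(x=1, y=2)           # global nametable at the bottom
fd = Nametable(x=10)                # function delimiter of a running function
stack = [gnt, 5, fd, "some value"]

print(lookup(stack, "x"))  # prints 10: the more local definition hides the global one
print(lookup(stack, "y"))  # prints 2: found only in the global nametable
```

Looking up an identifier that is defined in neither nametable, such as z, raises SofNameError.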

Normally, def operates on the Local Nametable (LNT), which is simply the highest NT on the stack. This may be an FD, or the GNT if there is no FD. This importantly means that code inside a function cannot modify nametables outside unless using the globaldef operator:

Sometimes, the user wants to define into the GNT. For this, the operator globaldef is provided, which exhibits the same behavior as def except for always defining into the GNT. This is useful for defining functions in an enclosed scope, modifying global variables and so on. It is also convention to always globaldef global functions, constructors, etc.
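The difference between def and globaldef can be modeled as follows. The function names define and global_define, the list-based stack (bottom at index 0) and the Nametable marker class are assumptions made for this sketch.

```python
class Nametable(dict):
    """Marker type so nametables can be told apart from ordinary stack values."""


def local_nametable(stack):
    # The LNT is the highest nametable on the stack.
    for element in reversed(stack):
        if isinstance(element, Nametable):
            return element
    raise RuntimeError("global nametable is missing from the stack")


def define(stack, name, value):
    """Model of def: define into the local nametable."""
    local_nametable(stack)[name] = value


def global_define(stack, name, value):
    """Model of globaldef: always define into the GNT, the lowest stack element."""
    stack[0][name] = value


gnt = Nametable()
fd = Nametable()            # function delimiter: we are inside a function
stack = [gnt, fd]

define(stack, "a", 1)         # lands in the function's nametable
global_define(stack, "b", 2)  # lands in the GNT despite the function scope

stack.pop()                   # the function returns, its FD leaves the stack
define(stack, "c", 3)         # without an FD, def operates on the GNT

print(dict(fd), dict(gnt))
```

Here a ends up only in the function's nametable and disappears with it, whereas b and c are stored in the GNT.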

Native calls

The PT nativecall executes a call to a natively implemented function. The only explicit argument of the native call is a string identifying the native function to call. Because the reference implementation is Java-based, the way in which native functions are identified is very similar to Java method signatures. The general form is NativeFunction = Package { "." Package } "." Class "#" Method "(" [ ArgumentType { "," ArgumentType } ] ")", where Package represents a legal Java package name, Class represents a legal Java class name and Method represents a legal Java method name. The argument types must be the internal SOF type names that the reference implementation uses, like StringPrimitive, FloatPrimitive etc. There may be any number of argument types, including none; they are separated by commas (but no spaces!). The arguments are taken from the stack when nativecall runs, where the last argument taken from the stack is the first argument passed to the native function. The native function may return an SOF typed result, in which case it is placed on the stack, or it may return nothing (void), in which case the stack is not modified. Native functions may throw (incomplete) compiler exceptions, in which case they propagate from the nativecall as normal SOF errors of type Native.
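To make the identifier format and the argument order concrete, here is a Python sketch. It is not taken from the reference implementation: parse_native, pop_arguments and the example identifier example.natives.Strings#join are hypothetical, and the regular expression simplifies Java identifiers to ASCII letters, digits and underscores.

```python
import re

IDENT = r"[A-Za-z_]\w*"  # simplified identifier; real Java names allow more
NATIVE_PATTERN = re.compile(
    rf"(?P<packages>{IDENT}(?:\.{IDENT})*)\.(?P<cls>{IDENT})"
    rf"#(?P<method>{IDENT})\((?P<args>[^()]*)\)"
)


def parse_native(signature):
    """Split a native function identifier into packages, class, method, types."""
    match = NATIVE_PATTERN.fullmatch(signature)
    if match is None:
        raise ValueError(f"malformed native function identifier: {signature!r}")
    raw_args = match["args"]
    arg_types = raw_args.split(",") if raw_args else []
    for arg_type in arg_types:
        # Rejects empty entries and entries with spaces around the comma.
        if re.fullmatch(IDENT, arg_type) is None:
            raise ValueError(f"malformed argument type: {arg_type!r}")
    return match["packages"].split("."), match["cls"], match["method"], arg_types


def pop_arguments(stack, count):
    """Take count arguments off the stack (a list whose end is the top)."""
    popped = [stack.pop() for _ in range(count)]
    # The last value taken from the stack becomes the first argument.
    popped.reverse()
    return popped


parsed = parse_native("example.natives.Strings#join(StringPrimitive,StringPrimitive)")
stack = ["unrelated", "first", "second"]
arguments = pop_arguments(stack, len(parsed[3]))
print(parsed)
print(arguments)  # prints ['first', 'second']
```

In this model the value pushed earliest is passed first, so the native function sees its arguments in the order in which the SOF program pushed them.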

Glossary

PT: Primitive token. A special token that has the syntax of an identifier (i.e. if it wasn't special, it would be treated as an identifier) but executes a special operation that (for the most part) cannot be accomplished by other means.
Nametable: A key-value mapping structure (compare to Java's and JavaScript's Map, Python's dict) that is the second most important data structure of SOF internally after the Stack. All Nametables live on the Stack.
GNT: Global nametable. Lowest element on the stack, used for top-level lookups and globaldef. Exported functions keep the GNT at export time.
