SOF Language Specification

This section specifies the Stack with Objects and Functions programming language.

Definitions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

Most terms are defined as they are introduced.

An SOF interpreter is any program that takes SOF source files as input and executes them according to this language specification, with semantics identical to those of any other interpreter. Alternatively, an interpreter MAY produce another program, in source code or machine-executable form, whose semantics are identical to those of the SOF source file inputs; such a program is more commonly called a compiler. This document describes the behaviors that are allowed for, and MUST be followed by, an ideal interpreter. In practice, some programs that deviate slightly from this specification (most commonly due to bugs) are also called interpreters.

An SOF tool is any other program that deals with SOF source files. The behavior of tools is not prescribed by this specification, and tools MAY intentionally go beyond or against the specification to serve a certain purpose. For instance, a tool MAY introspect source code comments for a variety of purposes, whereas interpreters MUST ignore comments.

Source Files

An SOF source file is a program or part of a program in the SOF programming language. Source files are plain text files using UTF-8 [1] character encoding. Newline sequences consist of an optional carriage return followed by a line feed.

Source files use the standard extension .sof.

Source files adhere to the following Extended Backus-Naur form specification:

(* A program is a series of tokens and comments, where tokens MUST be separated by whitespace. *)
SofProgram = [Token] { [Comments] ?Whitespace? [Comments] Token } { [Comments] ?Whitespace? } ;

Token = "def" | "globaldef" | "dexport" | "use" | "export"
      | "dup" | "pop" | "swap" | "rot" | "over"
      | "write" | "writeln" | "input" | "inputln"
      | "if" | "ifelse" | "while" | "dowhile" | "switch"
      | "function" | "constructor"
      | "+" | "-" | "*" | "/" | "%" | "<<" | ">>" | "cat"
      | "and" | "or" | "xor" | "not"
      | "<" | "<=" | ">" | ">=" | "=" | "/="
      | "." | ":" | "," | ";" | "nativecall"
      | "[" | "]" | "|"
      | "describe" | "describes" | "assert"
      | Number | String | Boolean
      | Identifier | CodeBlock ;

Identifier = ?Unicode Letter? { ?Unicode Letter? | DecimalDigits | "_" | "'" | ":" } ;
(* A code block recursively contains SOF code, i.e. an SofProgram. *)
CodeBlock = "{" SofProgram "}" ;

(* Literals *)
String = '"' { ?any character except "? | '\"' } '"' ;
Boolean = "true" | "false" | "True" | "False" ;
Number = [ "+" | "-" ] ( Integer | Decimal ) ;
Integer = "0" ( "h" | "x" ) HexDigits { HexDigits }
        | [ "0d" ] DecimalDigits { DecimalDigits }
        | "0o" OctalDigits { OctalDigits }
        | "0b" BinaryDigits { BinaryDigits } ;
Decimal = DecimalDigits { DecimalDigits } "." DecimalDigits { DecimalDigits }
          [ ("e" | "E") ( "+" | "-" ) DecimalDigits { DecimalDigits } ] ;
BinaryDigits = "0" | "1" ;
OctalDigits = BinaryDigits | "2" | "3" | "4" | "5" | "6" | "7" ;
DecimalDigits = OctalDigits | "8" | "9" ;
HexDigits = DecimalDigits | "a" | "b" | "c" | "d" | "e" | "f" | "A" | "B" | "C" | "D" | "E" | "F" ;

(* Comments are ignored. *)
Comments = Comment { Comment } ;
Comment = ( "#" { ?any? } ?Unicode line break/newline? )
        | ( "#*" { ?any? } "*#" ) ;

Any violation of this syntax by a program MUST raise a SyntaxError when given as input to an interpreter.
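As an illustration of the Number production, the following Python sketch classifies a candidate token as an integer or a decimal literal. It is not part of the specification. The names INTEGER, DECIMAL and classify_number are hypothetical, and the sketch assumes that octal literals contain only octal digits and that the exponent sign is mandatory, as the Decimal rule is written.

```python
import re
from typing import Optional

# Integer: hex ("0h"/"0x"), octal ("0o"), binary ("0b"), or decimal with an
# optional "0d" prefix. Assumption: octal literals contain only digits 0-7.
INTEGER = re.compile(r"[+-]?(?:0[hx][0-9a-fA-F]+|0o[0-7]+|0b[01]+|(?:0d)?[0-9]+)")

# Decimal: digits "." digits, optionally followed by an exponent whose sign
# is mandatory, following the Decimal rule as written.
DECIMAL = re.compile(r"[+-]?[0-9]+\.[0-9]+(?:[eE][+-][0-9]+)?")


def classify_number(token: str) -> Optional[str]:
    """Return "integer", "decimal", or None if the token is not a Number."""
    if INTEGER.fullmatch(token):
        return "integer"
    if DECIMAL.fullmatch(token):
        return "decimal"
    return None
```

Under these assumptions, classify_number("0xFF") and classify_number("-42") yield "integer", classify_number("1.5e+3") yields "decimal", and classify_number("1.5e3") yields None because the exponent lacks a sign.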

SofProgram is the syntax specification for an entire SOF source file. The source file consists of two types of syntactical constructs: Comments and Tokens.

Comments are purely for the benefit of the programmer and MUST NOT have meaning to an SOF interpreter.
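One way an interpreter might discard comments before tokenizing is sketched below in Python. This is only an illustration under stated assumptions: the names SofSyntaxError and strip_comments are hypothetical, a line comment that reaches the end of the file without a newline is accepted, and a "#" inside a string literal is not treated as the start of a comment. The sketch does not validate the rest of the syntax.

```python
class SofSyntaxError(Exception):
    """Hypothetical stand-in for the SyntaxError required by the specification."""


def strip_comments(source: str) -> str:
    """Return the source text with all comments removed."""
    out = []
    i, n = 0, len(source)
    while i < n:
        ch = source[i]
        if ch == '"':
            # Copy a string literal verbatim, stepping over \" escapes.
            j = i + 1
            while j < n and source[j] != '"':
                escaped = source[j] == "\\" and j + 1 < n and source[j + 1] == '"'
                j += 2 if escaped else 1
            out.append(source[i:j + 1])
            i = j + 1
        elif source.startswith("#*", i):
            # Block comment: skip up to and including the closing "*#".
            end = source.find("*#", i + 2)
            if end == -1:
                raise SofSyntaxError("unterminated block comment")
            i = end + 2
        elif ch == "#":
            # Line comment: skip to the line break, which is kept in the output.
            end = source.find("\n", i)
            i = n if end == -1 else end
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

The block comment form is checked before the line comment form because both begin with "#".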

Note

Tokens are the core of an SOF program and are ordered in a linear sequence. The only exception is the code block token, which recursively nests another sequence of tokens. The other major distinction among token types is between Literal Tokens, which behave and look like the literals of other programming languages, and Primitive Tokens (also known as keywords), which execute program logic. The phrase Primitive Token distinguishes atomic keywords from the non-atomic { and } keywords.

Program state

For semantic purposes, the program state of a running SOF program consists mainly of two things:

  • A stack of values visible to the program. The stack is a LIFO (last in, first out) structure that can contain any kind of value. Some value types can be placed and read by the user through certain tokens; others, the "hidden types", exist only to make the program execute correctly. Hidden types are specified less precisely, and the user is generally not allowed to interact with them.
  • A stack of token lists that are being executed, with associated information about where these lists come from and what should happen after they finish executing.
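The two parts of the program state could be modeled roughly as follows. This Python sketch is illustrative only: the class and field names (ProgramState, TokenListFrame, origin, on_finish) are assumptions and are not prescribed by this specification.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class TokenListFrame:
    """One entry of the token list stack (hypothetical layout)."""
    tokens: List[Any]                 # the token list being executed
    position: int = 0                 # index of the next token to execute
    origin: str = "<source file>"     # where this token list comes from
    on_finish: Optional[str] = None   # what should happen after it finishes


@dataclass
class ProgramState:
    """The two stacks that make up the program state."""
    value_stack: List[Any] = field(default_factory=list)
    token_list_stack: List[TokenListFrame] = field(default_factory=list)

    def push(self, value: Any) -> None:
        self.value_stack.append(value)

    def pop(self) -> Any:
        # Last in, first out: the most recently pushed value is returned first.
        return self.value_stack.pop()


state = ProgramState()
state.token_list_stack.append(TokenListFrame(tokens=["1", "2"]))
state.push(1)
state.push(2)
top = state.pop()  # 2, the value pushed last
```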

Executing

An SOF program consists of a list of tokens. Executing the program consists of running the action of each token in the order it appears in the list of tokens.

Each token may have an effect on the SOF program environment, modifying it to a new state. Tokens may also change the execution state of the program by modifying the token list stack. For instance, a token may:

  • add a token list to be executed next
  • stop executing the current token list or any others that are being executed
  • modify the current token list

Tokens may also produce errors, which can then cause further changes to the SOF program environment.

The SOF program exits when:

  • the last token has been executed and the token list stack is empty,
  • an uncaught error occurs, or
  • the program is aborted by any other, lower-level means, such as a system call requesting process termination (exit()).
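The execution model and the first two exit conditions can be sketched as a small loop in Python. This is a simplified illustration, not a conforming interpreter. Tokens are modeled as Python values: integers are literals, lists are code blocks, and strings are keywords. The keywords "add", "run-block" and "fail" are hypothetical and are not SOF tokens.

```python
from typing import Any, Callable, Dict, List


class SofError(Exception):
    """Hypothetical error type raised by token actions."""


def run(program: List[Any], actions: Dict[str, Callable]) -> List[Any]:
    values: List[Any] = []   # the value stack
    frames = [[program, 0]]  # token list stack: [token list, next index]
    while frames:            # exit: nothing is left to execute
        tokens, pos = frames[-1]
        if pos >= len(tokens):
            frames.pop()     # this token list has finished
            continue
        frames[-1][1] = pos + 1  # advance before running the token's action
        token = tokens[pos]
        if isinstance(token, str):
            actions[token](values, frames)  # an uncaught SofError ends the run
        else:
            values.append(token)  # literals and code blocks are pushed as values
    return values


def add(values, frames):
    right, left = values.pop(), values.pop()
    values.append(left + right)


def run_block(values, frames):
    # Adds a token list to be executed next.
    frames.append([values.pop(), 0])


def fail(values, frames):
    raise SofError("deliberate failure")


ACTIONS = {"add": add, "run-block": run_block, "fail": fail}
result = run([1, 2, [3, "add"], "run-block", "add"], ACTIONS)  # [6]
```

The nested block runs to completion before the outer token list resumes, so the inner "add" computes 2 + 3 and the outer "add" computes 1 + 5.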

Defined Behavior

By default, any behavior in SOF is considered defined. This means that the interpreter MUST NOT deviate from the behavior prescribed in this specification. Some parts of SOF execution are unspecified and leave interpreters multiple avenues of implementation. In these cases, any behavior the interpreter chooses to exhibit MUST be limited to the unspecified part. In particular, behavior exhibited in unspecified parts MUST NOT propagate to the remainder of program execution: any further execution MUST return to and follow defined SOF execution behavior.

The exception to defined behavior is formed by native functions, invoked through the nativecall operator. As specified, native functions may be supplied by the user and are therefore not part of the interpreter and its guarantees of SOF semantics. The specification for nativecall defines which behaviors are allowed for native functions. Because native functions are defined in another programming language, possibly with far more flexible behavior and more complex semantics than SOF, and possibly with far-reaching access to the SOF execution state, the interpreter cannot be expected to verify the correct behavior of all native functions, let alone catch incorrect behavior as soon as it happens.

Therefore, if a native function does not uphold the set of behaviors allowed for native functions by the specification for nativecall, then from the moment that function is invoked through nativecall, the interpreter MAY exhibit any possible behavior, including behavior normally forbidden by this specification. Unlike in situations where behavior is locally unspecified, the interpreter MAY never return to defined behavior. This is called Undefined Behavior. Undefined Behavior MUST NOT result from any action other than the one described.

[1] RFC 3629: UTF-8, a transformation format of ISO 10646. https://www.rfc-editor.org/rfc/rfc3629.html