This section describes the syntax of the input language to asdlGen
. The
syntax is described using EBNF notation. Literal terminals are enclosed by
double quotes. Optional terms are enclosed in "[]
"'s. Terms which are
repeated zero or more times are enclosed in "{}
"'s. The description is
broken up in smaller subsection. Each subsection describes some fragment of
the syntax and its meaning.
upper | = | "A" | ... | "Z" |
lower | = | "a" | ... | "z" |
alpha | = | "_" | upper | lower |
alpha-num | = | alpha | "0"| ... | "9" |
typ-id | = | lower {alpha-num} |
con-id | = | upper {alpha-num} |
id | = | typ-id | con-id |
Notice that identifiers for types, typ-id, are in a different
lexical class from identifiers for constructors
con-id. Constructor identifiers must start with one uppercase
letter, while type identifiers must begin with one lowercase
letter. All identifiers must begin with a letter. Comments (not show
in the above syntax) begin with "--
" and continue to the end of
the line.
module | = | "module" id [ imports ]"{" { definitions } "}" |
imports | = | "(" { "imports" id } ")" |
An ASDL module declaration consists of the keyword "module"
followed by an identifier, an optional set of imported modules, and a
sequence of type definitions enclosed in braces. For example the
following example declares modules A
, B
, and C
. B
imports types from A
. C
imports types from both A
and
B
. Imports cannot be recursive. It is an error for B
to
import C
, since C
also imports B
.
module A { ... }
module B (imports A) { ... }
module C (imports A
imports B) { ... }
To refer to a type imported from another module the type must
always be qualified by the module name from which is
imported. The following declares two different types called
"t
". One in the A
module and one in the B
module. The
type "t
" in the B
module defines a type "t
" that
recursively mentions itself and the type "t
" imported from the
A
module.
module A { t = ... }
module B (imports A) { t = T(A.t, t) | N ... }
definitions | = | {typ-id "=" type} |
type | = | sum-type | product-type |
product-type | = | fields |
sum-type | = | constructor {"|" constructor} |
["attributes" fields] | ||
constructor | = | con-id [fields] |
fields | = | "(" { field "," } field ")" |
field | = | [id "."]typ-id ["?"| "*"] [id] |
All type definitions occur within a module. They begin with a type identifier which is the name of the type. The name must be unique within the module. The order of definitions is unimportant. When translating type definitions from a module they are placed in what would be considered a module, package, or name-space of the same name. If a output language does not support such features and only has one global name space the module name is used to prefix all the globally exported identifiers.
Type definitions are either product types which are simple record definitions or sum type which represent a discriminated union of possible values. Unlike sum types, product types cannot form recursive type definitions, but they can contain recursively declared sum types.
There are only three primitive types in ASDL. They
are string
, int
, and identifier
. The string
type
represents length encoded strings of bytes. The int
type represents
arbitrary precision signed integers. The identifier
type represents
atomic printable names, which support fast equality testing, analogous to
LISP symbols.
Product types are record or tuple declarations. They consist of a non-empty sequence of fields separated by commas enclosed in parenthesis. For example
pair_of_ints = (int, int)
declares a new type pair_of_ints
that consists of two integers.
Sum types are the most useful types in ASDL. They provide concise notation used to describe a type that is the tagged union of a finite set of other types. Sum types consists of a series of constructors separated by a vertical bar. Each constructor consist of a constructor identifier followed by an optional sequence of fields similar to a product type.
Constructor names must be unique within the module in which they are declared. Constructors can be viewed as functions who take some number of arguments of arbitrary type and create a value belonging to the sum type in which they are declared. For example
module M {
sexpr = Int(int)
| String(string)
| Symbol(identifier)
| Cons(sexpr, sexpr)
| Nil
}
declares that values of type sexpr
can either be constructed from an
int
using the Int
constructor or a string
from a String
constructor, an identifier
using the Symbol
constructor, or from
two other sexpr
using the Cons
constructor, or from no arguments
using the Nil
constructor. Notice that the Cons
constructor
recursively refers to the sexpr
type. ASDL allows sum types to be
mutually recursive. Recursion however is limited to sum types defined within
the same module.
In languages that have algebraic data types sum types are equivalent to the
datatype
and data
declarations in ML and Haskell. In Algol like
languages they are equivalent to tagged unions
in C or variant records
in Pascal. In class based object oriented languages sum types are equivalent
to an abstract base class that represents the type of a family of subclasses
one for each constructor of the type. The previous example written in ML would
be
structure M =
struct
datatype sexpr =
Int(int)
| String(string)
| Symbol(identifier)
| Cons(sexpr * sexpr)
| Nil
end
in C would be
typedef struct M_sexpr_s * M_sexpr_ty;
struct M_sexpr_s {
enum { M_Int_kind, M_String_kind,
M_Symbol_kind, M_Cons_kind, M_Nil_kind } kind;
union {
struct M_Int_s { int_ty int1; } M_Int;
struct M_String_s { string_ty string1; } M_String;
struct M_Atom_s { identifier_ty identifier1;} M_Atom;
struct M_Cons_s { M_sexpr_ty; sexpr1 M_sexpr_ty sexpr2 }; M_Cons;
} v;
}
and in Java would be
package ast.M;
public abstract class sexpr {
int kind;
static final Int_kind = 0;
static final String_kind = 1;
static final Symbol_kind = 2;
static final Cons_kind = 3;
static final Nil_kind = 4;
abstract int kind();
}
public class Int extends sexpr {
int v;
Int(int v) { this.v = v; }
int kind () { return Int_kind; }
}
public class String extends sexpr { String(string v) { ... } ... }
public class Symbol extends sexpr { Symbol(identifier v) { ... } ... }
public class Cons extends sexpr { Cons(sexpr x, sexpr y) { ... } }
public class Nil extends sexpr { Nil() { ... } }
Sum types which consist completely of constructors that take no arguments are often treated specially and translated into static constants of a enumerated value in languages that support them.
module Op {
op = PLUS | MINUS | TIMES | DIVIDE
}
Is translated into the enumeration and constants in C as
enum Op_op_ty {Op_PLUS_kind, Op_MINUS_kind, Op_TIME_kind, Op_DIVIDE_kind};
extern Op_op_ty Op_PLUS = Op_PLUS_kind;
extern Op_op_ty Op_MINUS = Op_MINUS_kind;
extern Op_op_ty Op_TIMES = Op_TIMES_kind;
extern Op_op_ty Op_DIVIDE = Op_DIVIDE_kind;
Field declarations can be as simple as a single type identifier, or they can
be followed with an optional label. Labels aid in the readability of
descriptions and are used by asdlGen
to name the fields of records
and classes for languages. For example the declarations
module Tree {
tree = Node(int, tree, tree)
| EmptyTree
}
can also be written as
module Tree {
tree = Node(int value, tree left, tree right)
| EmptyTree
}
When translating the first definition without labels into C one would normally get
...
struct Tree_tree_s {
union {
struct { int_ty int1; Tree_tree_ty tree1; Tree_tree_ty tree2; } ...
} v
}
...
with labels one would get
...
struct Tree_tree_s {
union {
struct { int_ty value; Tree_tree_ty left; Tree_tree_ty right; } ...
} v
}
...
The type identifier of a field declaration can also be qualified with either
a sequence ("*") or option ("?") qualifier. The sequence qualifier
is an abbreviation that stands for a sequence or list of that given type,
while the option qualifier stands for a type whose value maybe
uninitialized. Sequence types are equivalent to the lists or arrays of a
fixed type. Option types are equivalent to the option
and Maybe
types in ML and Haskell or the idiom of a pointer which maybe NULL
or
contain a valid address or object reference in languages like C or Java.
asdlGen
provides various different translation schemes for handling option
and sequence types in languages that do not have parametric polymorphism.
See section
Invocation for details.
All sum types may optionally be followed by a set of attributes. Attributes provide notation for fields common to all the constructors in the sum type.
module M {
pos = (string file, int linenum, int charpos)
sexpr = Int(int)
| String(string)
| Symbol(identifier)
| Cons(sexpr, sexpr)
| Nil
attribute(pos p)
}
adds a field of type pos
with label p
to all the constructors in
sexpr
. Attribute fields are treated specially when
translated. For example in C code the attribute field is hoisted out of the
union
and placed in the outer struct
. Object oriented languages
find attribute fields declared in the abstract base class.
lit-txt | = | see discussion below |
view | = | "view" id "{" {view-entry} "}" |
view-entity | = | id "." typ-id |
| | id "." con-id | |
| | "module" id | |
view-prop | = | "<=" id |
view-entry | = | view-entity view-prop lit-txt |
| | "{" {view-entity} "}"view-prop lit-txt | |
| | view-prop"{" {view-entity lit-txt} "}" | |
| | view-entity "<=""{" {id lit-txt} "}" | |
Views are an extensible system of annotating modules, types, and constructors to aid in the conversion of ASDL declarations into a particular output language. This discussion will only cover the basic view syntax. Section Views discusses the meaning of the annotations in more detail.
Views are named and consist of series of entries. Each entry conceptually begins with a fully qualified type, constructor, or module name. Following that name is a key value pair. The meaning of the entry is to add a particular key value pair into an internal table associated with the type, constructor, or module. The type, constructor or module is referred to as the view entity the key is referred to as the view property. The value is an arbitrary string. In the grammar above the arbitrary string is represented with the lit-txt non-terminal. lit-txt is either a ":" followed by some arbitrary text that continues to the end of the line or a "%%" balanced by a "%%" at the beginning of a line by itself. Text include using the ":" notation will have trailing and leading whitespace removed. For example
view Doc {
module M <= doc_string
%%
Types for representing LISP s-expressions.
%%
M.sexpr <= doc_string : s-expressions
M.Int <= doc_string : s-expression constructor
M.Symbol <= doc_string : s-expression constructor
M.Cons <= doc_string : s-expression constructor
M.Nil <= doc_string : s-expression constructor
}
view Java {
M.sexpr <= source_name : Sexpr
M.sexpr <= base_class : MyClass
}
associates with the module M the type M.sexpr and the constructor M.Int
strings that will be added to the automatically generated documentation
produced by the --doc
command of asdlGen
. ( In future we'll
probably dump them in comments in the output code too.) The view named
Java
causes the type M.sexpr
to be renamed Sexpr
when
generating Java output, and causes the abstract class normally generated to
inherit from MyClass
.
There can be many views with the same name. The entries of two views with the same name are merged and consists of the union of the entries in both. It is an error, for two views of the same name to assign different values to the same property of an entity. Currently this error isn't checked for so you randomly get one or the other.
The above example show only the simplest syntax to define view entry there are three sugared versions that remove some of the redundancy of the simple syntax. The first sugared version allows the assignment of the same property value pair to a set of entities. The next sugared version allows assigning to different entities different values for a fixed property. The final sugared version allows the assignment of different property value pairs to the same entity. Examples of the sugared notation are shown below in their respective order.
view Doc {
{ M.Int M.Symbol
M.Cons M.Nil } <= doc_string : s-expression constructor
<= doc_string {
module M
%%
Types for representing LISP s-expressions.
%%
M.sexpr : s-expressions
}
}
view Java {
M.sexpr <= {
source_name : Sexpr
base_class : MyClass
}
}