Next Previous Contents

3. Input Syntax

This section describes the syntax of the input language to asdlGen. The syntax is described using EBNF notation. Literal terminals are enclosed by double quotes. Optional terms are enclosed in "[]"'s. Terms which are repeated zero or more times are enclosed in "{}"'s. The description is broken up in smaller subsection. Each subsection describes some fragment of the syntax and its meaning.

3.1 Lexical Tokens


upper
= "A" | ... | "Z"
lower = "a" | ... | "z"
alpha = "_" | upper | lower
alpha-num = alpha | "0"| ... | "9"
typ-id = lower {alpha-num}
con-id = upper {alpha-num}
id = typ-id | con-id

Notice that identifiers for types, typ-id, are in a different lexical class from identifiers for constructors con-id. Constructor identifiers must start with one uppercase letter, while type identifiers must begin with one lowercase letter. All identifiers must begin with a letter. Comments (not show in the above syntax) begin with "--" and continue to the end of the line.

3.2 Module Syntax


module
= "module" id [ imports ]"{" { definitions } "}"
imports = "(" { "imports" id } ")"

An ASDL module declaration consists of the keyword "module" followed by an identifier, an optional set of imported modules, and a sequence of type definitions enclosed in braces. For example the following example declares modules A, B, and C. B imports types from A. C imports types from both A and B. Imports cannot be recursive. It is an error for B to import C, since C also imports B.

 module A { ... } 
 module B (imports A) { ... }
 module C (imports A 
           imports B) { ... }

To refer to a type imported from another module the type must always be qualified by the module name from which is imported. The following declares two different types called "t". One in the A module and one in the B module. The type "t" in the B module defines a type "t" that recursively mentions itself and the type "t" imported from the A module.

module A { t = ... } 
module B (imports A) { t = T(A.t, t) | N  ... }

3.3 Type Definitions


definitions
= {typ-id "=" type}
type = sum-type | product-type
product-type = fields
sum-type = constructor {"|" constructor}
["attributes" fields]
constructor = con-id [fields]
fields = "(" { field "," } field ")"
field = [id "."]typ-id ["?"| "*"] [id]

All type definitions occur within a module. They begin with a type identifier which is the name of the type. The name must be unique within the module. The order of definitions is unimportant. When translating type definitions from a module they are placed in what would be considered a module, package, or name-space of the same name. If a output language does not support such features and only has one global name space the module name is used to prefix all the globally exported identifiers.

Type definitions are either product types which are simple record definitions or sum type which represent a discriminated union of possible values. Unlike sum types, product types cannot form recursive type definitions, but they can contain recursively declared sum types.

Primitve Types

There are only three primitive types in ASDL. They are string, int, and identifier. The string type represents length encoded strings of bytes. The int type represents arbitrary precision signed integers. The identifier type represents atomic printable names, which support fast equality testing, analogous to LISP symbols.

Product Types

Product types are record or tuple declarations. They consist of a non-empty sequence of fields separated by commas enclosed in parenthesis. For example

 
pair_of_ints = (int, int) 
declares a new type pair_of_ints that consists of two integers.

Sum Types

Sum types are the most useful types in ASDL. They provide concise notation used to describe a type that is the tagged union of a finite set of other types. Sum types consists of a series of constructors separated by a vertical bar. Each constructor consist of a constructor identifier followed by an optional sequence of fields similar to a product type.

Constructor names must be unique within the module in which they are declared. Constructors can be viewed as functions who take some number of arguments of arbitrary type and create a value belonging to the sum type in which they are declared. For example

module M {
  sexpr = Int(int)
        | String(string)
        | Symbol(identifier)
        | Cons(sexpr, sexpr)
        | Nil
}
declares that values of type sexpr can either be constructed from an int using the Int constructor or a string from a String constructor, an identifier using the Symbol constructor, or from two other sexpr using the Cons constructor, or from no arguments using the Nil constructor. Notice that the Cons constructor recursively refers to the sexpr type. ASDL allows sum types to be mutually recursive. Recursion however is limited to sum types defined within the same module.

The Rosetta Stone for Sum Types

In languages that have algebraic data types sum types are equivalent to the datatype and data declarations in ML and Haskell. In Algol like languages they are equivalent to tagged unions in C or variant records in Pascal. In class based object oriented languages sum types are equivalent to an abstract base class that represents the type of a family of subclasses one for each constructor of the type. The previous example written in ML would be

structure M =
 struct
  datatype sexpr = 
           Int(int)
         | String(string)
         | Symbol(identifier)
         | Cons(sexpr * sexpr)
         | Nil
 end
in C would be
typedef struct M_sexpr_s * M_sexpr_ty;
struct M_sexpr_s { 
 enum { M_Int_kind, M_String_kind, 
        M_Symbol_kind, M_Cons_kind, M_Nil_kind }  kind;
         union {
          struct M_Int_s    { int_ty  int1;             } M_Int;
          struct M_String_s { string_ty string1;        } M_String;      
          struct M_Atom_s   { identifier_ty identifier1;} M_Atom;
          struct M_Cons_s   { M_sexpr_ty; sexpr1 M_sexpr_ty sexpr2 }; M_Cons;
         } v;
}
and in Java would be
package ast.M;
public abstract class sexpr {
         int kind;
         static final Int_kind        = 0;
         static final String_kind     = 1;
         static final Symbol_kind     = 2;
         static final Cons_kind       = 3;
         static final Nil_kind        = 4;
         abstract int kind();
}
public class Int extends sexpr { 
    int v; 
    Int(int v) { this.v = v;  } 
    int kind () { return Int_kind; }
}
public class String extends sexpr { String(string v) { ... } ... }
public class Symbol extends sexpr { Symbol(identifier v) { ... } ... }
public class Cons   extends sexpr { Cons(sexpr x, sexpr y) { ... } }
public class Nil    extends sexpr { Nil() { ... } }

Sum Types as Enumerations

Sum types which consist completely of constructors that take no arguments are often treated specially and translated into static constants of a enumerated value in languages that support them.

module Op {
  op = PLUS | MINUS | TIMES | DIVIDE 
}

Is translated into the enumeration and constants in C as

enum Op_op_ty {Op_PLUS_kind, Op_MINUS_kind, Op_TIME_kind, Op_DIVIDE_kind};
extern Op_op_ty Op_PLUS   = Op_PLUS_kind;
extern Op_op_ty Op_MINUS  = Op_MINUS_kind;
extern Op_op_ty Op_TIMES  = Op_TIMES_kind;
extern Op_op_ty Op_DIVIDE = Op_DIVIDE_kind;

Field Labels

Field declarations can be as simple as a single type identifier, or they can be followed with an optional label. Labels aid in the readability of descriptions and are used by asdlGen to name the fields of records and classes for languages. For example the declarations

module Tree {
  tree = Node(int, tree, tree)
       | EmptyTree
}
can also be written as
module Tree {
  tree = Node(int value, tree left, tree right)
       | EmptyTree
}

When translating the first definition without labels into C one would normally get

...
struct Tree_tree_s {
    union {
    struct { int_ty int1; Tree_tree_ty tree1; Tree_tree_ty tree2; } ...
    } v
}
...
with labels one would get
...
struct Tree_tree_s {
    union {
    struct { int_ty value; Tree_tree_ty left; Tree_tree_ty right; } ...
    } v
}
...

Type Qualifiers

The type identifier of a field declaration can also be qualified with either a sequence ("*") or option ("?") qualifier. The sequence qualifier is an abbreviation that stands for a sequence or list of that given type, while the option qualifier stands for a type whose value maybe uninitialized. Sequence types are equivalent to the lists or arrays of a fixed type. Option types are equivalent to the option and Maybe types in ML and Haskell or the idiom of a pointer which maybe NULL or contain a valid address or object reference in languages like C or Java. asdlGen provides various different translation schemes for handling option and sequence types in languages that do not have parametric polymorphism. See section Invocation for details.

Attributes

All sum types may optionally be followed by a set of attributes. Attributes provide notation for fields common to all the constructors in the sum type.

module M {
  pos = (string file, int linenum, int charpos)
  sexpr = Int(int)
        | String(string)
        | Symbol(identifier)
        | Cons(sexpr, sexpr)
        | Nil
        attribute(pos p)
}
adds a field of type pos with label p to all the constructors in sexpr. Attribute fields are treated specially when translated. For example in C code the attribute field is hoisted out of the union and placed in the outer struct. Object oriented languages find attribute fields declared in the abstract base class.

3.4 View Syntax


lit-txt
= see discussion below
view = "view" id "{" {view-entry} "}"
view-entity = id "." typ-id
| id "." con-id
| "module" id
view-prop = "<=" id
view-entry = view-entity view-prop lit-txt
| "{" {view-entity} "}"view-prop lit-txt
| view-prop"{" {view-entity lit-txt} "}"
| view-entity "<=""{" {id lit-txt} "}"

Views are an extensible system of annotating modules, types, and constructors to aid in the conversion of ASDL declarations into a particular output language. This discussion will only cover the basic view syntax. Section Views discusses the meaning of the annotations in more detail.

Basic Syntax

Views are named and consist of series of entries. Each entry conceptually begins with a fully qualified type, constructor, or module name. Following that name is a key value pair. The meaning of the entry is to add a particular key value pair into an internal table associated with the type, constructor, or module. The type, constructor or module is referred to as the view entity the key is referred to as the view property. The value is an arbitrary string. In the grammar above the arbitrary string is represented with the lit-txt non-terminal. lit-txt is either a ":" followed by some arbitrary text that continues to the end of the line or a "%%" balanced by a "%%" at the beginning of a line by itself. Text include using the ":" notation will have trailing and leading whitespace removed. For example

view Doc {
  module  M <= doc_string
%%
  Types for representing LISP s-expressions.
%%
  M.sexpr  <= doc_string : s-expressions 
  M.Int    <= doc_string : s-expression constructor
  M.Symbol <= doc_string : s-expression constructor
  M.Cons   <= doc_string : s-expression constructor
  M.Nil    <= doc_string : s-expression constructor
}

view Java {
 M.sexpr <= source_name : Sexpr
 M.sexpr <= base_class  : MyClass
}

associates with the module M the type M.sexpr and the constructor M.Int strings that will be added to the automatically generated documentation produced by the --doc command of asdlGen. ( In future we'll probably dump them in comments in the output code too.) The view named Java causes the type M.sexpr to be renamed Sexpr when generating Java output, and causes the abstract class normally generated to inherit from MyClass.

There can be many views with the same name. The entries of two views with the same name are merged and consists of the union of the entries in both. It is an error, for two views of the same name to assign different values to the same property of an entity. Currently this error isn't checked for so you randomly get one or the other.

Sugared Syntax

The above example show only the simplest syntax to define view entry there are three sugared versions that remove some of the redundancy of the simple syntax. The first sugared version allows the assignment of the same property value pair to a set of entities. The next sugared version allows assigning to different entities different values for a fixed property. The final sugared version allows the assignment of different property value pairs to the same entity. Examples of the sugared notation are shown below in their respective order.

view Doc {
 { M.Int  M.Symbol  
   M.Cons M.Nil } <= doc_string : s-expression constructor

 <= doc_string {
  module  M 
%%
  Types for representing LISP s-expressions.
%%
  M.sexpr : s-expressions 
  }
}

view Java {
 M.sexpr <= {
   source_name : Sexpr
   base_class  : MyClass
 }
}


Next Previous Contents