(************************************************************************************** Author: Steffen Jost Name: File: README-LFD.txt,v Id: README-LFD.txt,v 1.1 2003/12/02 14:42:09 a1hloidl Exp What is this all about: ----------------------- The program package LFPL_INFER deals with camelot-style programs: - inferring linear upper bounds on heap space consumption of each function - sandboxed execution for monitoring heap space consumption in sample run Usage & Requirements: --------------------- The collection of files should be compiled using the provided Makefile. This requires that the OCaml-Compiler 3.06 (http://caml.inria.fr/ocaml/) is accessible from the current path. For better performance one may use the ocaml native-code compiler ocamlopt (if available) via "make opt". The usage of the produced binary is explained when calling it with option '-?'. The inference constructs a linear program (LP), whose solutions yield an annotated typing which allows conclusions about the heap space usuage of the program (see M.Hofmann and S.Jost, "Static Prediction of Heap Space Usage for First-Order Functional Programs", In Principles of Programming Languages (POPL) 2003, p.185-197, ACM Press 2003 ) By default the inference calls the LP-solver "lp_solve" version 4.0.1.0 ( ftp://ftp.es.ele.tue.nl/pub/lp_solve , LesserGNU public license), which is assumed to be accessible from the current path. This may be prohibited by using the option '-nlp'. In any way, the derived LP is written to a file in the current path and may be solved by any other means, although in this case there is currently no implemented way to feed the soluton back to lfpl_infer in order to obtain a neat print-out in terms of the programs enriched signature. Performace and example usage: ----------------------------- > time lfd_infer ins_sort.lfd Program 'ins_sort.lfd' parsed. Resource constraints constructed: 113 inequalties in 110 variables. Written to 'ins_sort.constraints'. Solution from lp_solve yields the following enriched types: insert : <2>, Int -> ilist(<0>|Int,$,<0>) -> ilist(<0>|Int,$,<0>), <0>; main : <24>, String -> Unit, <0>; print_ilist : <0>, ilist(<0>|Int,$,<0>) -> ilist(<0>|Int,$,<0>), <0>; print_ilist' : <0>, ilist(<0>|Int,$,<0>) -> Unit, <0>; sort : <0>, ilist(<0>|Int,$,<0>) -> ilist(<0>|Int,$,<0>), <0>; sort' : <0>, ilist(<0>|Int,$,<2>) -> ilist(<0>|Int,$,<0>), <0>; real 0m0.107s user 0m0.040s sys 0m0.000s > The program ins_sort.lfd consists of 6 functions. The printout tells us the following: "insert" is function taking as argument an integer and an integer list and returns an integer list. In order to execute it properly, there must be at least 2 heap-cells available. "main" is a function from string to integer. To execute it, we need at least 24 heap-cells available. Note that this means the program will never allocated more than 24 heap-cells at any given time during the enitre execution, as this is in deed the main-function of the program. "sort" this function, taking an integer list and returning one, does not allocate any extra heap-cells, as all annotations are zero. However, it may modify the contents of the heap-cells reachable from its input, it may even dispose/leak some heap cells (but this is not the case here. See tpo.lfd for an example where a memory leak occurs, and see how you may be able to spot this in the occuring annotations). "sort'" is similar to sort, but it requires that for each "Cons"-Constructor (the second constructor of type ilist) of its argument list there are at least two heap-cells available. While "sort" sorts its argument in-place, "sort'" produces a sorted copy of its argument, preserving it. This clearly costs heap-space. and according to the commented-values (see note below): type ilist = Nil(*0*) | Cons(*2*) of int * ilist a Nil-Constructor consumes 0 heap-cells, while a Cons-Constructor consumes 2. Since we produce a result with equal size to the input, we just need this extra space in the input (0,2) If we would set type ilist = Nil(*42*) | Cons(*7*) of int * ilist we would obtain sort' : <0>, ilist(<7>|Int,$,<42>) -> ilist(<0>|Int,$,<0>), <0>; If in additon, sort' would produce a list of double length (by whatever means), we would obtain sort' : <0>, ilist(<7>|Int,$,<84>) -> ilist(<0>|Int,$,<0>), <0>; Please not that option -uni overrides all those comments and sets each constructor to occupy exactly one heap cell. The following example, when called with argument -uni would produce main : <13>, String -> list(<0>|Int,$,<0>), <0>; instead, as the printout after sandboxing (-ex,-main,-arg) would confirm: > time lfd_infer big100.lfd -ex -arg "2" Program 'big100.lfd' parsed. Resource constraints constructed in 0.06 seconds: 2117 inequalties in 1408 variables. Written to 'big100.constraints'. Solution found by lp_solve in 1.11 seconds. Solution from lp_solve yields the following enriched types: id_000 : <0>, list(<0>|Int,$,<0>) -> list(<0>|Int,$,<0>), <0>; id_001 : <0>, list(<0>|Int,$,<0>) -> list(<0>|Int,$,<0>), <0>; ... id_099 : <0>, list(<0>|Int,$,<0>) -> list(<0>|Int,$,<0>), <0>; main : <24>, String -> list(<0>|Int,$,<0>), <0>; Executing 'main' with Argumentlist: [2] Program output follows: -------------------------------------------------------------------------------- ... -------------------------------------------------------------------------------- Result of program evaluation: Cons(2, Cons(1, Nil())) Heap-consumption in terms of * Uniform sized heap cells (one cell per constructor): - Number of heap store operations : 317 - Number of heap read operations : 307 - Number of heap remove operations : 304 - Maximum heap size during evaluation: 13 - Current heap size after evaluation : 13 * Variable-sized heap cells (one cell per constructor argument): - Maximum heap size during evaluation: 24 - Current heap size after evaluation : 24 * Heap cells according to user defined constructor sizes: - Maximum heap size during evaluation: 24 - Current heap size after evaluation : 24 real 0m5.567s user 0m2.040s sys 0m0.010s > These outputs were produced on a PentiumIII, 500Mhz, 256MB, 1000Bogomips under Suse Linux 8.2. Performance may be still be improved. See constraint.ml for further details on Performance-ToDos. Please try the other provided examples as well and use them with different options, like -odin, -nshr, -int and see what happens... Syntax of ".lfd"-files: ----------------------- The recognised syntax of the input programs roughly follows the camelot syntax, but: - is restricted to first-order - is restricted to monomorphic types - does not allow arrays, hence "main" (which in fact may have any name) must be of type [string -> t]. (Desirable would be string list -> t, but lists are currently not built-in.) In fact, the type of "main" does only matter when sandboxing, as the inference treats all functions as equal. When sandboxing, the commandline string is just passed as a whole string, i.e. > lfd_infer prog.lfd -arg "a b c d e f" passes the string "a b c d e f" to the main-function. - is restricted to fully sequentialized programs, i.e. all subexpressions are values (including operators on values), hence let-constructs must be inroduced where this would not be the case, giving away the order of execution of each subexpression - it is assumed that the input program is type correct, therefore some parse/type-errors might not be noticed and parse/type-error messages are quite uneloquent - no hiding of variables occurs, i.e. "let x = 1 in let x = 0 in ..." and similar code is prohibited ...so usually programs printed by the camelot compiler using option '-a2' _should_ be fine in general, although there are still some further minor differences which shall be solved soon, see below. The default extension for these programs is '.lfd'. Further minor differences to the camelot syntax include: - type definitions may carry additonal size-parameters for each constructor, which override the amount of heap-cells each element of that constructor physically occupies in the heap (e.g. "type ilist = Nil(*10*) | Cons(*5*) of int * ilist" indicates that each Cons-cell requires 5 heap-cells, while Nil occupies 10.) (by default, each constructor occupies a number of cells equal to the number of its arguments, see also option '-uni') As these parameters are enclosed in '(*' '*)', the camelot compiler should ignore them without problems. Note that lfd does not recognise the ! to denote heap-free constructors. Use (*0*) instead. - some polymorphic built-in printing functions (print : A -> A, print' : A -> unit, print_heap_s : unit, ...) - Underscores "_" as wildcard identifiers are only allowed as let-varibales ("let _ = ... in ..."), but not in pattern-matches. However, identifiers may contain underscores at any place, so you may write "match ... | cons(_#1,_#2) => ...". Recall that all identifiers must be unique as written above! - lesser relaxations concerning mere notational convenience (eg. "->" instead of "=>"). As a rule of thumb, enclosing subexpressions in parenthesis "("")" should solve most parse errors (especially ELSE- and MATCHRULE-subexpressions. Please take a look at the provided examples! Further notes: -------------- - STRINGS: strings are not accounted for in the heap-space inference, although they certainly use up heap-space. They are merely a convenient way for some onscreen-printing, but nothing else. If you seriuosly want to use strings, then you have to implement them as lists of characters. So strings may be abused to cheat the inference. This is not a bug. This is as it is intended to be: The inference should be as plain and simple as possible. We know that lists are handled well. So there's no reason to reimplement it on the special case of strings just to gain some convenience for the programmer. If this convenience is desired, built it into the 'precompiler' which is necessary anyway for type-checking, monomoprhisation, etc. (we currently abuse Camelot for this purpose). This is also the reason why there is no built-in (polymorphic) list type. Overview over the provided files: --------------------------------- - Makefile : configuration for the GNU make utility (User-Goals: default, opt, clean) - argument.ml : command line parsing - common.ml : my personal extension to the standard libraries - constraint.ml: generates the LP and feeds it to a solver - exec.ml : the sandbox for demo execution of .lfd files - lexer.mll : rules for generating the lexer via ocamllex - main.ml : glues everything together - parser.mly : rules for generating the parser via ocamlyacc - support.ml : traces error messages back to .lfd source file - syntax.ml : the abstract syntax of .lfd furthermore a couple of .lfd-example files are provided. Acknowledgements: ----------------- Most of the present code was written by Steffen Jost . The modules responsible for lexing, parsing and execution are based upon an earlier LFPL prototype implementation written by David Aspinall and Martin Hofmann which was used as a template and which itself used code written by Benjamin Pierce for error handling. Each file bears a notice of its original author(s). Of course, the code could not have been written without the above cited underlying scientific work, but in addition the implemenation benefitted further from discussion with Martin Hofmann Also thanks to Klaus Aehlig for explaining the principles of Unix' piping and plumbing. Furthermore thanks Kenneth MacKenzie and Hans Wolfgang Loidl for helping me adjust my implementations to the Camelot language. It has already been noted that the implementation was written using the Ocaml compiler , its libraries and even more important, its manual and that it furthermore makes use of the GPL software package lp_solve . ToDos: ------ - Look at the 'ToDo'-Information contained in each file! - The programs print outs of elapsed time during inference and/or evaluation seems not be measured in seconds, although it should be according to the OCaml manual. Take a closer at this sometime. - Implement an interface for the abstract syntax from Camelot/Michals inference - Purge all files from german comments. NotToDos (in order to remember what has already been considered): -------- - Nothing here yet. **************************************************************************************)