(**************************************************************************************

   Author:   Steffen Jost <jost@informatik.uni-muenchen.de>
   Name:     
   File:     README-LFD.txt,v
   Id:       README-LFD.txt,v 1.1 2003/12/02 14:42:09 a1hloidl Exp 

	
   What is this all about:
   -----------------------
	The program package LFPL_INFER deals with camelot-style programs:
	 - inferring linear upper bounds on heap space consumption of each function 
	 - sandboxed execution for monitoring heap space consumption in sample run


   Usage & Requirements:	
   ---------------------	
	The collection of files should be compiled using the provided Makefile.
	This requires that the OCaml-Compiler 3.06 (http://caml.inria.fr/ocaml/)
	is accessible from the current path. 

	For better performance one may use the ocaml native-code compiler 
	ocamlopt (if available) via "make opt".
	
	The usage of the produced binary is explained when calling it 
	with option '-?'.
	
	The inference constructs a linear program (LP), whose solutions
	yield an annotated typing which allows conclusions about the
	heap space usuage of the program 
	(see  
	      M.Hofmann and S.Jost, 
  	      "Static Prediction of Heap Space Usage for First-Order Functional Programs",
	      In Principles of Programming Languages (POPL) 2003, p.185-197, 
	      ACM Press 2003
	)
	
	By default the inference calls the LP-solver "lp_solve" version 4.0.1.0
	( ftp://ftp.es.ele.tue.nl/pub/lp_solve ,  LesserGNU public license),
	which is assumed to be accessible from the current path. 
	This may be prohibited by using the option '-nlp'. 

	In any way, the derived LP is written to a file in the current path and may be 
	solved by any other means, although in this case there is currently no 
	implemented way to feed the soluton back to lfpl_infer in order to obtain
	a neat print-out in terms of the programs enriched signature.


   Performace and example usage:
   -----------------------------

> time lfd_infer ins_sort.lfd

 Program 'ins_sort.lfd' parsed.

 Resource constraints constructed: 113 inequalties in 110 variables. Written to 'ins_sort.constraints'.

 Solution from lp_solve yields the following enriched types:
  insert        : <2>, Int -> ilist(<0>|Int,$,<0>) -> ilist(<0>|Int,$,<0>), <0>;
  main          : <24>, String -> Unit, <0>;
  print_ilist   : <0>, ilist(<0>|Int,$,<0>) -> ilist(<0>|Int,$,<0>), <0>;
  print_ilist'  : <0>, ilist(<0>|Int,$,<0>) -> Unit, <0>;
  sort          : <0>, ilist(<0>|Int,$,<0>) -> ilist(<0>|Int,$,<0>), <0>;
  sort'         : <0>, ilist(<0>|Int,$,<2>) -> ilist(<0>|Int,$,<0>), <0>;

real    0m0.107s
user    0m0.040s
sys     0m0.000s
>

	The program ins_sort.lfd consists of 6 functions. The printout tells us the following:

	"insert" is function taking as argument an integer and an integer list and returns an integer list.
	In order to execute it properly, there must be at least 2 heap-cells available.

	"main" is a function from string to integer. To execute it, we need at least 24 heap-cells available.
	Note that this means the program will never allocated more than 24 heap-cells at any given time
	during the enitre execution, as this is in deed the main-function of the program.

	"sort" this function, taking an integer list and returning one, does not allocate any extra heap-cells,
	as all annotations are zero. However, it may modify the contents of the heap-cells reachable from its input,
 	it may even dispose/leak some heap cells (but this is not the case here. See tpo.lfd for an example where
	a memory leak occurs, and see how you may be able to spot this in the occuring annotations).

	"sort'" is similar to sort, but it requires that for each "Cons"-Constructor (the second constructor of type ilist)
	of its argument list there are at least two heap-cells available. 

	While "sort" sorts its argument in-place, "sort'" produces a sorted copy of its argument, preserving it. 
	This clearly costs heap-space. and according to the commented-values (see note below):
	  type ilist = Nil(*0*) | Cons(*2*) of int * ilist 
	a Nil-Constructor consumes 0 heap-cells, while a Cons-Constructor consumes 2. Since we produce
	a result with equal size to the input, we just need this extra space in the input (0,2)
        If we would set
  	  type ilist = Nil(*42*) | Cons(*7*) of int * ilist 
	we would obtain
	  sort'         : <0>, ilist(<7>|Int,$,<42>) -> ilist(<0>|Int,$,<0>), <0>;
	If in additon, sort' would produce a list of double length (by whatever means),
	we would obtain
	  sort'         : <0>, ilist(<7>|Int,$,<84>) -> ilist(<0>|Int,$,<0>), <0>;

	Please not that option -uni overrides all those comments and sets each constructor to occupy exactly one heap cell.
	The following example, when called with argument -uni would produce
          main          : <13>, String -> list(<0>|Int,$,<0>), <0>;	
	instead, as the printout after sandboxing (-ex,-main,-arg) would confirm:

> time lfd_infer big100.lfd -ex -arg "2"

 Program 'big100.lfd' parsed.

 Resource constraints constructed in 0.06 seconds: 2117 inequalties in 1408 variables. Written to 'big100.constraints'.

 Solution found by lp_solve in 1.11 seconds.
 Solution from lp_solve yields the following enriched types:
  id_000        : <0>, list(<0>|Int,$,<0>) -> list(<0>|Int,$,<0>), <0>;
  id_001        : <0>, list(<0>|Int,$,<0>) -> list(<0>|Int,$,<0>), <0>;
   ...
  id_099        : <0>, list(<0>|Int,$,<0>) -> list(<0>|Int,$,<0>), <0>;
  main          : <24>, String -> list(<0>|Int,$,<0>), <0>;


 Executing 'main' with Argumentlist: [2]
 Program output follows:

--------------------------------------------------------------------------------
...
--------------------------------------------------------------------------------

 Result of program evaluation:
  Cons(2, Cons(1, Nil()))

 Heap-consumption in terms of
  * Uniform sized heap cells (one cell per constructor):
     - Number of heap store operations      : 317
     - Number of heap read operations       : 307
     - Number of heap remove operations     : 304
     - Maximum heap size during evaluation: 13
     - Current heap size after evaluation : 13

  * Variable-sized heap cells (one cell per constructor argument):
     - Maximum heap size during evaluation: 24
     - Current heap size after evaluation : 24

  * Heap cells according to user defined constructor sizes:
     - Maximum heap size during evaluation: 24
     - Current heap size after evaluation : 24


real    0m5.567s
user    0m2.040s
sys     0m0.010s
>


	These outputs were produced on a PentiumIII, 500Mhz, 256MB, 1000Bogomips under Suse Linux 8.2.
	Performance may be still be improved. See constraint.ml for further details on Performance-ToDos.

	Please try the other provided examples as well and use them with different options, 
	like -odin, -nshr, -int and see what happens...


   Syntax of ".lfd"-files:
   -----------------------
	The recognised syntax of the input programs roughly follows
	the camelot syntax, but:

	 - is restricted to first-order

         - is restricted to monomorphic types

         - does not allow arrays, hence "main" (which in fact may have any name)
	   must be of type [string -> t].
	   (Desirable would be string list -> t, but lists are currently
            not built-in.)
	   In fact, the type of "main" does only matter when sandboxing,
           as the inference treats all functions as equal. 
	   When sandboxing, the commandline string is just passed as a whole string,
	   i.e. > lfd_infer prog.lfd -arg "a b c d e f" 
           passes the string "a b c d e f" to the main-function.
	
         - is restricted to fully sequentialized programs, i.e. 
	   all subexpressions are values (including operators on values),
	   hence let-constructs must be inroduced where this would not be the case,
           giving away the order of execution of each subexpression

	 - it is assumed that the input program is type correct,
	   therefore some parse/type-errors might not be noticed
	   and parse/type-error messages are quite uneloquent

	 - no hiding of variables occurs,
	   i.e. "let x = 1 in let x = 0 in ..." and similar code is prohibited

	...so usually programs printed by the camelot compiler using option '-a2'
	_should_ be fine in general, although there are still some further 
	minor differences which shall be solved soon, see below. 
	The default extension for these programs is '.lfd'.
	

	Further minor differences to the camelot syntax include:

	 - type definitions may carry additonal size-parameters for each constructor,
	   which override the amount of heap-cells each element of that constructor
	   physically occupies in the heap 
	   (e.g. "type ilist = Nil(*10*) | Cons(*5*) of int * ilist" indicates
             that each Cons-cell requires 5 heap-cells, while Nil occupies 10.)
 	   (by default, each constructor occupies a number of cells equal 
	     to the number of its arguments, see also option '-uni')
	   As these parameters are enclosed in '(*' '*)', the camelot
	   compiler should ignore them without problems.
           Note that lfd does not recognise the ! to denote heap-free constructors. 
	   Use (*0*) instead.

	 - some polymorphic built-in printing functions 
	   (print : A -> A,  print' : A -> unit,  print_heap_s : unit,  ...)

	 - Underscores "_" as wildcard identifiers are only allowed
	   as let-varibales ("let _ = ... in ..."), but not in pattern-matches.
	   However, identifiers may contain underscores at any place, so you may
	   write "match ... | cons(_#1,_#2) => ...".
	   Recall that all identifiers must be unique as written above!

	 - lesser relaxations concerning mere notational convenience
	   (eg. "->" instead of "=>").

           As a rule of thumb, enclosing subexpressions in parenthesis "("")"
           should solve most parse errors (especially ELSE- and MATCHRULE-subexpressions.

	Please take a look at the provided examples!
 

   Further notes:
   -------------- 

	- STRINGS: strings are not accounted for in the heap-space inference, although they
   	  certainly use up heap-space. They are merely a convenient way for some onscreen-printing,
 	  but nothing else. If you seriuosly want to use strings, then you have to implement them
          as lists of characters. 
	  So strings may be abused to cheat the inference. This is not a bug. 
	  This is as it is intended to be: The inference should be as plain and
 	  simple as possible. We know that lists are handled well. So there's no reason to 
	  reimplement it on the special case of strings just to gain some convenience for the programmer. 
	  If this convenience is desired, built it into the 'precompiler' which is necessary anyway 
	  for type-checking, monomoprhisation, etc. (we currently abuse Camelot for this purpose).
	  This is also the reason why there is no built-in (polymorphic) list type.
     

   Overview over the provided files:
   ---------------------------------
	 - Makefile	: configuration for the GNU make utility (User-Goals: default, opt, clean)
	 - argument.ml	: command line parsing
	 - common.ml	: my personal extension to the standard libraries
	 - constraint.ml: generates the LP and feeds it to a solver
	 - exec.ml	: the sandbox for demo execution of .lfd files
 	 - lexer.mll	: rules for generating the lexer via ocamllex 	
	 - main.ml	: glues everything together
	 - parser.mly	: rules for generating the parser via ocamlyacc
	 - support.ml   : traces error messages back to .lfd source file
	 - syntax.ml	: the abstract syntax of .lfd
	furthermore a couple of .lfd-example files are provided.


   Acknowledgements:
   -----------------
	Most of the present code was written by 
		Steffen Jost <jost@informatik.uni-muenchen.de>.
	The modules responsible for lexing, parsing and execution are
	based upon an earlier LFPL prototype implementation written by 
	        David Aspinall	<da@dcs.ed.ac.uk> and
		Martin Hofmann	<mhofmann@informatik.uni-muenchen.de> 
	which was used as a template and which itself used code written by
		Benjamin Pierce  <www.cis.upenn.edu/~bcpierce>
	for error handling. Each file bears a notice of its original author(s).

	Of course, the code could not have been written without the above
	cited underlying scientific work, but in addition the implemenation
	benefitted further from discussion with
		Martin Hofmann	<mhofmann@informatik.uni-muenchen.de> 	

	Also thanks to 
		Klaus Aehlig  <aehlig@mathematik.uni-muenchen.de>
	for explaining the principles of Unix' piping and plumbing. 	

	Furthermore thanks 
		Kenneth MacKenzie <kwxm@dcs.ed.ac.uk>
	and 
		Hans Wolfgang Loidl <hwloidl@tcs.informatik.uni-muenchen.de>
	for helping me adjust my implementations to the Camelot language.

		
	It has already been noted that the implementation was written using
	the 
		Ocaml compiler  <http://caml.inria.fr/ocaml>,
	its libraries and even more important, its manual and 
	that it furthermore makes use of the GPL software package
		lp_solve  <ftp://ftp.es.ele.tue.nl/pub/lp_solve>.
		

   ToDos: 
   ------	

	- Look at the 'ToDo'-Information contained in each file!

        - The programs print outs of elapsed time during inference and/or
	  evaluation seems not be measured in seconds, although it should 
	  be according to the OCaml manual. Take a closer at this sometime. 

	- Implement an interface for the abstract syntax from Camelot/Michals inference

	- Purge all files from german comments.


   NotToDos (in order to remember what has already been considered):
   --------
 
        - Nothing here yet.

**************************************************************************************)