COP 4555 - Homework 5

Due Monday, April 11

Interpreter 1

In this homework, you will develop an SML interpreter for a small functional language called PCF, which stands for Programming language for Computable Functions. The language is relatively simple, but more sophisticated than the arithmetic expressions considered in Homework 4 since it includes functions. The syntax of PCF programs is given by the following BNF grammar:

  e ::= x | n | true | false | succ | pred | iszero |
        if e then e else e | fn x => e | e e | rec x => e | (e)
In the above, x stands for an identifier; n stands for a non-negative integer literal; true and false are the boolean literals; succ and pred are unary functions that add 1 and subtract 1 from their input, respectively; iszero is a unary function that returns true if its argument is 0 and false otherwise; if e1 then e2 else e3 is a conditional expression; fn x => e is a function with parameter x and body e; e e is a function application; rec x => e is used for defining recursive functions (we'll explain this later); and (e) allows parentheses to be used to control grouping.

It should be clear to you that the above grammar is quite ambiguous. For example, should fn f => f f be parsed as fn f => (f f) or as (fn f => f) f? We can resolve such ambiguities by adopting the following conventions (which are the same as in SML):

As in Interpreter 0, we don't want to interpret concrete syntax directly. Instead, the interpreter will work on an abstract syntax tree representation of the program; these abstract syntax trees will be values in the following SML datatype:

  datatype term = AST_ID of string | AST_NUM of int | AST_BOOL of bool
    | AST_SUCC | AST_PRED | AST_ISZERO | AST_IF of term * term * term
    | AST_FUN of string * term | AST_APP of term * term
    | AST_REC of string * term | AST_ERROR of string
As before, this definition mirrors the BNF grammar given above; for instance, the constructor AST_ID makes a string into an identifier, and the constructor AST_FUN makes a string representing the formal parameter and a term representing the body into a function. Note that there is no abstract syntax for (e); the parentheses are just used to control grouping. Also, there is an additional kind of term, AST_ERROR, which will be used for reporting runtime errors.

Recall that in Interpreter 0 you had to build the abstract syntax trees for arithmetic expressions by hand, a quite tedious process. For this assignment, I am providing you with a parser that automatically converts from concrete PCF syntax to an abstract syntax tree. The parser is available here, or you can copy it from ~smithg/parser.sml to your directory. Include the command

  use "parser.sml";
at the beginning of the file containing your interpreter. This defines the datatype term as well as two useful functions, parsestr and parsefile. Function parsestr takes a string and returns the corresponding abstract syntax; for example
  - parsestr "iszero (succ 7)";
Function parsefile takes instead the name of a file and parses its contents. (By the way, the parser is a recursive-descent parser, as discussed in class; you may find it interesting to study how it works.)

You are to write an SML function interp that takes an abstract syntax tree represented as a term and returns the result of evaluating it, which will also be a term. The evaluation should be done according to the rules given below. (Rules in this style are known in the research literature as a natural semantics.) The rules are based on judgments of the form e => v, which means that term e evaluates to value v (and then can be evaluated no further). For the sake of readability, we describe the rules below using the concrete syntax of PCF programs; remember that your interp program will actually need to work on abstract syntax trees, which are SML values of type term.

The first few rules are uninteresting; they just say that basic PCF values evaluate to themselves:

(1) n => n, for any non-negative integer literal n

(2) true => true and false => false

(3) error s => error s

(4) succ => succ, pred => pred, and iszero => iszero.

The interesting evaluation rules are a bit more complicated, because they involve hypotheses as well as a conclusion. For example, here's one of the rules for evaluating an if-then-else:

         b => true         e1 => v
 (5)	---------------------------
         if b then e1 else e2 => v
In such a rule, the judgments above the horizontal line are hypotheses and the judgment below is the conclusion. We read the rule from the bottom up: "if the expression is an if-then-else with components b, e1, and e2, and b evaluates to true and e1 evaluates to v, then the entire expression evaluates to v". Of course, we also have the symmetric rule
         b => false        e2 => v
 (6)    ----------------------------
         if b then e1 else e2 => v
The following rules define the behavior of the built-in functions:
         e1 => succ        e2 => n
 (7)    ----------------------------
             e1 e2 => n+1

         e1 => pred        e2 => 0       e1 => pred   e2 => n+1  
 (8)    ---------------------------     --------------------------
             e1 e2 => 0                         e1 e2 => n
         e1 => iszero   e2 => 0          e1 => iszero   e2 => n+1
 (9)	------------------------        ---------------------------
              e1 e2 => true                   e1 e2 => false
(In these rules, n stands for a non-negative integer.)

For example, to evaluate

  if iszero 0 then 1 else 2
we must, by rules (5) and (6), first evaluate iszero 0. By rule (9) (and rules (4) and (1)), this evaluates to true. Finally, by rule (5) (and rule (1)), the whole program evalutes to 1.

Part a. As a first step, use these rules to write an interpreter, interp: term -> term, for the subset of the language that does not include terms of the form AST_ID, AST_FUN, or AST_REC. If your interpreter is given such a term, it can return an error term, such as AST_ERROR "not yet implemented". (You also will need to return error terms for type errors like succ true.)

Part b. Interpreters need a way of passing parameters to user-defined functions; here we will accomplish this by means of textual substitution. In our discussion, we will use the notation e[x := t] to denote the textual substitution of t for all free occurrences of x within e. For example, (succ x)[x:=1] is (succ 1). Write an SML function subst that takes a term e, a string x representing an identifier, and a term t, and returns e with all free occurrences of x (actually AST_ID x) replaced by t. For example,

  - subst (AST_APP (AST_SUCC,AST_ID "x")) "x" (AST_NUM 1);
  val it = AST_APP (AST_SUCC,AST_NUM 1) : term
Do not substitute for bound occurrences of identifiers. For instance, substituting 3 for x in
  ((fn x => succ x) (pred x))
should result in ((fn x => succ x) (pred 3)); the formal parameter x and its occurrences in the function body should not be affected by the substitution.

Hint: Just as in Part a, use pattern-matching on each constructor of the abstract syntax tree, calling subst recursively when you need to.

Part c. Using your substitution function, extend your interp function from Part a to include AST_FUN terms. The evaluation of terms involving AST_FUN should be done according to the rules given below.

Just like the built-in functions (succ, pred, and iszero), functions defined using fn evaluate to themselves:

 (10)    (fn x => e) => (fn x => e)	  
Computations occur when you apply these functions to arguments. The following rule defines call-by-value (or eager) function application, also used by SML: if the function is of the form fn x => e, evaluate the operand to a value v1, substitute v1 in for the formal parameter x in e, and then evaluate the modified body:
            e1 => (fn x => e)      e2 => v1    e[x:=v1] => v
 (11)    --------------------------------------------------------
                              e1 e2 => v
For instance, in evaluating the application
  ((fn x => succ x) (succ 0))
we first note that the function is already fully evaluated, so we evaluate (succ 0) to 1, and then plug this in for x in the body, succ x, of the function, obtaining succ 1, which evaluates to 2.

Notice that while terms of the form AST_ID s can appear whenever s is a formal parameter, we never need to evaluate such terms, because they are always replaced by the subst function before we evaluate the function body.

Any AST_ID s term that does remain in the function body at the time of evaluation represents an unbound identifier; your interpreter should return an AST_ERROR term in this case.

Part d. Surprisingly enough, evaluating recursive terms turns out to be quite easy. First let's talk about what a term of the form rec x => e actually means. It corresponds to the definition of a recursive function called x. Let's work with an example. The term

  rec sum => fn x => fn y => if iszero x then y else sum (pred x) (succ y)
corresponds to the following recursive SML function declaration
  fun sum x y = if x = 0 then y else sum (x - 1) (y + 1)
Thus, in rec x => e, we see that x is the name of the recursive function and e (which should be a fn term) gives the parameters and body of the function.

The rule for evaluating a recursive term is amazingly simple. Just evaluate the body of the term, where all occurrences of the recursively defined identifier are replaced by the entire rec term.

                  e[x:=(rec x => e)] => v
 (12)            --------------------------
                    (rec x => e) => v
It turns out that this is all that is needed to make recursion work!


  1. While this program is broken up into four parts in order to help you make progress through it, you should just turn in a single interp function that interprets the entire PCF language. (But don't forget to turn in subst as well.)

  2. The parser actually parses a slightly richer language than I've presented here. In particular, it extends the BNF for PCF shown above above with the rule
       e ::= let x = e in e end
    allowing let expressions as in SML. For example,
       let z = 2 in succ z end
    is allowed by the parser.

    Rather than creating an AST_LET term for such expressions (which would require more code in the interpreter), the parser treats let expressions as syntactic sugar. In particular, the parser treats

      let x = e1 in e2 end
    as if it were
      (fn x => e2) e1
    Thus the example above will be parsed as
    A bit of thought should convince you that this function application has exactly the same meaning as the let expression. Thus you may use let clauses in creating examples to test your interpreter, without having to write any new interpreter code.

    Also, the parser allows PCF programs to contain comments, which begin with the character # and extend to the end of the current line.

  3. Here is a skeleton file to help you to get started: interp1.sml. And here are some sample PCF programs for you to try out: twice.pcf, minus.pcf, factorial.pcf, fibonacci.pcf, and lists.pcf. These files are also available from my directory ~smithg.

Back to