Advanced example 1: parsing expressions (CForm)

Advanced example 1: parsing expressions (CForm)

In this chapter we show how Yacas represents expressions and how we can build functions that work on all expressions. Our specific example will be "CForm()", a standard library function that converts Yacas expressions into C or C++ code. (Although the input format of Yacas expressions is already very close to C and perhaps could be converted to C by means of an external text filter, it is instructive to understand how to define Yacas functions that produce C code.) We shall only design the core mechanism of "CForm()" and build a limited version that handles only expressions using the four arithmetic actions.


Recursive parsing of expression trees

As we have seen in the tutorial, Yacas represents all expressions as trees, or equivalently, as lists of lists. For example, the expression "a+b+c+d+e" is for Yacas a tree of depth 4 that could be visualized as
     "+"
   a    "+"
       b   "+" 
          c   "+"
              d  e
Complicated expressions are thus built from simple ones in a general and flexible way. If we want a function that acts on sums of any number of terms, we only need to define this function on a single atom and on a sum of two terms, and the Yacas engine will recursively perform the action on the entire tree.

So our first try is to define rules for transforming an atom and for transforming sums and products. The result of CForm() will always be a string and we can use recursion like this:
In> 100 # CForm(a_IsAtom) <-- String(a);
Out> True;
In> 100 # CForm(_a + _b) <-- CForm(a) : " + " : CForm(b);
Out> True;
In> 100 # CForm(_a * _b) <-- CForm(a) : " * " : CForm(b);
Out> True;
We used the string concatenation operator ":" and we added spaces around the binary operators for clarity. All rules have the same precedence 100 because there are no conflicts in rule ordering so far: these rules apply in mutually exclusive cases. Let's try converting some simple expressions now:
In> CForm(a+b*c);
Out> "a + b * c";
In> CForm(a+b*c*d+e+1+f);
Out> "a + b * c * d + e + 1 + f";
After just three rules, we were able to process even some complicated expressions. How did it work? We could illustrate the steps Yacas went through when simplifying CForm(a+b*c) roughly like this:
CForm(a+b*c)
    ... apply 2nd rule
CForm(a) : " + " : Cform(b*c)
    ... apply 1st rule and 3rd rule
"a" : " + " : Cform(b) : " * " : Cform(c)
    ... apply 1st rule
"a" : " + " : "b" : " * " : "c"
    ... concatenate strings
"a + b * c"


Handling precedence of infix operations

It seems that recursion will do all the work for us. The power of recursion is indeed great and heavy use of recursion is built in the design of Yacas. We just need to add rules for more operators, for example, the unary addition, subtraction and division:
100 # CForm(+ _a) <-- "+ " : CForm(a);
100 # CForm(- _a) <-- "- " : CForm(a);
100 # CForm(_a - _b) <-- CForm(a) : " - " : CForm(b);
100 # CForm(_a / _b) <-- CForm(a) : " / " : CForm(b);
However, soon enough we find that we forgot about operator precedence. Our simple-minded "CForm()" gives wrong C code for expressions like this:
In> CForm( (a+b) * c );
Out> "a + b * c";
We need to get something like "(a+b)*c" in this case. How would we add a rule to insert parentheses around subexpressions? We could just insert them at every subexpression, replacing our rules by something like
100 # CForm(_a + _b) <-- "(" : CForm(a) : " + " : CForm(b) : ")";
100 # CForm(- _a) <-- "(- " : CForm(a) : ")";
and so on. This will always produce correct C code, e.g. in our case "((a+b)*c)", but generally the output will be full of unnecessary parentheses.

We could improve the situation by inserting parentheses only if the higher-order expression requires them; for this to work, we need to make a call such as "CForm(a+b)" aware that the enveloping expression has a multiplication by "c" in it. This can be implemented by passing an extra argument to "CForm()" that will indicate the precedence of the enveloping operation. A compound expression that uses an infix operator must be bracketed if the precedence of that infix operator is higher than the precedence of the enveloping infix operation.

We shall define an auxiliary function that we shall also call "CForm" but with a second argument, the precedence of the enveloping infix operation. If there is no enveloping operation, we shall set the precedence to a large number, e.g. 60000, so that no parentheses will be inserted around the whole expression. The new "CForm(expr, precedence)" will handle two cases: either parentheses are necessary, or unnecessary. For clarity we shall implement these cases in two separate rules. The initial call to "CForm(expr)" will be delegated to "CForm(expr, precedence)". The precedence values of infix operators such as +" and "*" is fixed in Yacas but may change from version to version, so we shall use the function OpPrecedence() to determine it. The new rules could look like this:
PlusPrec := OpPrecedence("+");
100 # CForm(_expr) <-- CForm(expr, 60000);
100 # CForm(_a + _b, _prec)_(PlusPrec>prec) <--
    "(" : CForm(a, PlusPrec) : " + " : CForm(b, PlusPrec) : ")";
120 # CForm(_a + _b, _prec) <--
    CForm(a, PlusPrec) : " + " : CForm(b, PlusPrec);
and so on. We omitted the predicate for the last rule because it has a later precedence than the preceding rule. The way we wrote these rules is unnecessarily repetitive but straightforward and it illustrates the central ideas of expression processing in Yacas. The standard library implements "CForm()" essentially in this way. In addition the library implementation supports standard mathematical functions, arrays and so on, and is better organized to allow easier extensions and avoid repetition of code.