Previous

Contents

Next

Chapter 13:
Building a calculator

No reckoning made, but sent to my account
With all my imperfections on my head.

— Shakespeare, Hamlet


13.1 Handling operator precedence
13.2 A stack package
13.3 An improved calculator
13.4 Implementing the stack package
13.5 Opaque types
13.6 Formalising the syntax of expressions
13.7 A recursive descent parser
Exercises

13.1 Handling operator precedence

It’s time to return to the calculator program that was developed in chapter 3, which was capable of evaluating arithmetic expressions like 2+3*4. Expressions were dealt with on a strict left-to-right basis, so that 2+3*4 would evaluate to 20. However, the normal rules of arithmetic tell us that the value of this expression should be 14, since the conventional interpretation involves performing multiplication before addition.

Modifying the calculator to deal with this will require delaying the addition operation until the multiplication has been performed. This is a well-known problem; algorithms to deal with this were first developed in the 1950s and refined in the 1960s. One approach is to use a data structure known as a stack. A stack is a collection with specific restrictions on how it can be accessed; the traditional comparison is with a pile of plates. New items can be added to the top of the stack (i.e. you can put more plates on top of the pile) and items can be removed from the top of the stack (i.e. you can remove plates from the top of the pile). The conventional names for these operations are pushing an item onto the stack and popping an item off the top. You can generally tell if someone is a programmer by asking what the opposite of ‘push’ is; programmers say ‘pop’, everyone else says ‘pull’! There may be a few extra operations; for example, you may be able to find out how many items the stack contains or inspect the top item without removing it. What you can’t do is add or remove items anywhere except at the top of the stack; if you were to try with a stack of plates the result might be a ‘stack crash’! A stack is said to have a last-in first-out (LIFO) organisation: the last item pushed onto the stack is the first one to be popped off.

Stacks are one of the most generally useful data structures around. They crop up in all sorts of situations; for example, the compiler relies on using a stack to keep track of procedure calls. When you call a procedure, your return address (the point in the calling procedure that you want to return to) is pushed onto a stack; returning from a procedure is simply a matter of popping the return address off the top of the stack and going back to the place it specifies. Stacks can also be used for evaluating arithmetic expressions according to the conventional rules of arithmetic.

The method for doing this requires two stacks, one for operands and one for operators. Whenever you see an operand, you put it on the operand stack; when you see an operator, you compare it with the operator on top of the operator stack. Each operation is given a priority (or precedence); multiplication has a higher priority than addition. If the operator you’ve just read has a higher priority than the one on top of the stack, you just push it onto the stack. This defers dealing with high-priority operators until you’ve had a chance to see what comes next. Otherwise, you remove the operator from the top of the stack, remove the top two values from the operand stack, apply the operator to the two operands and push the result onto the operand stack. You then repeat the process until the operator you’re considering does have a higher priority than the one on top of the operator stack. In other words, when you see a low-priority operator you first of all deal with any deferred operators on the stack which have the same priority or higher. Finally, you push the operator you’re considering onto the operator stack until you see what comes next.

To make this work, you need to prime the operator stack with an operator which has a lower priority than any other. At the end of the expression, operators must be removed one by one from the operator stack together with the top two operands from the operand stack; each operator is applied to its two operands and the result is pushed onto the operand stack. When you reach the low-priority operator on the bottom of the stack, the operand stack will contain a single value which is the result of the expression. Here’s what happens if you evaluate 2+3*4+5 using this algorithm:


           Input      Symbol     Operands   Operators  Action
        1) 2+3*4+5.                           #        (start state)
        2) +3*4+5.      2          2          #        Push 2
        3) 3*4+5.       +          2          # +      + > #; push +
        4) *4+5.        3          2 3        # +      Push 3
        5) 4+5.         *          2 3        # + *    * > +; push *
        6) +5.          4          2 3 4      # + *    Push 4
        7) 5.           +          2 12       # +      + < *; apply *
        8) 5.           +          14         #        + = +; apply +
        9) 5.           +          14         # +      + > #; push +
       10) .            5          14 5       # +      Push 5
       11) .                       19         #        . < +; apply +
    

The result is 19. I’ve used ‘#’ to represent the low-priority operator used to prime the operator stack. Operands and operators are pushed onto their respective stacks until step 7 is reached. Here we’ve got ‘+’ and the operator stack has ‘*’ on top. So the multiplication operator is removed from the operator stack, the top two items are removed from the operand stack (3 and 4), the multiplication operator is applied to the two operands, and the result (12) is pushed back onto the operand stack. Now we have a ‘+’ on top of the operator stack, so step 8 repeats the process; 2+12 gives 14 which is pushed onto the operand stack. Now the top of the operator stack is ‘#’, which has a lower priority than ‘+’, so the ‘+’ finally gets pushed onto the operator stack in step 9. At step 11 we’ve reached the full stop which signifies the end of the expression, so the ‘+’ on top of the operator stack is removed, the two operands 14 and 5 are removed from the operand stack, and the result (19) is pushed onto the operand stack. The top operator on the stack is now ‘#’, so the value on top of the operand stack (19) is the final result.


13.2 A stack package

At this point it’s worth considering the design of a package to implement a stack type. The stack type itself should be private to prevent users fiddling about with implementation details; it should also be limited to prevent attempts to copy one entire stack to another or comparing two stacks for equality. The package needs to be generic so that a stack of any type of item can be created; the generic parameter should be private for maximum flexibility, so that any type which supports assignment can be used. The package only needs to use assignment to store and retrieve the items, so this is no problem.

The two basic operations to be provided are Push and Pop. These will need the stack to be operated on as a parameter; it will need to be an in out parameter since it needs to be copied into the procedure and then modified. For convenience and completeness we can provide a few extra operations: a function Top which returns a copy of the top item on the stack, a function Size which returns the number of items on the stack, and a function Empty which returns a Boolean result to indicate whether or not the stack is empty.

There are two things that can go wrong: the stack can overflow as the result of attempting to push too many items onto the stack, or the stack can ‘underflow’ as the result of attempting to access the top item of an empty stack. This means that we need two exceptions, Stack_Overflow and Stack_Underflow. Here’s the resulting package specification, with the private part omitted for the moment:

        generic
            type Item_Type is private;
        package JE.Stacks is
            type Stack_Type is limited private;

            procedure Push (Stack : in out Stack_Type;
                            Item  : in Item_Type);
            procedure Pop  (Stack : in out Stack_Type;
                            Item  : out Item_Type);
            function Top   (Stack : Stack_Type) return Item_Type;
            function Size  (Stack : Stack_Type) return Natural;
            function Empty (Stack : Stack_Type) return Boolean;

            Stack_Overflow, Stack_Underflow : exception;

        private
            -- to be dealt with later
        end JE.Stacks;
    

Here’s a modified version of chapter 3’s calculator program which uses this stack package. I’ve tried to keep the structural changes to a minimum so that you can compare this with the original version more easily:

        with Ada.Text_IO, Ada.Integer_Text_IO, JE.Stacks;
        use Ada.Text_IO, Ada.Integer_Text_IO;
        procedure Calculator is
            package Operand_Stacks  is new JE.Stacks (Integer);
            package Operator_Stacks is new JE.Stacks (Character);
            use Operand_Stacks, Operator_Stacks;

            Operand_Stack  : Operand_Stacks.Stack_Type;
            Operator_Stack : Operator_Stacks.Stack_Type;

            Operator : Character;
            Operand  : Integer;

            procedure Apply is
                Left, Right : Integer;
                Operator    : Character;
            begin
                Pop (Operand_Stack, Right);
                Pop (Operand_Stack, Left);
                Pop (Operator_Stack, Operator);
                case Operator is
                    when '+' => Push (Operand_Stack, Left + Right);
                    when '-' => Push (Operand_Stack, Left - Right);
                    when '*' => Push (Operand_Stack, Left * Right);
                    when '/' => Push (Operand_Stack, Left / Right);
                    when others => raise Program_Error;  -- should never happen (!)
                end case;
            end Apply;

            function Prio (Operator : Character) return Natural is
            begin
                case Operator is
                    when '+' | '-' => return 1;
                    when '*' | '/' => return 2;
                    when '#' => return 0;
                    when others => raise Program_Error;  -- should never happen (!)
                end case;
            end Prio;
        begin            -- main program
            Push (Operator_Stack, '#');
            Put ("Enter an expression: ");
            Get (Operand);
            Push (Operand_Stack, Operand);

            loop
                loop
                    Get (Operator);
                    exit when Operator /= ' ';
                end loop;
                if Operator = '.' then
                    while Top(Operator_Stack) /= '#' loop
                        Apply;
                    end loop;
                    Put (Top(Operand_Stack), Width => 1);
                    exit;
                end if;
                case Operator is
                    when '+' | '-' | '*' | '/' =>
                        while Prio(Operator) <= Prio(Top(Operator_Stack)) loop
                            Apply;
                        end loop;
                        Push (Operator_Stack, Operator);
                    when others =>
                        Put ("Invalid operator '");
                        Put (Operator);
                        Put ("'");
                        exit;
                end case;
                Get (Operand);
                Push (Operand_Stack, Operand);
            end loop;

            New_Line;
        end Calculator;
    

This program uses an internal procedure called Apply which pops two operands and an operator and then pushes the result of applying the operator to the two operands; there is also an internal function called Prio which returns the priority of a selected operator. These procedures should never be called with anything except a valid operator, but I’ve written them defensively; Murphy’s Law states that ‘what can go wrong, will’ so rather than taking anything for granted I’ve made sure that Program_Error will be raised by Apply and Prio if they come across an illegal operator character.


13.3 An improved calculator

The program as it stands expects its input to be a sequence of integers separated by operators and ending with a full stop. However, we might want to be able to deal with expressions like ‘2*(3+4)’ which do not follow this pattern. In order to deal with parentheses, we need to read the input a character at a time rather than blindly assuming that the first thing on the line is an integer or that an integer follows every operator. Fortunately, Text_IO provides a procedure called Look_Ahead to enable us to look at the next input character without actually reading it. Look_Ahead takes two output parameters: a Character variable which will be set to the value of the next character and a Boolean variable which will be set True if we’re at the end of the line (in which case the character parameter won’t be affected). Here’s how we could use Look_Ahead to get either an operator or an integer, assuming the existence of a Character variable Ch, a Boolean variable EOL and an Integer variable I:

        loop
            Look_Ahead (Ch, EOL);
            if EOL then
                Skip_Line;        -- end of line, so go to next line
            elsif Ch = ' ' then
                Get (Ch);         -- space, so read it and ignore it
            else
                exit;             -- non-space, so exit loop
            end if;
        end loop;

        -- At this point, Ch is a copy of the next character, but the
        -- next character hasn't actually been read yet. We can read
        -- an Integer or a Character, as appropriate...

        if Ch in '0'..'9' then
            Get (I);              -- digit, so get an integer in I
        else
            Get (Ch);             -- non-digit, so read it into Ch
        end if;
    

Parentheses need to be dealt with in a special way. An opening parenthesis ‘(’ is given a priority of 0, just like the ‘#’ which is used to prime the stack. Any operators which follow this will have a higher priority and so will be stacked up above it. When we get to the closing parenthesis we clear operators by applying them to their operands until the opening parenthesis is the top item on the stack. It is then discarded, as is the closing parenthesis. We’ll also need to check that there are no opening parentheses that haven’t been left unmatched at the end of the expression. This is what happens when ‘2*(3+4)’ is evaluated:


           Input      Symbol     Operands   Operators  Action
        1) 2*(3+4).                           #        (start state)
        2) *(3+4).      2          2          #        Push 2
        3) (3+4).       *          2          # *      * > #; push *
        4) 3+4).        (          2          # * (    Push (
        5) +4).         3          2 3        # * (    Push 3
        6) 4).          +          2 3        # * ( +  + > (; push +
        7) ).           4          2 3 4      # * ( +  Push 4
        8) .            )          2 7        # * (    Apply +
        9) .            )          2 7        # *      Remove (
       10) .                       14         #        . < *; apply *
    

Here’s a modified version of the program which allows for parentheses. Since Look_Ahead tells us when we’ve reached the end of the input line, we can use this as the expression terminator instead of requiring a full stop. Also, an exception Syntax_Error is used to respond to syntax errors in the input:

        with Ada.Text_IO, Ada.Integer_Text_IO, JE.Stacks;
        use Ada.Text_IO, Ada.Integer_Text_IO;
        procedure Calculator is
            package Operand_Stacks i s new JE.Stacks (Integer);
            package Operator_Stacks is new JE.Stacks (Character);
            use Operand_Stacks, Operator_Stacks;

            Operand_Stack  : Operand_Stacks.Stack_Type;
            Operator_Stack : Operator_Stacks.Stack_Type;

            Operator    : Character;
            Operand     : Integer;
            Line_End    : Boolean;
            Got_Operand : Boolean := False;

            procedure Apply is
                Left, Right : Integer;
                Operator    : Character;
            begin
                Pop (Operand_Stack, Right);
                Pop (Operand_Stack, Left);
                Pop (Operator_Stack, Operator);

                case Operator is
                    when '+' => Push (Operand_Stack, Left + Right);
                    when '-' => Push (Operand_Stack, Left - Right);
                    when '*' => Push (Operand_Stack, Left * Right);
                    when '/' => Push (Operand_Stack, Left / Right);
                    when others => raise Program_Error;
                end case;
            end Apply;

            function Prio (Operator : Character) return Natural is
            begin
                case Operator is
                    when '+' | '-' => return 1;
                    when '*' | '/' => return 2;
                    when '#' | '(' => return 0;
                    when others => raise Program_Error;
                end case;
            end Prio;

            Syntax_Error : exception;

        begin                                       -- main program
            Push (Operator_Stack, '#');
            Put ("Enter an expression: ");
            loop
                -- Get next non-space character
                loop
                    Look_Ahead (Operator, Line_End);
                    exit when Line_End or Operator /= ' ';
                    Get (Operator);                 -- got a space, so skip it
                end loop;

                -- Exit main loop at end of line
                exit when Line_End;

                -- Process operator or operand
                if Operator in '0'..'9' then        -- it's an operand
                    if Got_Operand then             -- can't have an operand
                        Put ("Missing operator");   -- immediately after another
                        exit;
                    end if;
                    Get (Operand);                  -- read the operand
                    Push (Operand_Stack, Operand);
                    Got_Operand := True;            -- record we've got an operand
                else                                -- it's not an operand
                    Got_Operand := False;           -- so record the fact
                    exit when Operator = '.';       -- exit at end of expr.
                    Get (Operator);                 -- else read the operator
                    case Operator is                -- and apply it
                        when '+' | '-' | '*' | '/' =>
                            while Prio(Operator) <= Prio(Top(Operator_Stack))
                                Apply;
                            end loop;
                            Push (Operator_Stack, Operator);

                        when '(' =>              -- stack left parenthesis
                            Push (Operator_Stack, Operator);

                        when ')' =>              -- unwind stack back to '('
                            while Prio(Top(Operator_Stack)) > Prio('(') loop
                                Apply;
                            end loop;
                            Pop (Operator_Stack, Operator);
                            if Operator /= '(' then
                                Put ("Missing left parenthesis");
                                raise Syntax_Error;
                            end if;

                        when others =>
                            Put ("Invalid operator '");
                            Put (Operator);
                            Put ("'");
                            raise Syntax_Error;
                    end case;
                end if;
            end loop;

            -- Apply remaining operators from stack
            while Prio(Top(Operator_Stack)) > Prio('#') loop
                Apply;                              -- unwind stack, apply operations
            end loop;

            -- Display result or report error
            if Top(Operator_Stack) = '#' then
                Put (Top(Operand_Stack), Width => 1);
                New_Line;
            else
                Put ("Missing right parenthesis");
                raise Syntax_Error;
            end if;
        exception
            when Syntax_Error =>
                Put_Line (" -- program terminated.");
        end Calculator;
    

The main loop is executed until Look_Ahead reports the end of the line has been reached. If the character in Operator is a digit, an integer is read into Operand and pushed onto the stack and a Boolean variable called Got_Operand will be set to indicate that an operand has just been read. This is used to guard against two operands being entered one after the other; if this happens an error message is displayed and a Syntax_Error is raised. The processing of the operators is as it was earlier except that two extra cases are provided to deal with the left and right parentheses. A left parenthesis is just pushed onto the stack; a right parenthesis applies operators from the operator stack until an operator with a priority of zero is reached. If this isn’t a left parenthesis it means that there are too many right parentheses in the expression, so an error message is displayed and a Syntax_Error is raised.

The main loop ends when Look_Ahead reports that the end of the line has been reached. Operators are then applied to their operands until an operator with a priority of zero is reached; if this isn’t the ‘#’ used to prime the stack it must be an unmatched left parenthesis, which means that there is a right parenthesis missing somewhere.


13.4 Implementing the stack package

We’ve seen how to use the stack package to improve on the calculator design from chapter 3; now it’s time to return to the stack package itself and look at ways of implementing its body. Since a stack is a collection, we could use an array or a linked list to implement it. Let’s consider an array implementation first. We’ll need an array of items for the stack itself together with a count to tell us how many items there are on the stack. The private section of the package specification might look like this:

        package JE.Stacks is
            -- as before

        private
            Max_Items : constant := 100;    -- an arbitrary maximum size for stacks

            type Stack_Array is array (1..Max_Items) of Item_Type;
            subtype Stack_Pointer is Natural range 0..Max_Items;
            type Stack_Type is
                record
                    Value : Stack_Array;
                    Top   : Stack_Pointer := 0;
                end record;
        end JE.Stacks;
    

The package has a built-in arbitrary limit of 100 items per stack. The stack pointer (Top) is set to zero by default so that all stacks will automatically start off empty. Pushing an item involves incrementing Top and storing the item at the array position it indicates; popping an item off the stack is the reverse process, i.e. the item at position Top is retrieved and then Top is decremented. The other functions are equally straightforward. Here’s the package body:

        package body JE.Stacks is

            procedure Push (Stack : in out Stack_Type;
                            Item  : in Item_Type) is
            begin
                Stack.Top              := Stack.Top + 1;
                Stack.Value(Stack.Top) := Item;
            exception
                when Constraint_Error =>
                    raise Stack_Overflow;
            end Push;

            procedure Pop (Stack : in out Stack_Type;
                           Item  : out Item_Type) is
            begin
                Item      := Top(Stack);
                Stack.Top := Stack.Top - 1;
            end Pop;

            function Top (Stack : Stack_Type) return Item_Type is
            begin
                return Stack.Value(Stack.Top);
            exception
                when Constraint_Error =>
                    raise Stack_Underflow;
            end Top;

            function Size (Stack : Stack_Type) return Natural is
            begin
                return Stack.Top;
            end Size;

            function Empty (Stack : Stack_Type) return Boolean is
            begin
                return Size(Stack) = 0;
            end Empty;

        end JE.Stacks;
    

Note how Pop uses Top to get the value of the topmost stack item, including the necessary checks for stack underflow, so as to avoid code duplication.

A linked list approach is just as simple. Here’s what the private part of the specification looks like if we decide to take this approach:

        with JE.Lists;
        package JE.Stacks is
            -- as before
        private
            package Item_Lists is new JE.Lists (Item_Type);
            type Stack_Type is new Item_Lists.List_Type;
        end JE.Stacks;
    

Notice that Stack_Type is effectively just a renaming of Item_Lists.List_Type. However, since Stack_Type was declared as a private type, you can’t declare it as a subtype in the private part of the specification in order to effect a renaming; it must be declared as a type, so I’ve declared it as a derived type. The only problem with this is that Stack_Type values will need to be explicitly converted to Item_Lists.List_Type values before operations on List_Type values can be applied:

        function Size (Stack : Stack_Type) return Natural is
        begin
            return Item_Lists.Size (Item_Lists.List_Type(Stack));
        end Size;
    

This is awkward; another approach which leads to more readable code is to declare Stack_Type as a record containing a List_Type component:

        with JE.Lists;
        package JE.Stacks is
            -- as before
        private
            package Item_Lists is new JE.Lists (Item_Type);
            type Stack_Type is
                record
                    L : Item_Lists.List_Type;
                end record;
        end JE.Stacks;
    

The Size function above can now be written like this:

        function Size (Stack : Stack_Type) return Natural is
        begin
            return Item_Lists.Size (Stack.L);
        end Size;
    

This is much less complicated; using a record type like this is a trick worth remembering for future reference. Now we can use the operations on lists defined in Lists to implement the package body:

        with JE.Lists;
        package body JE.Stacks is

            package ILP renames Item_Lists;     -- for convenience

            procedure Push (Stack : in out Stack_Type;
                            Item  : in Item_Type) is
            begin
                ILP.Insert (ILP.First(Stack.L), Item);
            exception
                when Storage_Error =>
                    raise Stack_Overflow;
            end Push;

            procedure Pop (Stack : in out Stack_Type;
                           Item  : out Item_Type) is
            begin
                Item := Top(Stack);
                ILP.Delete (ILP.First(Stack.L));
            end Pop;

            function Top (Stack : Stack_Type) return Item_Type is
            begin
                return ILP.Value(ILP.First(Stack.L));
            exception
                when ILP.List_Error =>
                    raise Stack_Underflow;
            end Top;

            function Size (Stack : Stack_Type) return Natural is
            begin
                return ILP.Size(Stack.L);
            end Size;

            function Empty (Stack : Stack_Type) return Boolean is
            begin
                return Size(Stack) = 0;
            end Empty;

        end JE.Stacks;
    

13.5 Opaque types

The problem with the approaches above is that if we decide to change the implementation from an array to a linked list or vice versa, the package specification needs changing. This will mean recompiling any existing code that uses the stack package, which may be very inconvenient if it’s used in library units totalling hundreds of thousands of lines. A better approach is to try to hide all the implementation details inside the package body, so that any changes affect only the body and not the specification. That way we have to recompile the body, but since the specification won’t have changed the client units won’t need to be recompiled. The only thing that will need doing to client code is relinking it with the new body which will be much quicker than recompiling it.

How can we achieve this state of ultimate privacy? The answer is to use one of the features of access types that was mentioned when I first started talking about them; if you tell the compiler the name of a type in an incomplete declaration you can define an access type for it without the compiler needing to know anything more about the type. All access values are the same size, no matter what type of thing they point to, so the compiler can allocate memory for access variables without needing to know any details about what they point to. What we end up with is an opaque type whose name is mentioned but whose definition can be hidden inside the package body:

        package JE.Stacks is
            -- as before
        private
            type Stack_Item;            -- defined in package body
            type Stack_Type is access Stack_Item;
        end JE.Stacks;
    

The package body gets a little bit more complicated since a Stack_Type variable will start off set to null, so we’ll need to create a Stack_Item variable with new the first time an item is pushed onto it. We’ll also need to check for null in all the package’s procedures and functions. Here’s how it’s done using an array:

        package body JE.Stacks is

            Max_Items : constant := 100;    -- arbitrary maximum stack size

            type Stack_Array is array (1..Max_Items) of Item_Type;
            subtype Stack_Pointer is Natural range 0..Max_Items;

            -- Completion of opaque type declaration from package specification
            type Stack_Item is
                record
                    Value : Stack_Array;
                    Top   : Stack_Pointer := 0;
                end record;

            procedure Push (Stack : in out Stack_Type;
                            Item  : in Item_Type) is
            begin
                if Stack = null then
                    Stack := new Stack_Item;
                end if;
                Stack.Top              := Stack.Top + 1;
                Stack.Value(Stack.Top) := Item;
            exception
                when Storage_Error | Constraint_Error =>
                    raise Stack_Overflow;
            end Push;

            procedure Pop (Stack : in out Stack_Type;
                           Item  : out Item_Type) is
            begin
                Item      := Top(Stack);
                Stack.Top := Stack.Top - 1;
            end Pop;

            function Top (Stack : Stack_Type) return Item_Type is
            begin
                return Stack.Value(Stack.Top);
            exception
                when Constraint_Error =>
                    raise Stack_Underflow;
            end Top;

            function Size (Stack : Stack_Type) return Natural is
            begin
                if Stack = null then
                    return 0;
                else
                    return Stack.Top;
                end if;
            end Size;

            function Empty (Stack : Stack_Type) return Boolean is
            begin
                return Size(Stack) = 0;
            end Empty;

        end JE.Stacks;
    

Push will allocate a new stack if it’s null; if we run out of memory to allocate the stack, a Storage_Error exception will be raised. Like Constraint_Error, this is simply reported back to the caller as a stack overflow. Pop and Top don’t need changing; any attempt to access the top of a stack which hasn’t been allocated (i.e. which is still null) will raise Constraint_Error, which will then be turned into a Stack_Underflow exception by Top. Size and Empty are modified so that if you use either of them with a stack that hasn’t been allocated they’ll behave as if the stack exists but is empty.

Changing the package to use a linked list is fairly straightforward:

        with JE.Lists;
        package body JE.Stacks is

            package Lists is new JE.Lists (Item_Type);

            type Stack_Item is
                record
                    L : Lists.List_Type;
                end record;

            procedure Push (Stack : in out Stack_Type;
                            Item  : in Item_Type) is
            begin
                if Stack = null then
                    Stack := new Stack_Item;
                end if;
                Lists.Insert (Lists.First(Stack.L), Item);
            exception
                when Storage_Error =>
                    raise Stack_Overflow;
            end Push;

            procedure Pop (Stack : in out Stack_Type;
                           Item  : out Item_Type) is
            begin
                Item := Top(Stack);
                Lists.Delete (Lists.First(Stack.L));
            end Pop;

            function Top (Stack : Stack_Type) return Item_Type is
            begin
                return Lists.Value(Lists.First(Stack.L));
            exception
                when Lists.List_Error =>
                    raise Stack_Underflow;
            end Top;

            function Size (Stack : Stack_Type) return Natural is
            begin
                if Stack = null then
                    return 0;
                else
                    return Lists.Size (Stack.L);
                end if;
            end Size;

            function Empty (Stack : Stack_Type) return Boolean is
            begin
                return Size(Stack) = 0;
            end Empty;

        end JE.Stacks;
    

This reveals another advantage of burying the list in a record rather than using a derived type; if a derived type were used, you would need to write things like this in order to use Lists operations on a Stack_Type value:

        return Lists.Value(Lists.First(Lists.List_Type(Stack.all)));
                -- instead of "return Lists.Value(Lists.First(Stack.L));"
    

Since Stack_Type is an access type, the value that a Stack_Type object called Stack points to would have to be be accessed using Stack.all; this would then need to be converted to List_Type before any List_Type operations could be performed on it.


13.6 Formalising the syntax of expressions

Trying to detect and deal with all the possible errors that can occur in an arithmetic expression is really quite difficult. An alternative way of dealing with expressions known as recursive descent parsing can make life quite a bit easier. This is based on having a formal definition of the syntax of an expression. We can define an expression as a sequence of terms separated by adding operators. A term can then be defined as a sequence of primaries separated by multiplying operators, and a primary as either a number or an expression enclosed in parentheses. This is usually written using a formal notation similar to this:

        Expression  = Term { AddOp Term }
        Term        = Primary { MulOp Primary }
        Primary     = Number | ( Expression )
        AddOp       = + | -
        MulOp       = * | /
    

Here the curly brackets ‘{ ... }’ indicate that what they enclose can be repeated any number of times (including zero times) and the vertical bar ‘|’ means ‘or’. The first rule is equivalent to an infinitely long rule that looks like this:

        Expression  = Term
                    | Term AddOp Term
                    | Term AddOp Term AddOp Term
                    | Term AddOp Term AddOp Term AddOp Term
                    ... and so on

Primary is defined as being a number or an expression enclosed in parentheses. This seems slightly odd when you think about it; Expression is defined in terms of Term which is defined in terms of Primary which is defined in terms of Expression which is defined in terms of ... It seems completely circular, but actually it makes perfect sense. Look at how 2*(3+4) is analysed using these rules:

An apparently circular definition like this is known as a recursive definition. It may seem like a bit of a curiosity, but recursion turns up in all sorts of problems. Want to sort an array? Split it in two and sort each half, and then merge the sorted halves. This may sound silly, but it’s the basis of a well-known technique called merge sorting which outperforms all of the ‘obvious’ ways of sorting that generally spring to people’s minds (e.g. the shuffle sort described in chapter 6). Want to generate all possible permutations of a sequence? Take each item in turn as the start of a permutation and append all possible permutations of the items that are left, so that the possible permutations of [1,2,3] are generated by 1 followed by all permutations of [2,3], 2 followed by all permutations of [1,3], and 3 followed by all permutations of [1,2]; this gives you [1,2,3], [1,3,2], [2,1,3], [2,3,1], [3,1,2] and [3,2,1].

All recursion has to stop somewhere; the merge sort stops recursing when the array to be sorted contains a single item and the permutation algorithm stops when you have to generate all possible permutations of a single item. In the case of parsing an expression recursion only occurs when you see a left parenthesis, so it’s impossible to get stuck in an infinite loop. In programming terms, recursion is where a function or procedure calls itself either directly or indirectly. Here’s an example of a recursive function to calculate factorials: factorial N is the product of 1 * 2 * 3 * ... * N (and as a special case, factorial 0 is 1), which means that factorial N is N * factorial (N – 1) for all N > 0, and 1 when N = 0:

    function Factorial (N : Natural) return Positive is
    begin
        if N = 0 then
            return 1;
        else
            return N * Factorial(N-1);
        end if;
    end Factorial;

The recursion always stops because N gets smaller every time and eventually gets down to 0 which is where the recursion ends.


13.7 A recursive descent parser

Writing a recursive descent parser involves converting the syntax rules given earlier into Ada procedures. The first thing we need is a function Next_Character which uses Look_Ahead to find out what the next non-space character is without reading it:

    function Next_Character return Character is
        Ch  : Character;
        EOL : Boolean;
    begin
        loop
            Look_Ahead (Ch, EOL);
            if EOL then
                return '.';
            elsif Ch /= ' '
                return Ch;
            else
                Get (Ch);
            end if;
        end loop;
    end Next_Character;

This produces a full stop as its result when it reaches the end of the current line. The next step is to implement a function called Expression:

    function Expression return Integer is
        Value    : Integer;
        Operator : Character;
    begin
        Value := Term;
        loop
            Operator := Next_Character;
            exit when Operator /= '+' and Operator /= '-';
            Get (Operator);
            if Operator = '+' then
                Value := Value + Term;
            else
                Value := Value - Term;
            end if;
        end loop;
        return Value;
    end Expression;

This gets a Term (using another function we’ll have to implement in a moment) and then sits in a loop accumulating more Terms as long as it sees an adding operator. Term is very similar to Expression:

    function Term return Integer is
        Value    : Integer;
        Operator : Character;
    begin
        Value := Primary;
        loop
            Operator := Next_Character;
            exit when Operator /= '*' and Operator /= '/';
            Get (Operator);
            if Operator = '*' then
                Value := Value * Primary;
            else
                Value := Value / Primary;
            end if;
        end loop;
        return Value;
    end Term;

This is defined in terms of Primary, which looks like this:

    function Primary return Integer is
        Value : Integer;
        Ch    : Character;
    begin
        Ch := Next_Character;
        case Ch is
            when '0'..'9' =>
                Get (Value);
            when '(' =>
                Get (Ch);
                Value := Expression;
                if Next_Character /= ')' then
                    raise Syntax_Error;
                else
                    Get (Ch);
                end if;
            when others =>
                raise Syntax_Error;
        end case;
        return Value;
    end Primary;

A primary must be either a number or a parenthesised expression, so the case statement in this function checks for a digit or a left parenthesis as the valid characters which can start an operand. If the character is anything other than a digit or left parenthesis, a Syntax_Error exception gets raised. This deals with errors arising from missing operands, as in ‘* 2’ or ‘1 + * 2’ where Primary will see an operator rather than a digit or left parenthesis.

If the character is a digit, an integer is read from the input and returned as the value of the function; if it’s a left parenthesis, an expression is read and its value used as the function’s return value. It is, however, necessary to check that a parenthesised expression ends with a right parenthesis, so Next_Character is used after reading the expression to check for this. If the next character is a right parenthesis, it is simply read in and ignored; if not, a Syntax_Error exception is raised. This deals with errors arising from missing right parentheses.

A main program using these functions simply needs to call Expression and then check that the next character is a full stop. If it isn’t, there’s a syntax error arising from a missing left parenthesis, a missing operator or an illegal character. If the character is a right parenthesis, this indicates a missing left parenthesis, e.g. ‘1 + 2)’; if it’s a digit there’s a missing operator, e.g. ‘1 2’, and if it’s anything else it’s an illegal character, e.g. ‘1 & 2’. The functions above can be embedded in a program together with the exception Syntax_Error; here’s what the main program might look like:

    with Ada.Text_IO, Ada.Integer_Text_IO, JE.Parser;
    use Ada.Text_IO, Ada.Integer_Text_IO;
    procedure Evaluate is
        Syntax_Error : exception;

        function Next_Character return Character is separate;
        function Expression     return Integer   is separate;
        function Term           return Integer   is separate;
        function Primary        return Integer   is separate;

        Value : Integer;

    begin
        Put ("Enter an expression: ");
        Value := Expression;
        if Next_Character /= '.' then
            raise Syntax_Error;
        end if;
        Put (Value);
        New_Line;
    exception
        when Syntax_Error =>
            Put_Line ("Syntax error!");
    end Evaluate;

The functions themselves will need modifying to include separate clauses at the beginning, like this:

    separate (Evaluate)
    function Expression return Integer is
        ...
    end Expression;

Exercises

13.1 Using a doubly linked list for a stack is overkill since items are only ever added to or removed from the front of the list. Modify the stack package to use a simple singly linked list which is manipulated directly by the operations in the package body, and test it using the calculator program.

13.2 Write a program to generate all possible anagrams of a word using the recursive method described in this chapter. If you want to get a list in alphabetical order, sort the characters making up the word into ascending order before generating the anagrams. To avoid generating duplicates, don’t use the same letter twice. For example, to generate all unique anagrams of ‘apple’ in alphabetical order, sort the letters to give aelpp, then display a followed by all anagrams of elpp, e followed by all anagrams of alpp, l followed by all anagrams of aepp, p followed by all anagrams of aelp, but don’t do it again for the second p.

13.3 The double-ended queue (or deque) is yet another data structure that is useful in many situations. A deque is a sequence of items which allows you to add new items to either end or remove items from either end. Produce a deque package which uses an opaque type to represent a deque and write a test program to allow you to exercise the package. Model your package on the stack package shown in this chapter.

13.4 The data structure described in exercise 11.2 is known as a binary tree. It is a fundamentally recursive structure; the node at the root of the tree points to two other nodes which can be considered to be the roots of two smaller subtrees with the same structure. Modify your solution to exercise 11.2 using a recursive algorithm for searching the tree.



Previous

Contents

Next

This file is part of Ada 95: The Craft of Object-Oriented Programming by John English.
Copyright © John English 2000. All rights reserved.
Permission is given to redistribute this work for non-profit educational use only, provided that all the constituent files are distributed without change.
$Revision: 1.2 $
$Date: 2001/11/17 12:00:00 $