CSE 340 - Principles of Programming Languages

Spring 2005

Aryeh Faltz

SECOND PROGRAMMING PROJECT

Due: 1:00 PM, Tuesday April 26, 2005

This project has two parts, a theoretical ("pencil-and-paper") part and a programming part. As with the first project, the "pencil-and-paper" part of this project is to be written up as a comment inside the file with your program - put it somewhere near the beginning of the file. The program is to be written in PROLOG. Here are the details.

Part I

We are interested in setting up an axiomatic ("algebraic") definition for representing the semantics of the type (or sort) "symbol table". IMPORTANT: for this project we are doing it somewhat differently than you did in the Chapter 9 homework!

For Part I, which is theory, we will use five operations with the following signatures:

newprogram: à symboltable

newblock: symboltable à symboltable

declare: symboltable ´ identifier ´ typename à symboltable

lookup: symboltable ´ identifier à typename ´ nestingdepth

endblock: symboltable à symboltable

The sorts (or types) "identifier" and "typename" can both be viewed as referring to the set of lower-case alphabetic strings. The sort "nestingdepth" can be viewed as the set of nonnegative integers.

Here are INTUITIVE ideas that describe each of these operations. The way these actually are supposed to work in the project will be described further on.

The operation "newprogram" is performed by the compiler when it starts to compile a new program, before it has actually read in anything. Intuitively, it creates an empty symbol table.

The operation "newblock" is performed by the compiler when it enters a new block. If we assume that the program it is reading is written in a C-type language, we might imagine that "newblock" is carried out either when the token "{" is read in, or when a function name has been seen, in which case the next "{" is ignored. However, you don't have to think about this for this project! All you need to know is that this operation creates a symbol table which is the very similar to its argument except that somehow the new symbol table "knows" that a new block has been entered.

The operation "declare" is performed when the compiler compiles a declaration. For this project, we'll assume that the information in every declaration consists only of an identifier for a variable and of a type name; to keep things simple, both of these are lower-case alphabetic strings for this project. Intuitively, this operation pushes an entry onto the symbol table which represents the information in the declaration. As with the "newblock" operation, you don't have to think about how the compiler knows that it is at a declaration.

To make things a bit simpler for the project, let's IGNORE the possibility that the compiler might come upon two declarations for the same identifier in the same block. In other words, do not try to write the axioms in such a way that this is an error situation. Simply ignore the issue.

The operation "lookup" is performed when the compiler comes upon an identifier in the code other than in a declaration. Intuitively, it looks up the entry for the identifier on the symbol table and returns the type and the nesting-depth of that identifier.

The operation "endblock" is performed when the compiler leaves a block. Assuming C-type syntax, we can imagine that "endblock" would be carried out when ever the token "}" is read in (and not at any other time); but you don't actually have to think about this for this project! What the "endblock" operation actually does, intuitively, is the following. Entries which had been entered during the block which is being exited are popped off; what remains on the symbol table somehow "knows" that a block has been left. Reread the "symbol table" handout if you need reminding.

Your task in Part I is to write up axioms for this definition of "symboltable". Some hints:

Hint 1: Use our heuristic. Note that there are four constructors, one of which is destructive. How many axioms do you need? What combinations of operations will they represent?

Hint 2: You need to insure that when "lookup" is called the correct nestingdepth is returned. Since the sort "nestingdepth" is in fact the same as nonnegative-integer, you can assume arithmetic operations. In other words, it's okay to put arithmetic operations in your axioms.

Hint 3: Think LOGICALLY and NOT PROCEDURALLY! Don't view the axioms as code, but rather as statements.

Part II

Code this all up using prolog! Specifically, here is what your project should do.

Your code should look for input in a text file called simply proj2input (no extension). The material in this file is to be read in and processed. Processing this material consists in simulating (or perhaps actually maintaining? Is there a difference?) a symbol table as though a source-code program written in some high-level language is being read in and compiled. The operations on this simulated symbol table are supposed to work according to the axioms that you defined in part I.

The input will consist of a series of input units each of which involves applying one of the operations of part I together with (in some cases) additional information. In the case of "lookup" your program will have to print out the required information. In the case of ALL the input units, your program is required to print out the current state of the symbol table after that input unit has been processed. More information about printing out the symbol table is given later on in these specs.

Output is to be printed to the screen. Make use of "nl" to create a reasonable formatting of the output, for readability.

Your program will maintain only ONE symbol table. In other words, your program will simulate the processing of a symbol table as if ONE source code program is being read in.

Here are details.

The input file will contain a sequence of atoms each followed by a period. Thus, you can read in an atom simply by using the prolog predicate "read". Some prolog systems require white space after each period - I'll make sure that there is white space after each period so that problems are avoided. Also, most prolog systems ignore end-of-lines, etc., so you don't have to worry about what's on a line, etc.

Each input unit (an operation together with any required arguments) will consist of one or more atoms.

The legal operations are the five operations given in Part I, together with a sixth operation called endprogram. How each of these work is described below.

But first: Your code should be able to handle certain illegal situations. If, in some particular case, an error would be returned by an axiom, your program should print out an error message, but your program should NOT exit! Also: when an error is encountered, the current state of the symbol table should NOT be changed. But in addition, other allowable error situations will be given below as we proceed with the specs. Whenever a legal error situation is encountered, your code should print out an error message.

Here is how each operation should be handled.

The first input unit should be simply:

newprogram.

If the first atom seen by your code isn't this one, print out an error message. When your code sees this input unit, it should create a new, empty symbol table. Note, by the way, that this operation does not take any arguments.

Once this input unit is seen, it becomes illegal to process this operation again, so if this atom is encountered as an operation name anywhere again during the run, print out an error message, and, of course, don't change the symbol table in any way.

Once "newprogram" has been processed, your program is simulating the processing of a source file. Here is what your code should do when it sees each of the remaining operations.

If your program sees:

newblock.

it should alter the symbol table in such a way as to indicate that a new block has been entered. Note again that this operation does not take any input arguments. Of course, logically, the current symbol table itself is the argument to this operation, but this is not indicated by any input in the input file.

If your program sees:

declare.

then it should automatically ASSUME that the two atoms that immediately follow this atom represent a user-declared identifier representing a variable and its type. These two atoms should be read in, of course, and this material should be pushed onto the symbol table.

If your program sees:

lookup.

then it should automatically ASSUME that the next atom represents a user-declared identifier that the compiler has encountered in a place other than a declaration. This atom should be read in, the identifier should be looked up on the symbol table, and the type and the nesting depth should be printed out. Use a format that repeats the name of the identifier, such as:

The identifier amount is of type integer and has nesting depth 2.

If the identifier is not found on the symbol table then print out an error message, such as the familiar "undeclared identifier!"

The "lookup" operation does not change the symbol table, of course.

If your program sees:

endblock.

then it should first remove those identifiers whose declarations were seen in the block that is now ending. It should then somehow alter the symbol table to indicate that we are leaving a block and moving up to the block which contains it. This operation does not take any arguments from the input file.

If your program sees:

endprogram.

then it should first make sure that all embedded blocks have been removed! In other words, if any blocks have been entered but haven't been exited, this operation is illegal, and an error message should be printed out. In this case, don't change the symbol table, and continue processing input. However, if all embedded blocks have indeed been removed, then this operation signals that the source program has been read in completely. Therefore, your code should simply exit.

If at any point in the run your program expects to see an operation and it reads in an atom other than one of the six legal ones, then print out an error message such as "no such operation".

The errors that have been listed up to this point in the specs are ALL of the errors that your program needs to be able to handle. Do not bother dealing with other conceivable errors such as strange characters, or input which is not of the form of individual atoms followed each by a period, etc.

A very simple example of an input file with hardly any errors might be:

newprogram.

declare. amount. integer.

declare. balance. float.

newblock.

declare. class. char.

declare. balance. integer.

declare. drivel. string.

lookup. class.

lookup. amount.

lookup. balance.

lookup. drivel.

newblock.

declare. drivel. integer.

declare. evidence. float.

lookup. drivel.

lookup. amount.

lookup. balance.

lookup. class.

endblock.

lookup. amount.

lookup. balance.

lookup. class.

lookup. drivel.

lookup. evidence.

endblock.

lookup. amount.

lookup. balance.

lookup. class.

endprogram.

In this example of input, the arrangement into lines is for human intelligibility. You should ignore this; that is, you don't have to think about this when writing your code. Just use "read" to read in atoms one after another.

To print out the symbol table: since you'll represent the symbol table by means of some prolog structure, you can simply use the "write" predicate to print it out. If you do this, the format of the output will be simply the visible version of the format in which you represent the symbol table. But how should you represent the symbol table? Here are some comments on this.

Important: Do NOT represent the symbol table using prolog lists! In fact, there should be no prolog lists in your project at all.

Instead: Read Section 9.8 in the textbook, but ALSO read and STUDY problem 9.33 in the textbook. You should represent the symbol table using a canonical form based on the initial algebra representation for symbol tables. Question: What should this look like?

Well, examine the representations in problem 9.33; this problem presents a canonical representation for queues. Examine what that representation looks like, and in particular, examine what is found IN those representations. You might also want to think about what is NOT found in those representations! Think about why any queue can be expected to be able to be represented as shown in that problem. Make similar assumptions about symbol tables, using (of course!) the formal description of symbol tables in Part I of this project. Note: You do NOT have to prove that your canonical form is correct! Simply make the obvious guess….

Using the canonical form based on the initial algebra will answer the question you undoubtedly have about what to do to the symbol table when you read in the operation "newblock". I'm sure someone also wants to ask if (or how) nesting depths should be represented - using a canonical form based on the initial algebra answers that question, too. To return nesting-depths, feel free to use the Prolog facilities for arithmetic.

My view is that the preceding five paragraphs actually tell you how to represent the symbol table in your project. One question I won't answer from here on is: "How should we represent the symbol table?" The answer is given to you in the previous five paragraphs!

Since you are writing this project in prolog, your code should reflect the structure of the axioms that you came up with in Part I. You might want to read Chapter 12 of the textbook to refresh your thinking about how to write prolog code. The code that you write doesn't have to look like exactly like your axioms, but SOME part of the code should resemble your axioms significantly.

IMPORTANT: To facilitate running your code by the TA, you are REQUIRED to define a prolog predicate called "go" which takes NO arguments and which, when called, gets your project running. In other words, the top of your code should look like this:

go :- (stuff).

where "stuff" is whatever you write that makes your code work. The idea is that when a user consults a file with your code in it and enters "go.", your code should start executing.

There should be NOTHING about your code that asks for any interactive material from the user except for issuing the "go." call. In other words, your file should be such that if it's consulted and the user types "go.", then input, in the file proj2input, is read in and (correct!) output appears on the screen.

In case you forgot: look up the predefined prolog predicates "see" and "seen" in order to arrange things so that your code AUTOMATICALLY reads input in from the file proj2input.

Your code should all be in ONE file.

Your code should be arranged in a reasonable and sensible way. Any odd code should be commented.

Your code is required to compile and run on the GNU Prolog facility available on the generals. If you are registered for this course, by now you should have access to the generals! If you have a problem, please deal with it AS SOON AS POSSIBLE!

Something you might want to know: You may want to include some sort of delete operation, similar to what is discussed in problem 9.21, as part of your code. If you decide that this is a good idea, you should know that the GNU Prolog system uses the predicate name "delete" for a predefined predicate, which means you cannot use this word! One thing you could do to avoid this problem: simply invent a different spelling for the word "delete"!

You will NOT lose any credit if the GNU compiler issues warnings when compiling your code.

The following information should be included in a comment at the BEGINNING of the file you submit:

Your name

One of your Ids

The full date of submission

The course, which is CSE 340

The semester (which is Spring 2005)

Your section (which is TuTh 10:40)

Indication that this is Project 2

The programming language used (which is GNU Prolog)

The submission procedure will be more or less the same as for the first programming project. Again, you will have to submit your project electronically as well as in hard-copy, so be prepared to do both. Details on submission will come later - check the web site.

As always: we will NOT accept late submissions! (That is, you cannot submit your project late! Late submissions will not be accepted! Do not submit your project late! If you submit your project late, it will not be accepted!) Reminder: The project is due by 1:00 PM on Tuesday April 26, 2005.

© 2005 Leonard M. Faltz

all rights reserved

Reservados todos los derechos