Detailed Explanation of p077



note000
Welcome to p077.c source code!
This explanation will be added over the time and will be made ideally helpful in reading and understanding the source code.
p077 is a version of the GCS programs. The first and simpler version was p068. The names are just for distinguishing each other. The first version used a simplified C grammar. The second version, p073, accepts full C grammar. p077 is the third version that implements IPC. This explanation does not yet contain explanations of IPC. I will add them sometime later.
You are now at the end of #-directives and before the global declaration. The whole flow of this program is simple. It accepts a C source program that is fed from the user. Then it examines the program and convert the program into a tree structure. Then finally, it traverses the tree and does required tasks at each tree nodes. This main flow is containd in the main() program.
In order user programs not to cause a runaway situation, precautionary limits are set at various places in the source code. MAXINP, MAXNOD and MAXNMP are part of such limits. MAXNOD limits the number of nodes to be traversed. This will prevent infinite loops. There should be a mechanism to override this limit in order to do a serious long computations. I will add that functionality later on.
Some of the functions are in the separate files, such as draw.c and gcs1.c. Those functions are for use by client source programs and goes through an interface function calld 'func'. Its argument is a pointer to a void and accepts a combination of various data types. 'func' is defined at the last part of p077.c.
Then follows several struct definitions. I am using structs sparingly, and am not using enumerations in order not to clutter with so many strange words.
entry is a combination of name and its value, and is used in processing 'post'ed data through CGI.
darray and iarray are for containing array data, each for doubles and ints.
memory and scope are used as containers of declared variables. Scope is a hierarchial memory area. Global scope is the global memory area. Main scope is the main program memory area. A function scope is created dynamically when a function is called.


note001
The global variable firstTime is used for printing the content type only at the first time. The content type is a header of the document to be returned to the client. There are error traps in various places of the codes. An error message is returned to the client at those traps. Instead of placing codes for checking firstTime at each of those places, the function contype is calld at those many places.
The functions getword to send_fd are for processing CGI input and I have ported them from the NCSA site. I may as well streamline them sometime later.


note002
We are in the middle of declarations of the main() function. The array definitions are following.
Explanations of major variables are as follows.
entries[] are CGI input entries. The client program passed from the text area field is an entry. If there are input field data, they each occupy entries.
The client source program is first saved in inp[] character array. Then it is converted to a series of terminals and put into array tmnl[]. The process of the conversion is called lexer.
The lexer is followed by the parser. Parser uses a stack stk[].
param[] array is used for parameter passing to the functions.
mpstk[] that stands for the multi-purpose-stack is used during execution of the program.
jmpcnt[] is used for jump control in the if-else statements or for loops.
mem[] is variable-memory-area stack. mem[0] and mem[1] are used for global variables and main variables. When a user function is called, the corrsponding local variables get an entry in mem[], and when the control exits the function, the entry is released.
The arrays from lex1[] to lex15[] are used by the lexer processing.


note003
pars1[] to pars7[] are collectively called parse tables. These tables are generated from
the C grammar. The grammar was taken from the book "The C Programming Language Second Edition", Appendix A13. Each lines of the grammar are called productions. It consists of a symbol followed by a -> and they are followed by one or more symbols. We call the symbols as grammar symbols. A left-hand side of -> can be expanded as the right-hand side, and, conversely, a righ-hand side can be reduced to the left-hand side. Each production is preceeded by a number, which we call production number.
We pick up these grammar symbols and show them with the symbol number. The first part of the symbols that are coloed in blue are called terminals, and the rest of the symbols in black are called non-terminals. Non-terminals appear in the left-hand sides of the productions.
We make an another data from the grammar. It defines the states, numbered by state numbers. A state is linked to the several items. These items are productions with a # mark somewhere in the right-hand side. You may find out what that state thing mean by closely looking at the items, but that is not important for now. You may think of a state number just as a magic number.
The parse tables are produced solely from the grammar. I will add the explanations later.


note004
Tables related to the functions are from prs6[] to prs11[]. These data correspond to function protopypes.


note005
This is the beginning of executable statements.
I used a compiler switch CGI to generate a CGI version or a stand-alone version. The stand-alone version is used during development mainly for debugging. The CGI version accepts the input from stdin and generate the output to stdout. The stand-alone version uses a local file named test.c as input and writes output to the console or to an output file in the case of graphics.
In the case of CGI, form field name is put into entries[].name[] and form field data into entries[].val[]. The C program source code is in the field named 'source' and the compute button is in the field named 'code'. The fileds with other names can be substitution literals. The data in those form fields are used to replace keywords in the source program. Here the keywords are given by the form field names.


note006
This is the start within the while loop.
At each loop, a character is fed from entries[].val[] in the case of CGI or from the file test.c in the case of stand-alone mode. Here, the input characters, excluding comment fields, are counted and saved in the variable ninp.
After defining ninp, a memory area of that number of characters is allocated to the input character buffer called inp[].
Then an another loop follows to put the input source code, excluding comments, to the inp[].


note007
We have just replaced tabs, line-feeds, and such to spaces. Line-feeds are varied with the type of client's computer.
Next, we are going to delete extra spaces from inp[], and the resulting number of charaters is saved in the variable ninp.


note008
We have deleted comments, control characters and extra spaces from the user source program. We are now ready to convert the source programs to a series of terminals. The process of lexer starts here.


note009
The process of lexer ends here. I will add detailed explanations sometime later.
One thing I would like to note is that we added the last terminal numbered 151 at the end of terminal's array tmnl[].

Next step is to save constants in the input to respective places.


note010
Now the etrminals are put in tmnl[] and constants are put into respective places. The memory area where the input source text resided have been released.
Next we will construct the parse tree. Six arrays, gsymbol[], produc[], parent[], child[], sibli[] and attri1[] are allocated memory areas. These arrays are indexed by node number of the parse tree.
gsymbol[] is the grammar symbol of a node. It may be either a terminal or a non-terminal.
If it is a non-terminal, there must be an acompanying production that has the non-terminal on its left-hand side. The production number is saved in produc[].
The left-hand side of a production is called the parent, and the corresponding members in the right-hand side of the production are calld children. If we put the parent toward the root side and put the children toward the leaves, and connect the parent and each of the children with a line segment, we can draw a tree. The tree is called the parse tree.
Each node has a parent[]. If a node has children, the eldest or the left-most child is saved in child[]. If a child has young brothers, the immediately right brother is saved in sibli[].
attri1[] is the attribute of the node. It can be used for various purposes.


note011
This is the biginning of the loop of the parsr. The parser looks at the input terminal array value one by one, and constructs the parse tree.
The preceding definitions of variables o and o1 are used at the binary searches of the parse table.
The next statement looks at the top of the stack stk[i]. The stack contains a series of a combination of a grammar symbol number and a state number. stk[i] contains a state number, and the statement tells that if the state number is greater than or equal to 200, the action to be taken is 'reduce' and the production to be used when reducing is an element of an parse table.
There are two actions, shift and reduce. Shift is to put the new terminal, and a state number on top of the stack. Reduce by a production means to remove the paires from the stack and put a pair on to the stack. Here the number of removed pairs is the same as the number of the grammar symbols of the right-hand side of the production. Reducing means replacing the right-hand side of the production by the left-hand side of the production.
I will add more about parsing later.
If the state at the top of stack is below 200, the state and the next terminal are combined to define a new number k, and search of the element of the array parse1[] whose value is k. When the element is found, its array index is saved in m.
From the m and parse2[], we obtain k and l. If k is 0, the action is shift, and l means the new state. If k is 2, the action is reduce, and the production used when reducing is in l.
If k is 3, the action is accept, which means all the input terminals were successfully converted to a parse tree. k is not going to be 1.


note012
Shift put the terminal in gsymbol. Production, child and sibling are filled with 0.
Attri1[] is the attribute related to the node ngsymb. It is used to save the int value that is contained in tmnl[] next to the place where the terminal number is contained.
Shift means that the input terminal be shifted simply into the stack. Instead of moving the terminal itself onto the stack, node number ngsymb is pushed onto the stack. Then the new state l is pushed onto the stack.


note013
Reduction causes right hand side of the production to be poped off the stack.
prs5[l-1] is the production offset. prs4[] is the corresponding grammar symbol.
Do for each child, set parent and sibling.
Push left hand side of the production and new state.

l is production number. l1 is 0 if the child is terminal. Here, the gsymbol[ngsymb-1] is deleted and overwritten by gsymbol[ngsymb].


note014
The parse tree tends to contain nodes which has only one parent and only one child. They amount to a large number and make the size of the tree very big.
We delete here such nodes and keep the tree compact. This reduces the amount of stack data when traversing the tree.


note015
In the test mode or non-CGI mode, the parse tree is saved in the local file cbnf12. After some further processing the parse tree can be transformed into a visial format. An
example corresponds to the recursion example.


note016
After creating the parse tree, we examine it to identify how many functions are there. There should at least one function called main. We define the number nscop as the number of functions plus one.
Variables can be declared in each of the scopes. We say the variables which are declared outside of any of the functions are in the global scope. The global scope is the special scope and we put it in scop[0]. The other scopes are put in the scop[] array in the order they appear in the client source program.


note017
For each of the scopes, we process variable declaration data and put them in scop[] structs and in the associated memory structs.


note018
Start of the argument processing


note019
Return node of the argument processing.


note020
Start of the declaration processing.


note021
Return node of the declaration processing.


note022
This is the position where the actual computational operation begins,
We have just set mem[] data, which are used in handling the memory area reserved by the declaration statements of the client source program.
The
memory management is the key to the interpreter program.
Execution of the program is done in the stack-machine fashion. I called the stack as multi-purpose-stack and used the symbol mpstk[] for it. nmp is the stack pointer. The pointer is post incremented so that the position nmp-1 and below it contain the meaning data.


note023
In the depth-first traversal, inner nodes are going to be visited twice, the first on the way out and the second on the way back.
At the first visit, the node number is pushed on to the stack. On the second visit, we can identify this is surly the second visit by inspecting the content of the stack.
We look at the stacks from the position nmp-1 to the position 0. Each stack has two integer data. Generally the first data indicates the kind of data. It is set to 0 and the second data to the node number at the first visit. If the same node number as the current node is found, the control flow breaks out of the loop with some value of i other than 0.
When the first value of the stack is 10, it means that the control jumped to some client-defined function there. The if statement within the for loop is designed so that the stacks belonging other functions should not be inspexted. It prevents erroneous control when recursive calls are made. The exception is when the control is returning from the client-defined function,


note024
In the depth-first traversal, inner nodes are going to be visited twice, the first on the way out and the second on the way back.
At the first visit, the node number is pushed on to the stack. On the second visit, we can identify this is surly the second visit by inspecting the content of the stack.
The data on the stack which are above the stack position containing the node number are the data that are collected since the first visit. Those data are likely to be processed by the return node.
If this is the case, at the end of the process, the new stack pointer would point to the position where the node number is pushed.
This is why the stack pointer nmp is replaced by the node number position i, at the beginning of the process.
Inner nodes where the processes are carried out correspond to grammar productions.
The processing branches according to the production number.


note025
Substitutions are carried out here.


note026
Comparisons are carried out here. The result in the integer value either 1 (true) or 0 (false) is pushed onto the stack.


note027
Simple arithmatic operations are carried out here.


note028
As can be seen from the production, there are 6 unary operators. Only two of them are currently processed here.


note029
This is the interface plane of the functions. When the client program calls a function, the control arrives here. There are two kinds of functions, client-defined functions and built-in functions. Both are processed here, but with the different process flow.
Control Flow of Function Calls


note030
The default action here is to propagate stack data.
The stack data between i1+1 and j1-1 inclusive are shifted one slot down so that they can be used at some node later in the traversal.
nmp which has tentatively set to the value of i1 is redefined by the new value.


note031
The identifier param is an array of pointers. Each members of the array are different type of pointers. The first element is a pointer to an array of integers. If we denote the first element of the array param as param1, we can store and retrieve integers by param1[]. We put the basic properties of the function and the passed parameters there.
The second and the following param's are used to store the pointers to the variables to be passed to or returned from the function.
The built-in functions and the functions the clients use in their client source programs do not necessarily have to have the same function prototype.
However, I kept them of the same appearance so that it is easy for the maintenance of the system, and it gives clients the impression as though they are directly calling the built-in functions from their client source.
This is the place where that trancparency is produced.
The built-in functions are either compiler-built-in functions or interpreter-built-in functions. There is no apparent distinction between them for the clients.