PF11 -- An ANS Forth Implementation for the 68HC11

Andrew Sterian
Padnos School of Engineering
Grand Valley State University


Top-Level | Glossary | Compiler Design | Rationale | Notes
Version 1.0

July 18, 2003
Introduction | Dictionary | Code Space | The User and Return Stacks | PrimitivesControl Structures | Defining Words | Miscellaneous


Introduction

PF11 began life with the pForth source code, but there have been so many modifications to this code base that I long ago gave up on trying to keep track of them all. I will try to summarize the major ideas: There are four primary data structures in PF11:

The Dictionary

The PF11 dictionary consists of dictEntry structures starting at td_Dict in memory. Each dictEntry structure is stored as follows:
Dictionary entry
The first two bytes are a pointer to the address of the "Name Length" byte in the previous dictionary entry (NOTE: this is NOT a pointer to the start of the previous dictEntry but to byte 4 of the previous entry). These two bytes are 0 for the first dictionary entry. The next two bytes are the executable token (XT) for this entry. The XT is either a small integer indicating a built-in primitive, or the address of the code for this word in code space.

The next byte consists of three leading bits plus 5 bits for the length of the word name, thus each word may be up to 31 characters long. The first of the three leading bits is called the precedence bit, but I have no idea what it's used for (it wasn't used in pForth). The second bit marks the word definition as an IMMEDIATE word. The third bit (smudge bit) marks the word definition as invisible (i.e., will not be found when searching the dictionary).

The smudge bit is used when a word is first defined; it is smudged so that it does not find itself when searching the dictionary. The smudge bit is also used by the private.ft file (not part of the core PF11 implementation, but available in the misc/ directory) to mark private words.

The remaining bytes in the dictionary entry contain the word name. There is no null terminator.

The C variable gVarContext points to the "Name Length" byte (i.e., byte 4) of the last entry in the dictionary. This variable is accessible through the built-in Forth word CONTEXT.

The C variable td_Dict contains the starting address of the dictionary, accessible via the HEADERS-BASE Forth word. The td_DictPtr C variable points to the next free location in the dictionary (i.e., the start of a new dictEntry, at byte 0). This variable is accessible via the HEADERS-PTR Forth word. Finally, the HEADERS-LIMIT Forth word points to the first address beyond the end of the allocated space for the dictionary. The Forth word DUNUSED (defined in system/core.ft) uses these pointers to compute how much free space is left in the dictionary.

The Code Space

The code space comprises 16-bit words,  which are either addresses within the code space or ID numbers of built-in primitives. The "Exec Token" field in the dictionary entry of a word is either a 16-bit primitive or the address of a location in code space where that word's definition begins.

All addresses in code space are absolute 16-bit addresses in the 68HC11 address space, not offsets from the start of code space (as in pForth).

All word definitions end with a 16-bit value of 0, which is the ID number of the built in ID_EXIT primitive. The example below shows the Forth definition for the COUNT word (defined in system/core.ft) along with its representation in code space. The word "1+" is the only word in the definition of COUNT that is not a built-in primitive.

: COUNT dup 1+ swap c@ ;

Definition of COUNT word
The C variable td_ExecPtr contains the next available address in code space. The Forth word HERE returns the current value of td_ExecPtr, while the Forth word DP returns the address of td_ExecPtr so that it may be modified.

The C variable td_Exec contains the starting address of the code space, accessible via the CODE-BASE Forth word. The CODE-LIMIT Forth word points to the first address beyond the end of the allocated code space. The Forth word UNUSED (defined in system/core.ft) uses these pointers to compute how much free code space is remaining.

The User and Return Stacks

The user stack is stored in the C array variable td_Stack. The top-of-stack (last-used location) is stored in the td_StackPtr C variable. Note that in the core of the interpret, the top-of-stack is cached in a local variable hence td_StackPtr does not point to the true top-of-stack at all times. The Forth word SP@ returns the contents of td_StackPtr while SP! sets the contents of td_StackPtr.

The return stack has the same structure as the user stack. The actual stack is in the C array variable td_Return and its top-of-stack pointer is stored in the C variable td_ReturnPtr. This latter variable is manipulated with the Forth words RP@ and RP!.

Primitives

The PF11 primitives are built-in Forth words. They are managed as follows:
  • Their integer ID's are set in src/pf_guts.h in the cforth_primitive_ids enumeration
  • They are added to the dictionary in src/pf_build.c in the pfBuildDictionary() function
  • They are implemented in C in the src/pf_inner.c file in the pfExecuteToken() function
In code space, primitives are distinguished from derived words by the fact that they are "small integers". The IsTokenPrimitive() macro defined in src/pf_guts.h checks to see if a 16-bit number is less than NUM_PRIMITIVES, a constant that indicates the number of built-in primitives. Any number bigger than or equal to NUM_PRIMITIVES is assumed to be an address of a derived word definition.

The consequence of this scheme is that word definitions may not reside in low memory, where the address of a definition is low enough to be considered a primitive ID number. There are currently approximately 131 primitives so no Forth word definitions may be stored at addresses below 131 (0x0083). This constraint is highly unlikely to cause problems.

Control Structures

The Forth control structure words (e.g., IF, DO, BEGIN, etc.) are described here with respect to their code space implementation and run-time stack behavior.

Conditional Structures

The Forth words IF, ELSE, THEN, CASE, OF, ENDOF, ENDCASE, and RANGEOF implement conditional structures. Note that RANGEOF is not an ANS standard word. All of these words are implemented in the system/core.ft file.

To understand IF/ELSE/THEN, the following code example is presented.

: test IF word1 ELSE word2 THEN ;

The code space implementation of the test word is as follows.

The 0BRANCH, BRANCH, and EXIT words are built-in primitives. The notation "&word1" represents the 16-bit XT of the word1 Forth word, whatever it may be. Numeric constants are shown as hexadecimal integers.

Note that offsets for branches are specified in bytes, even though code space is 16-bit-word oriented (each cell in the above diagram is a 16-bit word).

To understand CASE/OF/ENDOF/ENDCASE/RANGEOF, the following code example is presented.
    : test      ( n -- )
       CASE
           1 OF word1 ENDOF
           2 OF word2 ENDOF
           3 5 RANGEOF word3 ENDOF
           word4
       ENDCASE
;
The code space implementation of the test word is as follows, shown in Forth code rather than code space words, for brevity.

	: test  ( n -- )
1 OVER = IF DROP ( OF )
word1
ELSE ( ENDOF )
2 OVER = IF DROP ( OF )
word2
ELSE ( ENDOF )
3 5 (RANGEOF?) IF DROP ( RANGEOF )
word3
ELSE ( ENDOF )
word4
DROP THEN THEN THEN ( ENDCASE )
;
The implementation of the CASE structure uses variables CASE-DEPTH and OF-DEPTH at compile time to expand the structure to its primitive components.

Indefinite Looping Structures

The Forth words BEGIN, AGAIN, UNTIL, WHILE, REPEAT implement indefinite looping structures. All of these words are implemented in the system/core.ft file.

To understand the basic BEGIN-AGAIN infinite loop, the following code example is presented.
	: test	BEGIN word1 AGAIN ;
The code space implementation of the test word is as follows:
BEGIN-AGAIN
The BEGIN-UNTIL loop is demonstrated as follows.
	: test BEGIN word1 UNTIL ;
The code space implementation is as follows:
BEGIN-UNTIL
The only difference between BEGIN-UNTIL and BEGIN-AGAIN is the use of the 0BRANCH primitive instead of the BRANCH primitive.

Finally, the BEGIN-WHILE-REPEAT loop is demonstrated as follows.
	: test BEGIN word1 WHILE word2 REPEAT ;
The code space implementation is as follows:
BEGIN-WHILE-REPEAT

Definite Looping Structures

The Forth words DO, ?DO, LOOP, +LOOP, UNLOOP, and LEAVE implement definite looping structures. All of these words are implemented in the system/core.ft file.

The implementation of these looping structures is best explained by reading the code, as simple diagrams do not suffice to capture the complexity.

Defining Words

The Forth words VARIABLE, CONSTANT, VALUE, DEFER, CREATE, DOES> define new Forth words (in addition to the standard colon word).

VARIABLE

The effect of VARIABLE is demonstrated as follows.
	VARIABLE test
The word test is added to the dictionary. Its code space implementation is as follows.
Code space implementation of a variable

The ID_CREATE_P built-in primitive pushes the address of the 3rd cell onto the stack. This cell, initially 0, is the storage location for the variable. The necessity of having two EXIT primitives is due to the implementation of CONSTANT, described below.

CONSTANT

The effect of CONSTANT is demonstrated as follows.
	9 CONSTANT test
The word test is added to the dictionary. Its code space implementation is as follows.
Code space implementation of a constant
The ID_CREATE_P built-in primitive pushes the address of the constant value on the stack, then essentially a "@" word is called to retrieve the contents of this address. Note that this implementation is quite a bit less efficient than just coding a constant into a word definition, which only requires two words (ID_LITERAL_P and the constant itself).

VALUE

The effect of VALUE is demonstrated as follows.
	9 VALUE test
The word test is added to the dictionary as an IMMEDIATE word. Its code space implementation is the same as that of CONSTANT described above, except the word called as part of the "body" (i.e., the word after ID_CREATE_P) is different. The body that is executed at run time is (after the address of the data cell, 0x0009 in this example, is pushed on the stack by ID_CREATE_P):
	STATE @
IF
[compile] LITERAL
compile @
ELSE
@
THEN
The TO Forth word simply changes the 4th word of the code space implementation from its current value to a new value.

DEFER

The effect of DEFER is demonstrated as follows.
	DEFER test
The word test is added to the dictionary.  Its code space implementation is as follows.
Code space implementation of a deferred word
Note that the ID_DEFER_P word does nothing. The default implementation of the new test word, ID_QUIT_P, is essentially an ABORT function.

The IS word changes the implementation of a deferred word by changing ID_QUIT_P to the execution token of the specified implementation word.

CREATE

The effect of CREATE is demonstrated as follows.
	CREATE test
The word test is added to the dictionary. Its code space implementation is as follows.
Code space implementation of CREATE

DOES>

The DOES> word is used after CREATE has created a new word as part of the body of a defining word (like CONSTANT). In general, the code space implementation of a defining word that contains DOES> will look like this:
Code space implementation of DOES>
For example, the definition of CONSTANT is defined in system/core.ft to be:
	: CONSTANT CREATE , DOES> @ ;
The code space implementation of CONSTANT, therefore, is:
Code space implementation of CONSTANT
Note that the above is the implementation of the CONSTANT word, not of a word defined by CONSTANT (which was already demonstrated above).

Miscellaneous

Here are some miscellaneous notes on the implementation.
PF11, however, does not cache the stack pointer. Thus, words that contain calls to DOES> would lead to a stack-level mismatch. By first calling the NO;CHECK word, this stack-level mismatch check is suppressed once. It's a bit klugy, but it seems to work.

Note that this error is exhibited in the original pForth if a trace level of 3 is enabled, as the stack pointer is uncached prior to printing a trace. Try the following in pForth:
	3 trace-level !
: CONSTANT CREATE , DOES> @ ;
( stack trace omitted )
Error in ffSemiColon - stack depth changed between : and ; . Probably unbalanced conditional
		hex 5678 1234
Natively, however, double-word numbers are stored MSB-first as that is the Motorola 68HC11 convention.

© 2003, Copyright by Andrew Sterian; All Rights Reserved. mailto: steriana@claymore.engineer.gvsu.edu