PF11 -- An ANS Forth Implementation for the 68HC11
Andrew Sterian
Padnos School of Engineering
Grand Valley State University
Introduction
This document contains several miscellaneous notes about the PF11
implementation. Please read this file at least once if you are planning
on working with PF11 extensively.
Compilation Process
The compilation of PF11 is complicated for the following reasons:
- The Forth words defined in the system/ directory must be
compiled into a pre-built dictionary that is ready to run
- The GNU 68HC11 compiler tools must be forced to put things in the
right place depending upon whether ROM-based or RAM-based execution is
desired
- The 512-byte EEPROM causes a "hole" in the memory map which the
compiler tools do not easily handle.
The above complications are dealt with in the following general ways:
- The pre-built dictionary is created by a two-stage build process.
  - First, a simulation-only version of PF11 is compiled that
    contains the source files in the system/ directory as hard-coded
    strings in the program.
  - The GDB 68HC11 simulator is used to run this program and
    construct the in-memory dictionary/code space.
  - The contents of the dictionary/code-space memory are dumped out
    to C arrays.
  - These C arrays are used to build a second version of PF11 that
    will be the final version. This version copies the contents of the C
    arrays directly to the dictionary/code-space to provide a pre-built
    dictionary of words.
The final program is compiled with the
PF11_PREBUILT manifest defined. The simulation-only version has this
manifest left undefined.
- Putting things in the right place depending on a ROM-based or
RAM-based target is controlled by manipulating the sections in which
data reside. The main manifests that control this operation are
PF11_RAM_TARGET, PF11_ROM_TARGET, and TEXTSECT (which is derived from
one of the PF11_XXX_TARGET manifests).
The src/pf_config.h file chooses
between the mutually exclusive PF11_RAM_TARGET and PF11_ROM_TARGET. The
top-level makefile (through config.mk)
sets PF11_ROM_TARGET if a ROM-based target is chosen, otherwise this
manifest is left undefined and src/pf_config.h
then sets PF11_RAM_TARGET.
- The 512-byte EEPROM hole is managed by having the text (i.e.,
program code) split into two sections, "text" and the misnomer "eeprom"
(which actually represents memory locations past the EEPROM up to the
top of memory). Section attributes are used to manually split the code
up into pre-EEPROM and post-EEPROM memory.
By default, all functions are compiled
into the text section ("low memory", before EEPROM). The TEXT2 manifest
is defined in src/pf_config.h
for the ROM-based target to place a function into the eeprom section
(i.e., "high memory", after EEPROM). By manually assigning some
functions to TEXT2, the hole is avoided.
This process is hit-and-miss. If the source code to PF11 changes, some
reshuffling of functions and sections may be necessary.
The easiest solution is to just disable the EEPROM in the 68HC11 CONFIG
register and treat all of memory as contiguous. In this case, the TEXT2
manifest can be left blank.
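The manual reshuffling described above is essentially a small packing problem. The following Python sketch is only an illustration and is not part of the PF11 build: the function names and sizes are made up, and the start of low memory (0x8000) is an assumed ROM layout. The 512-byte hole is taken to be 0xB600-0xB7FF, i.e., the 512 bytes just below the 0xB800 start of "high memory".

```python
def assign_sections(functions, low_free, high_free):
    """First-fit: keep each (name, size) in "text" while it fits, else use "eeprom"."""
    placement = {}
    for name, size in functions:
        if size <= low_free:
            placement[name] = "text"
            low_free -= size
        elif size <= high_free:
            placement[name] = "eeprom"  # would be tagged with TEXT2 in the C source
            high_free -= size
        else:
            raise MemoryError(f"{name} ({size} bytes) fits in neither region")
    return placement


# Assumed layout: ROM from 0x8000 up to the EEPROM hole at 0xB600-0xB7FF.
LOW_FREE = 0xB600 - 0x8000
# High memory after the hole; ignores the space taken by interrupt vectors.
HIGH_FREE = 0x10000 - 0xB800

# Hypothetical function names and sizes, for illustration only.
functions = [("pfFuncA", 9000), ("pfFuncB", 4000), ("pfFuncC", 2000)]
placement = assign_sections(functions, LOW_FREE, HIGH_FREE)
```

In this made-up case the third function no longer fits below the hole and would need the TEXT2 attribute. In practice the real sizes come from the linker map, which is why the assignment may need revisiting when the source changes.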
The following sections describe the above processes in more detail.
Two-Stage Build Process
In the first stage, the "bootstrap" program is built in the bootstrap/ directory from the
files in the src/
directory. Prior to doing so, Forth files are stripped (see below) in
the system/ directory and C
source file dependencies are calculated by a "make dep" in the src/ directory.
The Makefile variable MEMX1_FILE is used in the bootstrap phase to
determine the linker file that describes the memory map. This memory map
is fake as it is only used for simulation. However, the key
consideration in constructing this file is that it places the components
to be pre-built (dictionary and code space) in exactly the same memory
locations as the final memory map (second stage). The bootstrap
memory map needs a lot more ROM space (i.e., text section) to store the
Forth system dictionary files (in the system/ directory).
The only difference in code between the bootstrap and final program
versions is in the file src/pf_build.c.
Thus, this file must come last in the list of files to link (and the
Makefile ensures this).
In the bootstrap stage, the src/pf_build.c
file does the following:
- The character array ForthWordsText
is defined as the string concatenation of all stripped Forth definitions
in the system/ directory.
This string (in the TEXT2 "high memory" section) will be sent to the
Forth interpreter to build the dictionary. This process occurs in the LoadForthWordsText() function.
The ForthWordsText string
is very large, approximately 16 KB, thus the memory map must be
constructed carefully to ensure TEXT2 (i.e., the eeprom section) has
enough space.
- The function pfBuildDictionary()
is the main entry point in this file. This function creates the built-in
words one at a time. Then, it calls LoadForthWordsText() to add the
Forth words defined in the system/
directory.
Once the bootstrap version of PF11 is constructed, it is run under the
GDB simulator. The bootstrap/batch.gdb
file (created by the Python program tools/genbatch.py) contains a
list of commands to GDB to run PF11 and dump the dictionary and code
space area to disk. The process is as follows:
- Load the bootstrap PF11 into the simulator
- Set a breakpoint at the gdb_stop_here()
C function (an empty function that is simply a convenient place to stop
at).
- Run the program until the breakpoint. This breakpoint occurs in pfBuildDictionary(), after the
dictionary has been completely built.
- Execute a variety of GDB commands to dump the dictionary and code
space area to a file named temp.out.
- Run a Python program tools/doprebuild.py
to process the temp.out
file and generate the C header files predict.h, preexec.h, and prevars.h.
These three header files are copied to the src/ directory, which completes
the bootstrap process; they are its entire result. The second build stage
uses them as follows.
- The predict.h file
contains the contents of the dictionary as one large C string. It is
used in src/pf_core5.c for
a RAM-based target to initialize the td_Dict array, which is the PF11
dictionary. For a ROM-based target, predict.h is used in src/pf_build.c to initialize td_DictPrebuilt, which will be
copied at run-time to td_Dict.
- The preexec.h file
contains the contents of the code space area as one large C string. It
is used in src/pf_core6.c
for a RAM-based target to initialize the td_Exec array, which is the PF11
code space area. For a ROM-based target, preexec.h is used in src/pf_build.c to initialize td_ExecPrebuilt, which will be
copied at run-time to td_Exec.
- The prevars.h file
is used in src/pf_build.c
to initialize some variables like td_DictPtr
(to point to the last dictionary entry), td_ExecPtr (to point to the
first free location in code space), etc. The variables td_Dict_check and td_Exec_check store the starting
addresses of the dictionary and code space area in the bootstrap
program. They are compared against the same addresses in the second
stage program, and of course they must be identical (since pointers to
absolute addresses are used extensively).
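As a rough illustration of how dumped memory can become "one large C string", here is a minimal Python sketch. It is not the actual tools/doprebuild.py, and the real layout of predict.h and preexec.h may differ; the function name c_string_literal is hypothetical. Three-digit octal escapes are used because a C hex escape has no length limit and would absorb any hex-digit character that follows it.

```python
def c_string_literal(data, per_line=16):
    """Render bytes as adjacent C string literals using 3-digit octal escapes."""
    lines = []
    for i in range(0, len(data), per_line):
        chunk = data[i:i + per_line]
        # Each byte becomes \ooo; adjacent literals are concatenated by the C compiler.
        lines.append('"' + "".join(f"\\{b:03o}" for b in chunk) + '"')
    return "\n".join(lines)


# Hypothetical three-byte dump, for illustration only.
dump = bytes([0x00, 0x41, 0xFF])
literal = c_string_literal(dump)
print(literal)
```

Note that a C string literal carries an implicit trailing NUL, so an array initialized this way should be declared with an explicit size if the extra byte matters.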
The second build stage occurs in the obj/ directory. The compilation
process is the same except -DPF11_PREBUILT
is added to the C compiler flags so that PF11_PREBUILT is a defined
manifest during compilation. Also, the MEMX2_FILE Makefile variable is
used to point to the final program linker configuration file to define
the memory map.
Section Control
Placing data and code in the right sections can often be a challenge
with the GCC tools, especially since both RAM-based and ROM-based
targets must be accommodated. The main sections of interest are as
follows.
- The page0 section
describes low memory, up to 0x00FF. Data that resides in this memory
region is accessed faster by the 68HC11, thus leading to smaller and
faster code. The GCC compiler places several working variables in this
section. PF11 declares the following variables to also reside in this
section, for efficiency:
  - td_InsPtr, the current instruction pointer
  - td_StackPtr, the current stack pointer
  - td_ReturnPtr, the current return stack pointer
  - td_DictPtr, the current pointer to the next free entry in the dictionary
  - td_ExecPtr, the current pointer to the first free location in code space
  - various variables in src/sio.c to implement the serial port driver
The remaining space in this section is
available for custom C code or for converting more PF11 C variables to
reside in this section (by declaring them with the PAGE0 macro; see the src/pf_core.c file for examples).
- The lowdata section
describes memory from 0x100 up to 0xFFF, just before the on-chip special
registers. This memory region is where the C stack resides, beginning at
0xFFF and growing down towards low memory. This section does not have
any other usage in PF11. A 3840-byte stack is quite sizable for a
typical embedded C program, thus you may want to use this section for
other types of storage.
- The text section
contains program code. We also explicitly place some data in the text section when compiling for
a RAM-based target to make the memory map easier to construct (see
below). Note that the word "text" is used to mean two different things:
the name of an actual linker relocatable section, and a region in the memory map. The text
linker section does go into the text memory map region, but other things
do too.
The text memory map region also
contains the rodata linker
section, which represents read-only data like string constants, and
variables declared with the const
keyword. The linker places data in the rodata section right after the text section in the memory map.
The compilation process ensures that the GCC library file crt0.o is linked first, thus the
label _start (where the
whole program begins) resides as the first location in the text section.
The end of the text section is given by the C variable _etext (defined by the linker).
Note that this is the address where the rodata section begins.
- The eeprom linker
section is specially defined to contain program code when compiling for
a ROM target. This section is meant to indicate the "high memory" area
above built-in EEPROM, so that it starts at 0xB800 and extends up to
0xFFFF. The eeprom linker
section is directed to the eeprom
memory map region by the memory map linker control file. This section
name is normally reserved for the actual EEPROM, but we use it instead
for the purpose described above (so we don't have to go mucking into the
internals of the linker command files).
- The data section
contains global variables that are initialized. This section actually
occupies two places in the memory map: the run-time section and the
load-time section. The run-time section is where the "live" variables
reside. These are the addresses that C code actually manipulates when
working with these variables. The load-time section contains the
initialization data for these variables. Before main() is called, built-in code
generated by GCC copies all data from the load-time section to the
run-time section, thus initializing the variables.
The data section has a "load
address" given by the C variable __data_image
(automatically created by the linker) and a runtime address given by the
C variable __data_section_start.
The size of the data
section is given by __data_section_size.
The load address is a place in ROM (should be just beyond the end of
the text section) where the
initialization data is stored. The runtime address is a place in RAM
where the data structure actually resides during program operation.
Thus, the runtime loader automatically copies __data_section_size bytes from __data_image to __data_section_start prior to
calling main().
- The bss section
contains global variables that are not initialized. The contents of this
section are automatically set to 0 by the loader prior to calling main(). The beginning of this
section is given by the C variable __bss_start
and has a size of __bss_size
bytes.
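The startup behavior described for the data and bss sections can be modeled in a few lines. This Python sketch simulates memory with a bytearray; the parameter names mirror the linker symbols mentioned above, but all addresses in the example are made up and the real work is done by GCC's startup code, not by anything resembling this function.

```python
def c_startup(memory, data_image, data_start, data_size, bss_start, bss_size):
    """Model what happens before main(): copy .data from ROM to RAM, zero .bss."""
    # Copy the initialization image (load address) to the live variables (run-time address).
    memory[data_start:data_start + data_size] = memory[data_image:data_image + data_size]
    # Clear the uninitialized globals.
    memory[bss_start:bss_start + bss_size] = bytes(bss_size)


memory = bytearray(0x10000)
memory[0x9000:0x9002] = b"\x12\x34"    # hypothetical load-time image in ROM
memory[0x2002:0x2006] = b"\xAA" * 4    # stale RAM contents where bss will live

c_startup(memory, data_image=0x9000, data_start=0x2000, data_size=2,
          bss_start=0x2002, bss_size=4)
```

The sketch also shows why an initialized variable costs memory twice: the image at the load address remains in place after the copy.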
The ROM-based target section usage is described first, as it's easier.
- The stack (td_Stack)
resides in bss.
- The return stack (td_Return)
resides in bss.
- The dictionary (td_Dict)
resides in bss.
- The code space area (td_Exec)
resides in bss.
- The pre-built dictionary (td_DictPrebuilt)
resides in rodata (hence text).
- The pre-built code space area (td_ExecPrebuilt) resides in eeprom.
At run time, the function pfBuildDictionary
(in the second stage program) copies td_DictPrebuilt to td_Dict and td_ExecPrebuilt to td_Exec. One may well wonder
why this scheme is necessary, specifically why td_Dict and td_Exec were not defined to
reside in the data section
and the GCC built-in loader used to initialize them. The answer is that
the linker would not consistently place these in the same location
between the pre-built and final programs. When they are placed in bss, the two-stage bootstrap
process works as desired.
The distinction between rodata
and eeprom for td_DictPrebuilt and td_ExecPrebuilt is simply to
allow the code to fit into both low memory and high memory. There is
nothing special about placing td_ExecPrebuilt
in the eeprom section.
For a RAM-based target, the section usage is as follows.
- The stack (td_Stack)
resides in text.
- The return stack (td_Return)
resides in text.
- The dictionary (td_Dict)
resides in text.
- The code space area (td_Exec)
resides in text.
Note that the dictionary and code space are initialized directly from
the C strings in predict.h
and preexec.h rather than
indirectly through td_DictPrebuilt
and td_ExecPrebuilt (which
are not defined for the RAM-based target).
The initialized structures (dictionary and code area) are placed in text instead of data so that memory is not
wasted. If these structures were declared as normal initialized global
variables (i.e., placed in data)
then there would be both run-time and load-time space allocated for
them, doubling the memory requirements for each structure. Since all
code is going into RAM, placing dictionary and code space in text allows the pre-built
structures to be loaded directly into the 68HC11 and still be mutable by
user code.
As for placing the two stacks in text
instead of bss, the goal
here is to simplify the construction of the memory configuration file.
The linker cannot be told to simply place everything (text, data, bss, the works) into one big
memory area, which would really be the simplest thing to do on a 68HC11,
since it has no virtual memory. Thus, you have to guess ahead of time
where text will end and data begins when constructing
the memory configuration file. By making the data section as small as
possible, the process becomes easier. It is also easier to modify the
size of stacks and recompile without having to modify the memory
configuration file.
If you compile for the RAM-based target and inspect obj/program.map you will notice
that the entire data memory
region occupies less than 600 bytes, thus making it easy to
configure the memory map.
Stripping Forth
The Python program tools/fthstrip.py
is a simple Forth file stripper. This program strips out all of the
following from a Forth source file:
- Unnecessary whitespace
- Blank lines
- Single-line comments beginning with \
- Comments in parentheses (e.g., " ( a b -- x ) ")
The program understands the following Forth constructs that define
untouchable strings:
- ."
- ABORT"
- S"
- C"
- .( string string
string )
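For example, the Forth string 'word1    word2' (several spaces between
the two words) will be stripped to 'word1 word2' (a single space), but
'." Hello there"' will be left as-is, including any extra spaces inside
the string.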
The Forth words ':', 'compile', '[compile]', and 'postpone' cause the next word
to not define an untouchable
string. Thus, 'postpone S"
word1 word2' will indeed strip out
the unnecessary space between word1
and word2. This hack allows the string words listed above to be defined
or compiled themselves without being treated as the start of an
untouchable string.
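The rules above can be sketched as a small word-by-word scanner. This is a simplified Python illustration, not the actual tools/fthstrip.py: it handles comments and strings on a single line only, and it assumes word matching is case-insensitive. The names strip_forth, STRING_WORDS, and NEXT_WORD_IS_PLAIN are hypothetical.

```python
# Words that start an untouchable string, mapped to the terminating character.
STRING_WORDS = {'."': '"', 'ABORT"': '"', 'S"': '"', 'C"': '"', '.(': ')'}
# Words after which the next word is copied as-is and never starts a string.
NEXT_WORD_IS_PLAIN = {':', 'COMPILE', '[COMPILE]', 'POSTPONE'}


def strip_forth(source):
    """Remove comments, blank lines, and extra whitespace from Forth source."""
    out_lines = []
    for line in source.splitlines():
        out, pos, plain_next = [], 0, False
        while pos < len(line):
            if line[pos].isspace():
                pos += 1
                continue
            end = pos
            while end < len(line) and not line[end].isspace():
                end += 1
            word, pos = line[pos:end], end
            upper = word.upper()
            if plain_next:
                out.append(word)
                plain_next = False
            elif upper == '\\':
                break                                  # rest of line is a comment
            elif upper == '(':
                close = line.find(')', pos)
                pos = len(line) if close < 0 else close + 1
            elif upper in STRING_WORDS:
                close = line.find(STRING_WORDS[upper], pos)
                if close < 0:
                    close = len(line) - 1              # unterminated: keep rest of line
                out.append(word + line[pos:close + 1])  # string copied verbatim
                pos = close + 1
            else:
                out.append(word)
                plain_next = upper in NEXT_WORD_IS_PLAIN
        if out:
            out_lines.append(' '.join(out))
    return '\n'.join(out_lines)


example = ': greet   ( -- )   ." Hello   there"  ;  \\ say hi\n\n   postpone   S"   word1    word2\n'
stripped = strip_forth(example)
print(stripped)
```

In the example, the spacing inside the ." string survives, while the words following postpone S" are squeezed like any others.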
Clearly, this program is easily fallible and was only intended to work
with the Forth files in the system/
directory, to minimize the amount of memory space required during the
bootstrap process. Typing "make" in the system/ directory causes all
*.FT files to be stripped with the results stored in *.FS files. The tools/fthwords.py Python program
then takes all of the *.FS files and generates the "forthwords.h" C
include file for constructing the bootstrap (first-stage) program.
Memory Analysis
Typing "make analyze" in the top-level PF11 directory runs the tools/analyzemem.py Python
program. This program reads the obj/program.map
linker map file and parses it to extract memory usage information. The
program reports on how many bytes are used and how many are free in each
memory region. If the MERGE_BUFFALO option is set in the config.mk file (see below) then
the analysis program assumes that BUFFALO occupies memory from 0xE000
to 0xFFFF and adjusts its results accordingly.
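The BUFFALO adjustment amounts to shrinking the usable part of any region that extends to 0xE000 or beyond. A minimal Python sketch of that arithmetic follows; it does not parse a linker map and is not the actual tools/analyzemem.py, whose exact reporting may differ. The function name and the 4096-byte usage figure are made up.

```python
def region_usage(start, end, used, reserved_from=None):
    """Return (used, free) for the region [start, end].

    Bytes at or above reserved_from (e.g., 0xE000 for BUFFALO) are unavailable.
    """
    if reserved_from is None or reserved_from > end:
        limit = end
    else:
        limit = reserved_from - 1
    capacity = max(0, limit - start + 1)
    return used, capacity - used


# High-memory region 0xB800-0xFFFF with a hypothetical 4096 bytes in use.
without_buffalo = region_usage(0xB800, 0xFFFF, 4096)
with_buffalo = region_usage(0xB800, 0xFFFF, 4096, reserved_from=0xE000)
```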
Merging BUFFALO
If MERGE_BUFFALO=file.s19
is specified in the config.mk
file and a ROM-based target is chosen, then once the second-stage
program is built, it is converted from ELF format to Motorola S-records,
then the S-records for the BUFFALO monitor (in the specified file.s19 file) are merged in.
This process is performed by the tools/s19merge.py
Python program. The program ignores everything except S1 records (data)
and the S9 record (which carries the start address) of the first file specified.
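For readers unfamiliar with S-records, the merge can be sketched as follows. This Python code is an illustration under stated assumptions, not the actual tools/s19merge.py: it assumes that data from the second file overrides the first where they overlap, and the helper names parse_s1, make_s1, and merge_s19 are hypothetical. The record layout itself (byte count, 16-bit address, data, one's-complement checksum) is the standard Motorola S1 format.

```python
def parse_s1(line):
    """Return (address, data) for an S1 record, or None for any other line."""
    line = line.strip()
    if not line.startswith("S1"):
        return None
    raw = bytes.fromhex(line[2:])
    count, body, checksum = raw[0], raw[1:-1], raw[-1]
    if count != len(raw) - 1 or (count + sum(body) + checksum) & 0xFF != 0xFF:
        raise ValueError(f"bad S1 record: {line}")
    return int.from_bytes(body[:2], "big"), body[2:]


def make_s1(address, data):
    """Build one S1 record for the given 16-bit address and data bytes."""
    body = address.to_bytes(2, "big") + bytes(data)
    count = len(body) + 1                      # address + data + checksum byte
    checksum = 0xFF - ((count + sum(body)) & 0xFF)
    return f"S1{count:02X}{body.hex().upper()}{checksum:02X}"


def merge_s19(first_lines, second_lines, chunk=16):
    """Merge S1 data from both files; keep the S9 record of the first file."""
    memory = {}
    for lines in (first_lines, second_lines):  # assumption: later file wins on overlap
        for line in lines:
            record = parse_s1(line)
            if record:
                address, data = record
                for offset, byte in enumerate(data):
                    memory[address + offset] = byte
    out, addrs, i = [], sorted(memory), 0
    while i < len(addrs):
        start, data = addrs[i], [memory[addrs[i]]]
        while i + 1 < len(addrs) and addrs[i + 1] == addrs[i] + 1 and len(data) < chunk:
            i += 1
            data.append(memory[addrs[i]])
        i += 1
        out.append(make_s1(start, data))
    out += [l.strip() for l in first_lines if l.startswith("S9")][:1]
    return out


# Made-up inputs: two program bytes, and a "monitor" that overlaps one of them.
program = [make_s1(0x8000, b"\xAA\xBB"), "S90380007C"]
monitor = ["S00600004844521B", make_s1(0x8001, b"\xCC"), make_s1(0xFFD6, b"\x00\xC4")]
merged = merge_s19(program, monitor)
```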
The BUFFALO program buf34x.s19
in the misc/ directory is
provided as a possible option. It comes from the Axiom Manufacturing CMM11E1 support
tools and may not be suitable for other 68HC11 systems. The original
BUFFALO from Motorola can be found by searching the net for buf34.asm and the as11.exe assembler, which can be
used to assemble it.
If you want to merge a monitor other than BUFFALO it is certainly
possible to do so. Just remember that the S19 file you specify will have
its interrupt vector table used rather than PF11's and it must not
conflict with the memory layout for PF11.
Serial Driver
The serial I/O driver built into PF11 is implemented in C in the src/sio.c file. There are two
implementations, selected by the PF11_INTERRUPT_SIO manifest in the
top-level config.h file.
When this manifest is undefined, no interrupts are used and serial I/O
is performed using SCI polling. There is no handshaking in this case.
When PF11_INTERRUPT_SIO is defined, the SCI interrupt is used to
perform interrupt-driven I/O and XON/XOFF handshaking. Some details of
the implementation are described here. The full story, of course, is
told by the code.
The SCI interrupt vector is installed at 0xFFD6 if this location is
writeable (perhaps NVRAM?). If a write to this location fails, then PF11
assumes that something like BUFFALO is present wherein the address at
0xFFD6 points to a secondary vector, i.e., a JMP >XXXX instruction
somewhere in RAM. For example, BUFFALO has 0x00C4 stored at 0xFFD6. At
0x00C4 is a JMP instruction to the actual vector, and since it's in RAM,
it can be changed. Thus, if writing directly to 0xFFD6 fails, PF11 will
overwrite the XXXX in the JMP >XXXX instruction in RAM to its own
vector location.
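The fallback logic can be modeled in Python with a simulated memory in which writes to ROM are silently ignored. This is only a sketch of the decision described above, not the code in src/sio.c: the Memory class, the handler address 0x9000, and the ROM range are made up, and the check for the JMP opcode (0x7E, the 68HC11 extended-addressing JMP) is an extra sanity check added for the illustration.

```python
JMP_EXT = 0x7E       # 68HC11 opcode for JMP with a 16-bit extended address
SCI_VECTOR = 0xFFD6


class Memory:
    """64K memory model; writes to addresses inside rom_ranges have no effect."""

    def __init__(self, rom_ranges):
        self.cells = bytearray(0x10000)
        self.rom_ranges = rom_ranges

    def write16(self, addr, value):
        for offset, byte in enumerate(value.to_bytes(2, "big")):
            target = addr + offset
            if not any(lo <= target <= hi for lo, hi in self.rom_ranges):
                self.cells[target] = byte

    def read16(self, addr):
        return int.from_bytes(self.cells[addr:addr + 2], "big")


def install_sci_vector(mem, handler):
    """Try the real vector first; otherwise patch the JMP it points to in RAM."""
    mem.write16(SCI_VECTOR, handler)
    if mem.read16(SCI_VECTOR) == handler:
        return "direct"
    jump = mem.read16(SCI_VECTOR)            # e.g., 0x00C4 under BUFFALO
    if mem.cells[jump] != JMP_EXT:
        raise RuntimeError("no JMP instruction at the secondary vector")
    mem.write16(jump + 1, handler)           # overwrite the XXXX of JMP >XXXX
    return "indirect"


# BUFFALO-like setup: vector in ROM points at a JMP in RAM at 0x00C4.
board = Memory(rom_ranges=[(0xE000, 0xFFFF)])
board.cells[0xFFD6:0xFFD8] = b"\x00\xC4"
board.cells[0x00C4] = JMP_EXT
mode = install_sci_vector(board, 0x9000)
```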
The receive and transmit buffers are defined as arrays of size
SIO_RX_BUFSIZE and SIO_TX_BUFSIZE, respectively (both configurable in
the top-level config.h
file). As data is received by the interrupt service routine, it is
stored in the circular receive buffer, where it is removed when PF11
calls the input() or inchar() C functions. If the
circular buffer overflows, a brief error message is displayed.
Similarly, when PF11 calls the output()
C function, characters are stored in the transmit buffer after which
they are sent by the interrupt service routine. Thus, output() can be considered to be
non-blocking, except when the transmit buffer is full, in which case output() blocks until all
characters to display are stored in the transmit buffer.
When an XOFF character is received, transmission stops immediately
until an XON character is received. Upon reception, if the receive
circular buffer has SIO_RX_XOFFLEVEL bytes in it (configurable in config.h) then an XOFF
character is transmitted to the host. When the number of bytes in this
circular buffer drops below SIO_RX_XONLEVEL, an XON character is
transmitted to the host. The default values for these levels are 3/4 and
1/4 of the buffer size, respectively.
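The receive-side handshaking can be sketched as a circular buffer with two watermarks. The Python model below is an illustration only: the class and attribute names are hypothetical, and details such as sending XOFF once at the threshold rather than on every subsequent byte are assumptions about behavior that the code in src/sio.c may implement differently.

```python
XON, XOFF = 0x11, 0x13   # ASCII DC1 and DC3


class RxBuffer:
    """Circular receive buffer with XOFF at 3/4 full and XON below 1/4 full."""

    def __init__(self, size):
        self.size = size
        self.data = [0] * size
        self.head = self.tail = self.count = 0
        self.xoff_level = size * 3 // 4
        self.xon_level = size // 4
        self.host_stopped = False

    def put(self, byte):
        """Store a received byte; return XOFF if one should be sent to the host."""
        if self.count == self.size:
            raise OverflowError("receive buffer overflow")
        self.data[self.head] = byte
        self.head = (self.head + 1) % self.size
        self.count += 1
        if self.count >= self.xoff_level and not self.host_stopped:
            self.host_stopped = True
            return XOFF
        return None

    def get(self):
        """Remove the oldest byte; return (byte, XON or None)."""
        if self.count == 0:
            raise IndexError("receive buffer empty")
        byte = self.data[self.tail]
        self.tail = (self.tail + 1) % self.size
        self.count -= 1
        if self.count < self.xon_level and self.host_stopped:
            self.host_stopped = False
            return byte, XON
        return byte, None


buf = RxBuffer(16)                          # levels: XOFF at 12, XON below 4
replies = [buf.put(i) for i in range(13)]   # the 12th byte triggers XOFF
drained = [buf.get() for _ in range(10)]    # the 10th read leaves 3 bytes
```

The gap between the two levels provides hysteresis, so the driver does not alternate XON and XOFF on every byte when the buffer hovers around a single threshold.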
Note that rewriting the serial driver in assembly language is a good
idea, as it is (most likely...but not certainly) a limiting factor in
how quickly programs can be downloaded to PF11.
Forth Tests
The tests/ subdirectory
contains various Forth files that test the PF11 Forth implementation.
These tests come from various sources. The t_xxx.ft tests all expect that
the t_tools.ft file has
been downloaded first, prior to each
test. The README file in this subdirectory provides a guide for the
various tests.
Note that the fsend.py
program (see below) may be useful (on Unix-ish systems) for running the
tests as it automatically strips the files. For example:
cd tests
../fsend/fsend.py t_tools.ft t_core.ft
The above commands will run the core word set test. Remember that you
must download t_tools.ft
prior to any other t_xxx.ft
test as all of the words defined by t_tools.ft are forgotten by the
actual test code (to save dictionary space).
FSEND
The FSEND program (fsend/fsend.py)
is a simplistic Python program for sending Forth code to a running PF11
session. Terminal programs like minicom and TeraTerm can also send text
files to a target board, however FSEND is useful because:
- FSEND automatically strips Forth source prior to download (unless
disabled with the --no-strip
option) to save download time (see Stripping
Forth above)
- FSEND displays output from PF11 as it is generated, allowing
source errors to be identified faster (some terminal programs also do
this, but not all)
FSEND can also be used for running the Forth test programs, as
described in the previous section.
FSEND is only meant for Unix environments.
For a quick help summary, type "python fsend.py -h" or just "fsend.py
-h" if the fsend.py file
has been given the execute flag.
FSEND automatically opens the I/O devices with XON/XOFF flow control.
If this fails for whatever reason, FSEND can implement "soft" XON/XOFF
flow control, although buffering in the kernel can make this nearly
useless (i.e., by the time that FSEND has received an XOFF, there might
be thousands of bytes already queued up in the kernel). In this case,
implementing a short delay after each transmitted line (called line pacing) is useful. The --line-pacing option can be used
to specify a pacing interval (try 0.05 to begin with and go up if you
get errors, down to make downloads faster).
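Line pacing itself is simple to picture. The following Python sketch is not taken from fsend.py: the function name send_paced is hypothetical, a carriage return is assumed as the line terminator, and an in-memory stream stands in for the opened serial device.

```python
import io
import time


def send_paced(port, lines, pacing=0.05):
    """Write each line to port, pausing `pacing` seconds after every line."""
    sent = 0
    for line in lines:
        # Assumption: the target expects a carriage return at the end of each line.
        port.write(line.rstrip("\r\n").encode("ascii") + b"\r")
        port.flush()
        time.sleep(pacing)   # give the target time to process the line
        sent += 1
    return sent


port = io.BytesIO()          # stand-in for a serial device opened in binary mode
count = send_paced(port, ["1 2 +\n", ". \n"], pacing=0.0)
```

The delay gives the target time to drain its receive buffer before the next line arrives, which helps when kernel buffering defeats XON/XOFF.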
FSEND is a quick-and-dirty program and could use quite a bit of
improvement.
© 2003, Copyright by Andrew Sterian;
All Rights Reserved. mailto: steriana@claymore.engineer.gvsu.edu