Most Forth systems have historically been built on byte-addressed machines with 16-bit operations and two's-complement arithmetic, and Forth-83 assumes these features. However, there are many beasts in the architectural jungle that are bit addressed or cell addressed, prefer 32-bit operations, or represent numbers in one's complement. Since one of Forth's strengths is its usefulness in strange environments on unusual hardware with peculiar features, it is important that a Standard Forth run on these machines too.
A primary goal of the ANS Forth Standard is to increase the range of machines that can support a Standard Forth. It does this by making some key Forth terms implementation-defined (e.g., how big is a cell?) and by providing Forth operators (words) that conceal the implementation. This frees the implementor to produce the Forth system that makes the most effective use of the native hardware. The machine-independent operators, together with some programmer discipline, enable a programmer to write Forth programs that work on a wide variety of machines.
The remainder of this Annex provides guidelines for writing portable ANS Forth programs. The first section describes ways to make a program hardware independent. It is difficult for someone familiar with only one machine architecture to imagine the problems caused by transporting programs between dissimilar machines, so examples of specific architectures and their respective problems are given. The second section describes assumptions about Forth implementations that many programmers make but that cannot be relied upon in a portable program.
The cell is the fundamental data type of a Forth system. A cell can be a single-cell integer or a memory address. Forth's parameter and return stacks are stacks of cells. Forth-83 specifies that a cell is 16 bits. In ANS Forth the size of a cell is an implementation-defined number of address units. Thus, an ANS Forth implemented on a 16-bit microprocessor could use a 16-bit cell, and an implementation on a 32-bit machine could use a 32-bit cell. Likewise, 18-bit machines, 36-bit machines, etc., could support ANS Forth systems with 18- or 36-bit cells respectively. In all of these systems, DUP does the same thing: it duplicates the top of the data stack. ! (store) behaves consistently too: given two cells on the data stack, it stores the second cell at the memory location designated by the top cell.
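The following sketch, assuming any Standard system, illustrates the point: the same source stores and fetches a cell correctly whatever the cell width. CELL-SIZE and SLOT are illustrative names, not standard words.

```forth
\ Nothing below depends on how many bits or address units a cell has.
\ CELL-SIZE and SLOT are illustrative names, not standard words.
: CELL-SIZE ( -- n ) 1 CELLS ;   \ address units per cell (implementation-defined)

VARIABLE SLOT                    \ reserves exactly one cell of data space
42 SLOT !                        \ store the second cell at the address on top
SLOT @ .                         \ prints 42 whatever the cell width
CELL-SIZE .                      \ e.g. 2 on a byte-addressed 16-bit system
```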
Similarly, the definition of a character has been generalized to be an implementation-defined number of address units (but at least eight bits). This removes the need for a Forth implementor to provide 8-bit characters on processors where they are inappropriate. For example, on an 18-bit machine with a 9-bit address unit, a 9-bit character would be most convenient. Since, by definition, nothing smaller than an address unit can be addressed, a character must be at least as big as an address unit. This results in big characters on machines with large address units. An example is a 16-bit cell-addressed machine, where a 16-bit character makes the most sense.
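Code that scales addresses by hand is not portable. For example, a Forth-83 definition of an array of cells might be:
: ARRAY CREATE 2* ALLOT DOES> SWAP 2* + ;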
Using 2* to scale the array index assumes byte addressing and 16-bit cells, so different versions of the code would be needed for different machines. ANS Forth provides a portable scaling operator named CELLS. Given a number n, CELLS returns the number of address units needed to hold n cells. A portable definition of ARRAY is:
: ARRAY CREATE CELLS ALLOT DOES> SWAP CELLS + ;
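A short usage sketch of the portable ARRAY, assuming any Standard system; SCORES is a hypothetical array name. The last comparison in the test confirms that consecutive elements are exactly one cell apart.

```forth
\ The portable cell array from the text: the index is scaled by CELLS.
: ARRAY ( n "name" -- ) CREATE CELLS ALLOT
   DOES> ( i -- addr ) SWAP CELLS + ;

10 ARRAY SCORES              \ SCORES is a hypothetical 10-cell array
100 0 SCORES !               \ store 100 in element 0
250 9 SCORES !               \ store 250 in the last element
0 SCORES @ 9 SCORES @ + .    \ prints 350 on any cell size
```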
There are also portability problems with addressing arrays of characters. In Forth-83 (and in the most common ANS Forth implementations), the size of a character equals the size of an address unit. Consequently, addresses of successive characters in memory can be found using 1+, and scaling indices into a character array is a no-op (i.e., 1 *). However, there are cases where a character is larger than an address unit. Examples include (1) systems with small address units (e.g., bit- and nibble-addressed systems), and (2) systems with large character sets (e.g., 16-bit characters on a byte-addressed machine). The CHAR+ and CHARS operators, analogous to CELL+ and CELLS, are available to allow maximum portability.
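As a sketch, assuming any Standard system, here is a hypothetical character-array defining word CARRAY, the CHARS analogue of ARRAY. It never assumes one character per address unit, and stepping with CHAR+ lands on the same address as indexing with CHARS.

```forth
\ CARRAY and LETTERS are hypothetical names used for illustration.
: CARRAY ( n "name" -- ) CREATE CHARS ALLOT
   DOES> ( i -- c-addr ) SWAP CHARS + ;

4 CARRAY LETTERS
CHAR F 0 LETTERS C!
CHAR o 1 LETTERS C!
CHAR r 2 LETTERS C!
CHAR t 3 LETTERS C!
0 LETTERS 4 TYPE                 \ prints Fort
0 LETTERS CHAR+  1 LETTERS = .   \ prints -1 (true): CHAR+ matches 1 CHARS +
```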
ANS Forth generalizes the definition of some Forth words that operate on chunks of memory to use address units. One example is ALLOT. By prefixing ALLOT with the appropriate scaling operator (CELLS, CHARS, etc.), space for any desired data structure can be allocated (see the definition of ARRAY above). For example:
CREATE ABUFFER 5 CHARS ALLOT ( allot a 5-character buffer)
The memory-block-move word also uses address units:
source destination 8 CELLS MOVE ( move 8 cells)
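A runnable version of the MOVE line, assuming any Standard system; SRC-BUF, DST-BUF and FILL-SRC are hypothetical names standing in for the text's source and destination.

```forth
CREATE SRC-BUF 8 CELLS ALLOT     \ hypothetical 8-cell source buffer
CREATE DST-BUF 8 CELLS ALLOT     \ hypothetical 8-cell destination buffer

\ Put 0 10 20 ... 70 into the source, one value per cell.
: FILL-SRC ( -- ) 8 0 DO  I 10 *  SRC-BUF I CELLS +  !  LOOP ;
FILL-SRC

SRC-BUF DST-BUF 8 CELLS MOVE     \ MOVE counts address units, hence 8 CELLS
DST-BUF 7 CELLS + @ .            \ prints 70
```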
One of the most common problems caused by alignment restrictions arises in creating tables containing both characters and cells. When , (comma) or C, is used to initialize a table, data is stored at the data-space pointer, so that pointer must be suitably aligned for the data being stored. For example, the following table definition is not portable:
CREATE ATABLE 1 C, X , 2 C, Y ,
On a machine that restricts 16-bit fetches and stores to even addresses, CREATE would leave the data-space pointer at an even address, 1 C, would make it odd, and , (comma) would violate the address restriction by storing X at an odd address. A portable way to create the table is:
CREATE ATABLE 1 C, ALIGN X , 2 C, ALIGN Y ,
ALIGN adjusts the data space pointer to the first aligned address greater than or equal to its current address. An aligned address is suitable for storing or fetching characters, cells, cell pairs, or double-cell numbers.
After initializing the table, we also want to read values from it. For example, suppose we want to fetch the first cell, X. ATABLE CHAR+ gives the first address after the character, but this may not be the address of X, because we aligned the data-space pointer between the C, and the , (comma). The portable way to get the address of X is:
ATABLE CHAR+ ALIGNED
ALIGNED adjusts the address on top of the stack to the first aligned address greater than or equal to its current value.
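The sketch below, assuming any Standard system, builds the aligned table and reads every field back by recomputing each address the same way the table was laid out: skip a character, then realign. The values 100 and 200 stand in for the text's X and Y, and X-ADDR, TAG2-ADDR and Y-ADDR are hypothetical helper names.

```forth
\ ATABLE from the text, with 100 and 200 standing in for X and Y.
CREATE ATABLE  1 C, ALIGN 100 ,  2 C, ALIGN 200 ,

\ Hypothetical accessors that mirror the ALIGNs used when building the table.
: X-ADDR    ( -- a-addr ) ATABLE CHAR+ ALIGNED ;
: TAG2-ADDR ( -- c-addr ) X-ADDR CELL+ ;
: Y-ADDR    ( -- a-addr ) TAG2-ADDR CHAR+ ALIGNED ;

ATABLE C@ .  X-ADDR @ .  TAG2-ADDR C@ .  Y-ADDR @ .   \ prints 1 100 2 200
```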
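Byte order is another property of the hardware that a portable program must not depend on. On a little-endian 16-bit Forth such as one running on an 8086, the following code leaves 34 (hex):
VARIABLE FOO HEX 1234 FOO ! FOO C@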
The same code on a 16-bit big-endian 68000 Forth would produce 12 (hex). A portable program cannot exploit the representation of a number in memory.
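One portable alternative, sketched here under the assumption that the value fits in 16 bits, is to pick bytes out arithmetically on the stack rather than with C@, so the answer does not depend on byte order. LOW-BYTE and HIGH-BYTE are hypothetical names, and an 8-bit byte is assumed.

```forth
\ Hypothetical helpers: extract 8-bit fields by arithmetic, not by memory layout.
: LOW-BYTE  ( u -- u' ) 255 AND ;
: HIGH-BYTE ( u -- u' ) 8 RSHIFT 255 AND ;

HEX 1234 LOW-BYTE .  1234 HIGH-BYTE .  DECIMAL   \ prints 34 12 regardless of byte order
```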
A related issue is the representation of cell pairs and double-cell numbers in memory. When a cell pair is moved from the stack to memory with 2!, the cell that was on top of the stack is placed at the lower memory address. It is useful and reasonable to manipulate the individual cells when they are in memory.
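A small sketch of that layout, assuming any Standard system; PAIR is a hypothetical two-cell buffer. The cell that was on top of the stack ends up at PAIR, and the other one at PAIR CELL+.

```forth
CREATE PAIR 2 CELLS ALLOT        \ hypothetical room for one cell pair
111 222 PAIR 2!                  \ 222 is on top of the stack
PAIR @ .                         \ prints 222: the top cell is at the lower address
PAIR CELL+ @ .                   \ prints 111
```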
Programmers who have grown up on two's complement machines tend to become intimate with their representation of numbers and take some properties of that representation for granted. For example, a trick to find the remainder of a number divided by a power of two is to mask off some bits with AND. A common application of this trick is to test a number for oddness using 1 AND. However, this will not work on a one's complement machine if the number is negative (a portable technique is 2 MOD).
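A sketch of both techniques, assuming a system that provides the Core Extension word 0<>; ODD-BITS? and ODD? are hypothetical names. Because ANS Forth lets MOD use either floored or symmetric division, the remainder for a negative odd number may be 1 or -1, so the portable test compares against zero.

```forth
\ Non-portable: relies on the two's-complement bit pattern of negative numbers.
: ODD-BITS? ( n -- flag ) 1 AND 0<> ;

\ Portable: 2 MOD is nonzero for every odd n, whatever the division rounding.
: ODD? ( n -- flag ) 2 MOD 0<> ;

-3 ODD? .  4 ODD? .              \ prints -1 0
```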
The remainder of this section is a (non-exhaustive) list of things to watch for when portability between machines with binary representations other than two's complement is desired.
To convert a single-cell number to a double-cell number, ANS Forth provides the operator S>D. To convert a double-cell number to a single-cell number, Forth programmers have traditionally used DROP. However, this trick does not work on sign-magnitude machines, where the sign is held in the high-order cell that DROP discards. For portability, a D>S operator is available. Converting an unsigned single-cell number to a double-cell number can be done portably by pushing a zero on the stack.
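The three conversions side by side, assuming the optional Double-Number word set (which provides D>S and D.); U>D is a hypothetical name for the push-a-zero idiom.

```forth
: U>D ( u -- ud ) 0 ;            \ hypothetical: unsigned single to double

-5 S>D D.                        \ sign-extend to a double, prints -5
-5 S>D D>S .                     \ back to a single without relying on DROP, prints -5
1000 U>D D.                      \ zero high cell for an unsigned value, prints 1000
```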
Only words defined with CREATE or with other defining words that call CREATE have data fields. The other defining words in the Standard (VARIABLE, CONSTANT, :, etc.) might not be implemented with CREATE. Consequently, a Standard Program must assume that words defined by VARIABLE, CONSTANT, : , etc., may have no data fields. There is no way for a Standard Program to modify the value of a constant or to change the meaning of a colon definition. The DOES> part of a defining word operates on a data field. Since only CREATEd words have data fields, DOES> can only be paired with CREATE or words that call CREATE.
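A minimal sketch of a defining word that pairs DOES> with CREATE, assuming any Standard system; COUNTER and HITS are hypothetical names. Because HITS is created by CREATE, it has a data field, and >BODY may be applied to its execution token. The same would not be allowed for a word defined by CONSTANT or VARIABLE.

```forth
\ COUNTER is a hypothetical defining word built on CREATE.
: COUNTER ( n "name" -- ) CREATE ,
   DOES> ( -- n ) DUP @ 1+ DUP ROT ! ;   \ increment the stored value and return it

0 COUNTER HITS
HITS . HITS .                    \ prints 1 2
' HITS >BODY @ .                 \ prints 2: legal because HITS was CREATEd
```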
In ANS Forth, FIND, ['] and ' (tick) return an unspecified entity called an execution token. There are only a few things that may be done with an execution token. The token may be passed to EXECUTE to execute the word ticked or compiled into the current definition with COMPILE,. The token can also be stored in a variable and used later. Finally, if the word ticked was defined via CREATE, >BODY converts the execution token into the word's data-field address.
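The sketch below, assuming any Standard system, uses execution tokens only in the permitted ways: stored in a variable, passed to EXECUTE, and converted with >BODY for a CREATEd word. 'OP, APPLY and TALLY are hypothetical names.

```forth
VARIABLE 'OP                     \ hypothetical vector holding an execution token
: APPLY ( n1 n2 -- n3 ) 'OP @ EXECUTE ;

' + 'OP !   3 4 APPLY .          \ prints 7
' * 'OP !   3 4 APPLY .          \ prints 12

CREATE TALLY 0 ,                 \ a CREATEd word has a data field...
' TALLY >BODY  TALLY = .         \ ...so >BODY yields its address: prints -1
```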
One thing that definitely cannot be done with an execution token is to use ! or , to store it into the object code of a Forth definition. This technique is sometimes used in implementations where the object code is a list of addresses (threaded code) and an execution token is also an address. However, ANS Forth permits native code implementations where this will not work.
A Standard Program may use the return stack directly only for temporarily storing values. Every value examined on or removed from the return stack using R@, R>, or 2R> must have been put on the stack explicitly using >R or 2>R. Even this must be done carefully, since the system may use the return stack to hold return addresses and loop-control parameters. Section 3.2.3.3 Return stack of the Standard has a list of restrictions.
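A sketch of the permitted pattern, assuming any Standard system: a value is parked with >R and retrieved with R> inside the same definition, with no DO loop in between, so the return stack is balanced when the definition exits. ADD-UNDER is a hypothetical name.

```forth
\ Hypothetical word: add n3 to n1 while keeping n2 on top.
: ADD-UNDER ( n1 n2 n3 -- n1+n3 n2 )
   SWAP >R          \ ( n1 n3 )      R: ( n2 )
   +                \ ( n1+n3 )
   R> ;             \ ( n1+n3 n2 )   return stack balanced again

1 2 3 ADD-UNDER . .              \ prints 2 4
```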
Programs designed for ROMed application must divide data space into at least two parts: a writeable and readable uninitialized part, called RAM, and a read-only initialized part, called ROM. A third possibility, a writeable and readable initialized part, normally called initialized RAM, is not addressed by this discipline. A Standard Program must explicitly initialize the RAM data space as needed.
The separation of data space into RAM and ROM is meaningful only during the generation of the ROMed program. If the ROMed program is itself a standard development system, it has the same taxonomy as an ordinary RAM-only system.
The words affected by conversion from a RAM-only to a mixed RAM and ROM environment are:
, (comma) ALIGN ALIGNED ALLOT C, CREATE HERE UNUSED
(VARIABLE always accesses the RAM data space.)
With the exception of , (comma) and C, these words are meaningful in both RAM and ROM data space.
To select the data space, these words could be preceded by selectors RAM and ROM. For example:
ROM CREATE ONES 32 CHARS ALLOT ONES 32 1 FILL RAM
would create a table of ones in the ROM data space. The storage of data into RAM data space when generating a program for ROM would be an ambiguous condition.
A straightforward implementation of these selectors would maintain separate address counters for each space. A counter value would be returned by HERE and altered by , (comma), C,, ALIGN, and ALLOT, with RAM and ROM simply selecting the appropriate address counter. This technique could be extended to additional partitions of the data space.
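A minimal sketch of such selectors, written as ordinary Forth on a host system. It is hypothetical throughout: the two regions are plain buffers standing in for target memory, the T- prefixed words are illustrative stand-ins for HERE, ALLOT, ALIGN, , and C, rather than redefinitions of them, the 256-character region size is arbitrary, and no bounds checking is done.

```forth
CREATE RAM-IMAGE 256 CHARS ALLOT     \ stand-in for target RAM
CREATE ROM-IMAGE 256 CHARS ALLOT     \ stand-in for target ROM
VARIABLE RAM-PTR  RAM-IMAGE RAM-PTR !   \ one address counter per region
VARIABLE ROM-PTR  ROM-IMAGE ROM-PTR !
VARIABLE CURRENT-PTR                 \ address of the counter now in use

: RAM ( -- ) RAM-PTR CURRENT-PTR ! ;    \ select the RAM counter
: ROM ( -- ) ROM-PTR CURRENT-PTR ! ;    \ select the ROM counter

: T-HERE  ( -- addr ) CURRENT-PTR @ @ ;
: T-ALLOT ( n -- )    CURRENT-PTR @ +! ;
: T-ALIGN ( -- )      T-HERE ALIGNED CURRENT-PTR @ ! ;
: T-,     ( x -- )    T-HERE ! 1 CELLS T-ALLOT ;      \ assumes T-HERE is aligned
: T-C,    ( char -- ) T-HERE C! 1 CHARS T-ALLOT ;

RAM                                  \ start with RAM selected

ROM 7 T-C, T-ALIGN 1000 T-,          \ a small initialized table in ROM
RAM 2 CELLS T-ALLOT                  \ two uninitialized cells in RAM
```

Each selector only switches which counter the T- words advance, so extending the scheme to further partitions means adding another buffer, counter, and selector.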