High order data structures V1.9 This explains the workings and rationale of a scheme for high order data structures. These are the kind of data structures that are roughly equivalent to C/C++ structs or Pascal/Ada Records, FORTRAN has no such construct (Fortran 90 does have such a facility with its TYPE construct). This version of data structures is based upon ideas described in Hayes (1992), Pountain (1987), Ertl (1994), and Hendrix (1994) and upon much experimentation with several earlier versions of this code. These data structures allow the creation of several kinds of data that are naturally associated with each other, but do not necessarily work well together as a single array of data. This implementation of data structures was designed with the following goals in mind: 1. Portability -- the implementation must be ANS Forth. 2. Generality -- the data structures need to be used in many kinds of applications. 3. Readability -- the syntax to be created has to be easy to read and relatively intuitive. 4. Easy to use -- an awkward design will not likely get used. 5. Works with multiple instances of a given data structure. 6. Allows nested data structures. 7. Allows arrays of data structures (both statically and dynamically declared) without adding additional syntax to existing array words. The intent of the design is to provide a general, easy to use scheme for handling data only. The temptation to add object oriented properties to the structures was avoided (a possible future data structure could be designed for objects). The structure contains ATTRIBUTE subfields, that is data or pointers to data. When using a data structure the first step is to declare the type, e.g, structure pixel integer: ->red integer: ->green integer: ->blue float: ->intensity endstructure defines a structure named pixel that has 3 INTEGER attribute fields, (named ->red, ->green and ->blue) plus a floating point field (named ->intensity). An alternative form of the type declaration uses the word ATTRIBUTE with the existing type size words, structure pixel integer attribute ->red integer attribute ->green integer attribute ->blue real*4 attribute ->intensity endstructure The words, ->red, ->green, ->blue and ->intensity will be created by these declarations. They will automatically know their offsets into the data structure. Internally the space for the attributes occurs in the order that they are declared. Note that depending upon the data structure being declared, alignment words (ALIGNED, FALIGNED, etc.) may be necessary between the specification of the attributes. An instance of a pixel is declared the same way as declaring a variable, pixel pix1 declares a pixel instance called pix1. The key to this simplicity is the fact that 'structure' is a second order defining word that tells the compiler how a new type is to be defined. Structures can be nested, for example: structure point integer: .x integer: .y sizeof pixel struct: .px endstructure point p (see below for explanation of 'sizeof' ). Then the following can be used, p .x \ the address of the .x attribute of p p .px ->red \ the address of the ->red attribute of p's .px =================================================================== USING A STRUCTURE When a structure name is invoked alone, it stacks a double structure handle. This handle consists of two components each occupying one cell: the base address of the structure instance on the top of the stack, and the type-id for this structure below it on the stack. The address of an attribute field of a given structure instance is obtained by naming the object and then the attribute, e.g. pix1 ->red gets the address of the ->red attribute of pix1. The attribute is late bound to the structure so that the following kinds of operations will work, : clear_red ( addr -- ) 0 SWAP ->red ! ; pix1 clear_red The structures and attribute fields use the type id's that are part of the structure handle to assure that a member attribute goes with a proper structure. So, for example, pix1 .x will give a RUNTIME failure (this happens at runtime not compile time beause the attribute fields use late binding). Because the structure handles contain the type information, a structure member can be dereferenced well after the structure is referred to. For example, p \ the point structure above ( .....a bunch of stuff with no net stack effect that involves, other kinds of structures ... ) .x \ the user WANTS p .x at this point This will type of code will work OK. There is a STRUCT-HANDLE type defined that is useful for storing away structure contexts, structure STRUCT-HANDLE 1 CELLS attribute .type 1 CELLS attribute .addr endstructure With this one can store away a structure reference for later use, e.g., STRUCT-HANDLE SELF pix1 SELF .addr ! SELF .type ! In order to reduce the possiblity of errors when doing this, the words h@ and h! are provided to move values in and out of this structure, pix1 SELF h! ... SELF h@ h@ ( hdl1 -- hdl2 ) Fetches the (double) structure handle stored in a STRUCT-HANDLE instance, hdl1. h! ( hdl1 hdl2 -- ) Stores the (double) structure handle hdl1 in the STRUCT-HANDLE instance hdl2. =================================================================== CREATING ARRAYS OF STRUCTURES. Arrays of structures can use the existing ARRAY/DARRAY mechanism. To do this the new word 'sizeof' is needed, sizeof ( -- ncells ) This immediate word expects the name of a structure type (NOT AN INSTANCE) and returns the number of cells the data part of the structure occupies. For example, sizeof pixel and NOT, sizeof pix1 NOTE: The word 'sizeof' has a side effect of setting the VALUE TYPE-ID to the appropriate value and setting the VALUE STRUCT-ARRAY? to TRUE. These are needed in order to properly construct arrays of structures and nested structures. The result of the 'sizeof' word is used for the first parameter of ARRAY (or third parameter of DARRAY) declarations, 5 sizeof pixel ARRAY p{ or, sizeof pixel DARRAY q{ & q{ 5 }malloc These create 5 element arrays where each element has the same type as the structure pixel, the names of the arrays are p{ and q{. The address of a particular element is obtained in the usual way, e.g., p{ 2 } \ stacks the address of element 2 p{ 2 } ->red \ stacks the address of the ->red attribute of \ element 2 =================================================================== Defining structures that contain arrays To define a structure that contain a STATIC array (i.e. fixed in size a compile time), use the word ARRAY: similar to how ARRAY would be used outside of a structure, structure databuffer integer: .index 20 integer array: .z{ endstructure With the above defined, every databuffer instance will have an integer index and a 20 element integer array. The word ARRAY: can be applied to structures, just as the word ARRAY can, to create structures that contain arrays of other structures. To create structures with DYNAMICALLY allocated arrays, one must use the word POINTER: to create a structure pointer and then use some support words to associate the pointer with the desired array. (The word POINTER: is actually general purpose, it can be used for any type of pointer). To declare a structure that contains pointers, \ a data structure for LU factored matrices structure LUMATRIX REAL*4 pointer: .MATRIX{{ INTEGER pointer: .PIVOT{ INTEGER attribute .N endstructure Then one needs to create a proper matrix and pivot (either statically or dynamically) and then have the LUMATRIX instance point to them, INTEGER DARRAY itmp{ REAL*4 DMATRIX mtmp{{ LUMATRIX a \ defines an LU Matrix structure instance ... & itmp{ 5 }malloc \ allocate data space & mtmp{{ 5 5 }}malloc itmp{ a -> .pivot{ ->! \ a's pivot now points to itmp{ mtmp{{ a -> .matrix{{ ->! \ and to mtmp{{ Once this is done itmp{ and mtmp{{ can be used for other purposes as long as the originally allocated space is not affected. To free the allocated space, one needs to reverse the process, a .pivot{ & itmp{ &! \ itmp{ now points to the pivot & itmp{ }free \ releases the space The words used for this are, -> ( s-id s-addr -- ptr-addr ) Gets the base address of a pointer instance. An example of its usage: buf -> .z{ ->! ( addr ptr-addr -- ) Make the pointer (ptr-addr) refer to the array base address (addr - CELL). IMPORTANT NOTE: If a{ is an array, and B is an instance of a structure that has the pointer attribute .a{, then the sequence, a{ B -> .a{ ->! has a ZERO net stack effect if a{ is an array of simple scalar types. BUT has the effect of leaving the TYPE-ID of a{ on the stack if a{ is an array of structures. The same applies to the reverse sequence that aliases an external array to a pointer in the structure, B .a{ & a{ &! =================================================================== Defining CONSTANT structures To define a constant version of a data structure one needs to have already defined the word that fetches data from the data structure. One has to also define the word that does the equivalent of ',' for that data strucuture (i.e. it allocates space for and initializes a given data structure at HERE). Then the word 'constant-structure' is used in the constant defining word before the pre-existing defining word is invoked, : PIX-CONSTANT ['] pix@ ['] pix, constant-structure pixel ; If 'pix,' had the stack diagram ( red green blue -- ) ( F: inten -- ) then a constant pixel could be declared as, 20 30 40 0.75e0 pix-constant PIX Then subsequent invocations of 'PIX' would cause 'pix@' to be applied to the data within PIX. (Presumably, 'pix@' would cause the components of PIX to be placed on the stack. That is the intent of the design, but in actuality one could invoke ANY word that will expect the base address of the structure on the stack). Remember that the @-like word should consume a DOUBLE structure handle. CAUTION: invoking a large constant structure can have a heavy impact on the stack(s). One should take care that there is enough stack space available (particularly the float stack) before using a constant structure. constant-structure ( '@ ', -- ) This word is used within constant structure defining words. It takes the address of @-like and ,-like words for the data structure and sets up internal pointers to them. It also sets a flag that is used internally within the structure defining word (PIXEL in the above examples) so that a CONSTANT defining data structure is built. =================================================================== Forcing EARLY BINDING By default, all the offsets to the individual data elements are resolved at run-time (i.e. LATE binding). Sometimes the binding COULD be done at compile time (EARLY binding) because all the information to do the binding is available then. If early binding can be done, then the code will run faster. The programmer can inform the compiler that early binding is possible by using the words [[ to start early binding, and ]] to end it. For example, : test [[ pix1 ->red ]] @ . ; The words '[[' and ']]' are intended to be used at compile time, they become harmless no-ops at runtime. The current implementation DOES NOT allow for nesting of early bindings. =================================================================== Limitations to this version of structure: The attribute subfields are globally visible and a collision will occur if two different data structures are designed to have attribute fields with the same name. Relying on the side effect of 'sizeof' to set the value TYPE-ID, in order to build nested structures is a syntactic weakness of this version. I have yet to figure out a way around it that doesn't result in a loss of all the functionality that structures now have. (Through the version iterations, the word 'sizeof' has lost most of its original meaning, perhaps it should be renamed to something more meaningful considering its current function). =================================================================== Glossary [[ ( -- ) This word initiates early binding for data structures. ]] ( -- ) This words causes early binding for data structures to end. -> ( s-id s-addr -- ptr-addr ) Gets the base address of a pointer instance. An example of its usage: buf -> .z{ ->! ( addr ptr-addr -- ) Make the pointer (ptr-addr) refer to the array base address (addr - CELL). addrof ( -- addr ) Returns the base address of the data fields of the INSTANCE of a data structure that immediately follows this word. array: ( id offset n cell_size -- offset' ) This word is used for reserving an array that has 'n' elements of size 'cell_size' within the data structure. The name of this field is the name that immediately follows the word. The word 'array:' is used in same manner as the word 'array' so that the array can have either scalar or structure elements. attribute ( offset size -- offset' ) This word is used for reserving an attribute field of the specified size in a data structure. The name of this field is the name that immediately follows the word. cell: ( offset -- offset' ) This word is used for reserving an one cell attribute field in a data structure. The name of this field is the name that immediately follows the word. cells: ( offset n -- offset' ) This word is used for reserving an 'n' cell attribute field in a data structure. The name of this field is the name that immediately follows the word. char: ( offset -- offset' ) This word is used for reserving an one character attribute field in a data structure. The name of this field is the name that immediately follows the word. chars: ( offset n -- offset' ) This word is used for reserving an 'n' character attribute field in a data structure. The name of this field is the name that immediately follows the word. constant-structure ( '@ ', -- ) This word is used within constant structure defining words. It takes the address of @-like and ,-like words for the data structure and sets up internal pointers to them. It also sets a flag that is used internally within the structure defining word (PIXEL in the above examples) so that a CONSTANT defining data structure is built. double: ( offset -- offset' ) This word is used for reserving an one double cell attribute field in a data structure. The name of this field is the name that immediately follows the word. endstructure ( id offset -- ) This word is used to complete a structure type definition. float: ( offset -- offset' ) This word is used for reserving an one float cell attribute field in a data structure. The name of this field is the name that immediately follows the word. h@ ( hdl1 -- hdl2 ) Fetches the (double) structure handle stored in a STRUCT-HANDLE instance, hdl1. h! ( hdl1 hdl2 -- ) Stores the (double) structure handle hdl1 in the STRUCT-HANDLE instance hdl2. integer: ( offset -- offset' ) This word is used for reserving an one integer cell attribute field in a data structure. The name of this field is the name that immediately follows the word. This word is synonomous with 'cell:'. pointer: ( id offset cell_size -- id offset' ) This word is used for reserving a pointer within the data structure to a data field with 'cell_size' elements. The elements can be either scalar or data structures. The name of this field is the name that immediately follows the word. This word is primarily intended for pointing to dynamically allocated or aliased arrays. sizeof ( -- ncells ) This immediate word expects the name of a structure type (NOT AN INSTANCE) and returns the number of cells the data part of the structure occupies. NOTE: The word 'sizeof' has a side effect of setting the VALUE TYPE-ID to the appropriate value and setting the VALUE STRUCT-ARRAY? to TRUE. These are needed in order to properly construct arrays of structures and nested structures. The result of the 'sizeof' word is used for the first parameter of ARRAY (or third parameter of DARRAY) declarations. struct: ( offset size -- offset' ) This word is used for reserving an attribute field for a nested structure that occupies 'size' cells within the data structure. The name of this field is the name that immediately follows the word. This word is synonomous with 'attribute'. structure ( "name" -- id offset ) This word begins the definition of a structure data type (NOT AN INSTANCE). This is where the name for the type is defined. typeof ( -- id ) Returns the type id of the data type that immediately follows this word. =================================================================== References: Ertl, A., 1994; Usenet comp.lang.forth comments, 28 Oct. Hayes, J.R., 1992; Objects for Small Systems, Embedded Systems Programming, V. 5, No. 3(March) pp. 32 - 45 Hendrix, M, 1994; Personal communications Pountain, D., 1987; Object-Oriented Forth, Implementation of Data Structures, Academic Press, New York, 119 pages, ISBN 0-12-563570-2