High order data structures V1.9


	This explains the workings and rationale of a scheme for high
        order data structures.  These are the kind of data structures that
        are roughly equivalent to C/C++ structs or Pascal/Ada Records,
        FORTRAN has no such construct (Fortran 90 does have such a facility
        with its TYPE construct).

	This version of data structures is based upon ideas described
        in Hayes (1992), Pountain (1987), Ertl (1994), and Hendrix (1994)
        and upon much experimentation with several earlier versions of
        this code.


        These data structures allow the creation of several kinds of data
        that are naturally associated with each other, but do not
        necessarily work well together as a single array of data.


	This implementation of data structures was designed with the
	following goals in mind:


                1. Portability -- the implementation must be ANS Forth.

		2. Generality -- the data structures need to be used in
                                 many kinds of applications.

		3. Readability -- the syntax to be created has to be easy
                                 to read and relatively intuitive.

	        4. Easy to use -- an awkward design will not likely get used.

		5. Works with multiple instances of a given data structure.

		6. Allows nested data structures.

		7. Allows arrays of data structures (both statically and
		   dynamically declared) without adding additional syntax
                   to existing array words.


        The intent of the design is to provide a general, easy to use
        scheme for handling data only.  The temptation to add object
        oriented properties to the structures was avoided (a possible
        future data structure could be designed for objects).
	The structure contains ATTRIBUTE subfields, that is data or
        pointers to data.
        
        When using a data structure the first step is to declare the
        type, e.g,

        structure pixel
            integer: ->red
            integer: ->green
            integer: ->blue
            float:   ->intensity
        endstructure
        
        defines a structure named pixel that has 3 INTEGER attribute fields,
        (named ->red, ->green and ->blue) plus a floating point field
        (named ->intensity).
        An alternative form of the type declaration uses the word ATTRIBUTE
        with the existing type size words,

        structure pixel
            integer attribute  ->red
            integer attribute  ->green
            integer attribute  ->blue
            real*4  attribute  ->intensity
        endstructure

        The words, ->red, ->green, ->blue and ->intensity will be created
        by these declarations.  They will automatically know their offsets
        into the data structure.  Internally the space for the attributes
        occurs in the order that they are declared.  Note that depending
        upon the data structure being declared, alignment words (ALIGNED,
        FALIGNED, etc.) may be necessary between the specification of the
        attributes.

        
	An instance of a pixel is declared the same way as declaring
        a variable,

        pixel pix1

        declares a pixel instance called pix1.

	The key to this simplicity is the fact that 'structure' is a second
        order defining word that tells the compiler how a new type is to be
        defined.


	Structures can be nested, for example:

	structure point
		integer: .x
                integer: .y
                sizeof pixel struct: .px
	endstructure


	point p

	(see below for explanation of 'sizeof' ).


	Then the following can be used,

	p .x             \ the address of the .x attribute of p
	p .px ->red      \ the address of the ->red attribute of p's .px


        ===================================================================


        USING A STRUCTURE


	When a structure name is invoked alone, it stacks a double structure
	handle.  This handle consists of two components each occupying one
	cell: the base address of the structure instance on the top of the
	stack, and the type-id for this structure below it on the stack.


        The address of an attribute field of a given structure instance
        is obtained by naming the object and then the attribute, e.g.

        pix1 ->red

        gets the address of the ->red attribute of pix1.  The attribute
        is late bound to the structure so that the following kinds
        of operations will work,


        : clear_red ( addr -- )
                  0 SWAP ->red !
        ;

        pix1 clear_red


        The structures and attribute fields use the type id's that are part
        of the structure handle to assure that a member attribute goes with
	a proper structure.  So, for example,

	pix1 .x

	will give a RUNTIME failure (this happens at runtime not compile
        time beause the attribute fields use late binding).


	Because the structure handles contain the type information, a
        structure member can be dereferenced well after the structure is
        referred to.
	For example,


	p       \ the point structure above

       ( .....a bunch of stuff with no net stack effect
                that involves, other kinds of structures ... )

	.x      \ the user WANTS p .x at this point


	This will type of code will work OK.


        There is a STRUCT-HANDLE type defined that is useful for storing
        away structure contexts,

        structure STRUCT-HANDLE
             1 CELLS attribute .type
             1 CELLS attribute .addr
        endstructure


        With this one can store away a structure reference for later use,
        e.g.,

        STRUCT-HANDLE SELF

        pix1   SELF .addr !   SELF .type !


        In order to reduce the possiblity of errors when doing this, the
        words h@ and h! are provided to move values in and out of this
        structure,

        pix1    SELF h!

        ...

        SELF h@


        h@ ( hdl1 -- hdl2 )

        Fetches the (double) structure handle stored in a STRUCT-HANDLE
        instance, hdl1.


        h! ( hdl1 hdl2 -- )

        Stores the (double) structure handle hdl1 in the STRUCT-HANDLE
        instance hdl2.

        ===================================================================


	CREATING ARRAYS OF STRUCTURES.


	Arrays of structures can use the existing ARRAY/DARRAY mechanism.
	To do this the new word 'sizeof' is needed,

        
	sizeof (  -- ncells )
        
        This immediate word expects the name of a structure type (NOT AN
        INSTANCE) and returns the number of cells the data part of the
        structure occupies.  For example,

        sizeof pixel

	and NOT,

        sizeof pix1


        NOTE: The word 'sizeof' has a side effect of setting the VALUE
        TYPE-ID to the appropriate value and setting the VALUE STRUCT-ARRAY?
        to TRUE.  These are needed in order to properly construct arrays of
        structures and nested  structures.  The result of the 'sizeof' word
        is used for the first parameter of ARRAY (or third parameter of
        DARRAY) declarations,
        
	5  sizeof pixel ARRAY p{


        or,

	sizeof pixel DARRAY q{


	& q{ 5 }malloc


        These create 5 element arrays where each element has the same type
        as the structure pixel, the names of the arrays are p{ and q{.  The
        address of a particular element is obtained in the usual way, e.g.,

	p{ 2 }            \ stacks the address of element 2


        p{ 2 } ->red      \ stacks the address of the ->red attribute of
                          \ element 2


        ===================================================================


	Defining structures that contain arrays

        To define a structure that contain a STATIC array (i.e. fixed
        in size a compile time), use the word ARRAY: similar to how
        ARRAY would be used outside of a structure,

        structure databuffer
                  integer: .index
                  20 integer array: .z{
        endstructure

        With the above defined, every databuffer instance will have an
        integer index and a 20 element integer array.   The word ARRAY:
        can be applied to structures, just as the word ARRAY can, to
        create structures that contain arrays of other structures.

        To create structures with DYNAMICALLY allocated arrays, one must
        use the word POINTER: to create a structure pointer and then use
        some support words to associate the pointer with the desired array.
        (The word POINTER: is actually general purpose, it can be used for
        any type of pointer).

        To declare a structure that contains pointers,

         \ a data structure for LU factored matrices
          structure LUMATRIX
                    REAL*4 pointer: .MATRIX{{
                    INTEGER pointer: .PIVOT{
                    INTEGER attribute .N
          endstructure

        Then one needs to create a proper matrix and pivot
        (either statically or dynamically) and then have the LUMATRIX
        instance point to them,

        INTEGER DARRAY itmp{
        REAL*4  DMATRIX mtmp{{

        LUMATRIX a             \ defines an LU Matrix structure instance

        ...

        & itmp{ 5 }malloc                 \ allocate data space
        & mtmp{{ 5 5 }}malloc

        itmp{ a -> .pivot{ ->!     \ a's pivot now points to itmp{
        mtmp{{ a -> .matrix{{ ->!   \     and to mtmp{{


        Once this is done itmp{ and mtmp{{ can be used for other purposes
        as long as the originally allocated space is not affected.

        To free the allocated space, one needs to reverse the process,

        a .pivot{ & itmp{ &!         \ itmp{ now points to the pivot
        & itmp{ }free                \ releases the space


        The words used for this are,


        ->  ( s-id s-addr -- ptr-addr )

        Gets the base address of a pointer instance.  An example of its
        usage:   buf -> .z{


        ->!  ( addr ptr-addr -- )

        Make the pointer (ptr-addr) refer to the array base address
        (addr - CELL).


        IMPORTANT NOTE: If a{ is an array, and B is an instance of
        a structure that has the pointer attribute .a{, then the
        sequence,

        a{ B -> .a{ ->!

        has a ZERO net stack effect if a{ is an array of simple scalar
        types.  BUT has the effect of leaving the TYPE-ID of a{ on the
        stack if a{ is an array of structures.

        The same applies to the reverse sequence that aliases an
        external array to a pointer in the structure,

        B .a{  & a{ &!


        ===================================================================
                

        Defining CONSTANT structures

        To define a constant version of a data structure one needs to
        have already defined the word that fetches data from the data
        structure.  One has to also define the word that does the
        equivalent of ',' for that data strucuture (i.e. it allocates
        space for and initializes a given data structure at HERE).
        Then the word 'constant-structure' is used in the constant
        defining word before the pre-existing defining word is invoked,

        : PIX-CONSTANT
              ['] pix@
              ['] pix,
              constant-structure pixel
        ;
        

        If 'pix,' had the stack diagram ( red green blue -- )
        ( F: inten -- ) then a constant pixel could be declared as,

        20 30 40 0.75e0 pix-constant PIX
        

        Then subsequent invocations of 'PIX' would cause 'pix@' to be
        applied to the data within PIX.  (Presumably, 'pix@' would cause
        the components of PIX to be placed on the stack. That is the intent
        of the design, but in actuality one could invoke ANY word that
        will expect the base address of the structure on the stack).
        Remember that the @-like word should consume a DOUBLE structure
        handle.

        CAUTION: invoking a large constant structure can have a heavy
	impact on the stack(s).  One should take care that there is enough
	stack space available (particularly the float stack) before using
	a constant structure.


        constant-structure ( '@ ', -- )

        This word is used within constant structure defining words.  It
        takes the address of  @-like and ,-like words for the data
        structure and sets up internal pointers to them. It also sets
        a flag that is used internally within the structure defining word
        (PIXEL in the above examples) so that a CONSTANT defining data
        structure is built.
        

        ===================================================================


	Forcing EARLY BINDING


        By default, all the offsets to the individual data elements are
	resolved at run-time (i.e. LATE binding).  Sometimes the binding
	COULD be done at compile time (EARLY binding) because all the
	information to do the binding is available then.  If early binding
	can be done, then the code will run faster.  The programmer can
	inform the compiler that early binding is possible by using the
	words [[ to start early binding, and ]] to end it.

        For example,

	 : test    [[ pix1 ->red ]] @ . ;


	The words '[[' and ']]' are intended to be used at compile time,
	they become harmless no-ops at runtime.

	The current implementation DOES NOT allow for nesting of early
        bindings.


        ===================================================================


        Limitations to this version of structure:


        The attribute subfields are globally visible and
        a collision will occur if two different data structures
        are designed to have attribute fields with the same name.


	Relying on the side effect of 'sizeof' to set the value TYPE-ID,
	in order to build nested structures is a syntactic weakness of this
	version.  I have yet to figure out a way around it that doesn't
	result in a loss of all the functionality that structures now have.
        (Through the version iterations, the word 'sizeof' has lost most of
        its original meaning, perhaps it should be renamed to something more
        meaningful considering its current function).


        ===================================================================


                                 Glossary


	[[ ( -- )

	This word initiates early binding for data structures.

	]] ( -- )

	This words causes early binding for data structures to end.


        ->  ( s-id s-addr -- ptr-addr )

        Gets the base address of a pointer instance.  An example of its
        usage:   buf -> .z{


        ->!  ( addr ptr-addr -- )

        Make the pointer (ptr-addr) refer to the array base address
        (addr - CELL).


	addrof ( -- addr )

	Returns the base address of the data fields of the INSTANCE of a
	data structure that immediately follows this word.


	array: ( id offset n cell_size -- offset' )

	This word is used for reserving an array that has 'n' elements of
	size 'cell_size' within the data structure. The name of this field
	is the name that immediately follows the word.   The word 'array:'
	is used in same manner as the word 'array' so that the array can
	have either scalar or structure elements.
	

        attribute ( offset size -- offset' )

        This word is used for reserving an attribute field of the specified
	size in a data structure.  The name of this field is the name that
	immediately follows the word.


	cell: ( offset -- offset' )

	This word is used for reserving an one cell attribute field
        in a data structure.  The name of this field is the name that
        immediately follows the word.


	cells: ( offset n -- offset' )

	This word is used for reserving an 'n' cell attribute field
        in a data structure.  The name of this field is the name that
        immediately follows the word.


	char: ( offset -- offset' )

	This word is used for reserving an one character attribute field
        in a data structure.  The name of this field is the name that
        immediately follows the word.


	chars: ( offset n -- offset' )

	This word is used for reserving an 'n' character attribute field
        in a data structure.  The name of this field is the name that
        immediately follows the word.


        constant-structure ( '@ ', -- )

        This word is used within constant structure defining words.  It
        takes the address of  @-like and ,-like words for the data
        structure and sets up internal pointers to them. It also sets
        a flag that is used internally within the structure defining word
        (PIXEL in the above examples) so that a CONSTANT defining data
        structure is built.
        

	double: ( offset -- offset' )

	This word is used for reserving an one double cell attribute field
        in a data structure.  The name of this field is the name that
        immediately follows the word.


        endstructure ( id offset -- )

	This word is used to complete a structure type definition.


	float: ( offset -- offset' )

	This word is used for reserving an one float cell attribute field
        in a data structure.  The name of this field is the name that
        immediately follows the word.


        h@ ( hdl1 -- hdl2 )

        Fetches the (double) structure handle stored in a STRUCT-HANDLE
        instance, hdl1.


        h! ( hdl1 hdl2 -- )

        Stores the (double) structure handle hdl1 in the STRUCT-HANDLE
        instance hdl2.


	integer: ( offset -- offset' )

	This word is used for reserving an one integer cell attribute field
        in a data structure.  The name of this field is the name that
        immediately follows the word.  This word is synonomous with 'cell:'.


	pointer: ( id offset cell_size -- id offset' )

	This word is used for reserving a pointer within the data structure
	to a data field with 'cell_size' elements. The elements can be
	either scalar or data structures.  The name of this field is the
	name that immediately follows the word.   This word is primarily
	intended for pointing to dynamically allocated or aliased arrays.


	sizeof (  -- ncells )
        
        This immediate word expects the name of a structure type (NOT AN
        INSTANCE) and returns the number of cells the data part of the
        structure occupies.

        NOTE: The word 'sizeof' has a side effect of setting the VALUE
        TYPE-ID to the appropriate value and setting the VALUE STRUCT-ARRAY?
        to TRUE.  These are needed in order to properly construct arrays of
        structures and nested  structures.  The result of the 'sizeof' word
        is used for the first parameter of ARRAY (or third parameter of
	DARRAY) declarations.


	struct: ( offset size -- offset' )

	This word is used for reserving an attribute field for a nested
	structure that occupies 'size' cells within the data structure.
        The name of this field is the name that immediately follows the word.
        This word is synonomous with 'attribute'.


        structure ( "name" -- id offset )

        This word begins the definition of a structure data type (NOT AN
        INSTANCE).  This is where the name for the type is defined.


	typeof ( -- id )

        Returns the type id of the data type that immediately follows this
	word.

	
        ===================================================================

	References:
        
Ertl, A., 1994; Usenet comp.lang.forth comments, 28 Oct.

Hayes, J.R., 1992; Objects for Small Systems, Embedded Systems Programming,
	V. 5, No. 3(March) pp. 32 - 45

Hendrix, M, 1994; Personal communications

Pountain, D., 1987; Object-Oriented Forth, Implementation of Data
	Structures, Academic Press, New York, 119 pages,  ISBN 0-12-563570-2