Strings

Over the years, it seems that every Forth implementation has its own string word set. Until is no different. ANS Forth specifies a string word set, I believe. I haven't seen it and so cannot comment on its usefulness. The best string package I have ever used is the one included in VAX STOIC. It proved more than adequate on a very large database publishing application, which is mostly string manipulation.

Until also implements the C string library. C strings are null terminated rather than counted. Care must be taken to be sure that the two types are not mixed.

Until includes a couple of words that give the programmer some flexibility in how Until treats strings. The word " defines a string constant. It can be used both interactively and compiled into words. " returns either a null terminated or a counted string, depending on the flag set by null_strings. C string functions always return a value. A second flag, set by string_returns, controls whether the C string return values are placed on the stack or ignored by Until.

The best type of string depends on how the strings are used. Null terminated strings have a definite advantage when the start of the string is being manipulated most. Counted strings are most efficient when the end of the string is being manipulated most often, since the count locates the end without scanning.
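The trade-off between the two representations can be sketched in plain C (not Until code). The counted_string type and the three helper names below are hypothetical and exist only for this illustration; the layout assumed is a single count byte followed by the characters.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical counted string: one length byte followed by the characters. */
typedef struct {
    unsigned char len;          /* current length, 0..255 */
    char          text[255];
} counted_string;

/* Appending to a counted string: the count locates the end, no scan needed. */
static int counted_append_char(counted_string *s, char c)
{
    if (s->len == 255)
        return 0;               /* no room left */
    s->text[s->len++] = c;
    return 1;
}

/* Appending to a null terminated string: strlen walks the whole string
   before the new character can be stored. */
static int null_append_char(char *s, size_t capacity, char c)
{
    size_t n = strlen(s);
    if (n + 1 >= capacity)
        return 0;               /* no room for the character and the null */
    s[n] = c;
    s[n + 1] = '\0';
    return 1;
}

/* Dropping leading characters from a null terminated string is only a
   pointer adjustment. A counted string would need a new count byte
   written in front of the new start. */
static const char *null_skip(const char *s, size_t n)
{
    while (n-- > 0 && *s != '\0')
        s++;
    return s;
}
```

For example, null_skip applied to "c:\junk" with n = 2 returns a pointer to "\junk" without copying anything.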


STOIC Strings

The STOIC string word set is implemented in Until via the source file STRING.APP. These words should eventually be rewritten in C for speed. However, they exist and are functional at the present time.

The STOIC string words are:

move_string
Move one string to another.
.move_string
Move a string, given by address and length, to another string.
stab
Store a byte.
strap
String append.
.strap
String append, with an explicit source length.
.streq
Test for string equality.
etc.

The structure of strings in STOIC is a max length byte, followed by the current length, then the bytes. This is an extra field beyond Forth strings. The STOIC string primitives test to be sure that strings do not overflow. This safety feature saved me many times. The Until string implementation does not do run-time length checking because of the speed penalty involved. The speed penalty would be especially noticeable in Until's high level implementation of these words.
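The layout and the overflow check described above can be sketched in plain C. This is an illustration of the STOIC-style check, not the Until implementation (which, as noted, skips the check). The names str_init, str_stab_checked and str_strap_checked are hypothetical, and the exact byte order is an assumption taken from the description: maximum length, then current length, then the characters.

```c
#include <string.h>

/* Assumed layout, following the description in the text:
   byte 0 = maximum length, byte 1 = current length, bytes 2.. = characters. */
enum { STR_MAX = 0, STR_LEN = 1, STR_TEXT = 2 };

/* Prepare a buffer of (max_len + 2) bytes as an empty string. */
static void str_init(unsigned char *s, unsigned char max_len)
{
    s[STR_MAX] = max_len;
    s[STR_LEN] = 0;
}

/* Checked "stab": append one byte, refusing if the string is already full. */
static int str_stab_checked(unsigned char *s, unsigned char c)
{
    if (s[STR_LEN] >= s[STR_MAX])
        return 0;
    s[STR_TEXT + s[STR_LEN]] = c;
    s[STR_LEN]++;
    return 1;
}

/* Checked ".strap": append len bytes from src, refusing if they do not fit. */
static int str_strap_checked(unsigned char *dest, const unsigned char *src,
                             unsigned len)
{
    if (len > (unsigned)(dest[STR_MAX] - dest[STR_LEN]))
        return 0;
    memcpy(dest + STR_TEXT + dest[STR_LEN], src, len);
    dest[STR_LEN] = (unsigned char)(dest[STR_LEN] + len);
    return 1;
}
```

The extra maximum-length byte is what makes the check possible: a plain counted string knows how long it is, but not how much room it has.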

Note that there are two versions of most of the string words. For example, move_string and .move_string. The normal version moves a counted string to a counted string. The '.' version requires an explicit length argument.


move_string
('source 'destination --- )
"move string". Move the contents of source to destination. Both addresses are counted strings. The destination string must be large enough to hold the source string.
.move_string
(addr len 'destination --- )
"dot move string". This is a variation of move_string that uses an address and length to specify the source string. destination specifies the address of the counted string that receives the source. No provision is made for overlapping strings.
stab
(char 'destination --- )
"stab". Store a byte, or stab, appends a byte to the end of the counted string whose address is specified by destination. The count byte is automatically incremented.
strap
('source 'destination --- )
"strap". The string append word appends the contents of the source string, 'source, to the destination string, 'destination. Both the source and the destination must be counted strings.
.strap
(addr len 'destination --- )
"dot strap". Append len bytes of the string at addr to destination. This word is a variation of strap.

STOIC String Example

Assume you are building a file name, including path. This example uses many of the STOIC string manipulation words to take the pieces and build up a counted string with the full file name.

\
\ Load string words
include string.unt        \ Assumes current directory

\ Variables 
80 string full.name       \ final destination string
40 string device
40 string directory
40 string name
40 string extension

\ Load up initial values
{ c:} here 2 device .move_string
{ trash} here name null->counted
{ dat} here 3 extension .strap      \ Assumes length is 0
{ \junk} here directory null->counted

\ Now put the pieces together
device full.name move_string
directory full.name strap
{ \} here c@ full.name stab
name full.name strap
ascii . full.name stab
extension count full.name .strap

\ Show the result...
full.name msg
c:\junk\trash.dat

Remember that { ... } leaves the string at here in null terminated form. That is the reason .move_string must be used instead of move_string.
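Why the explicit length matters can be sketched in plain C. The sketch assumes a simple counted layout (one count byte, then the characters); counted_length and dot_move_string are hypothetical stand-ins written for this illustration, not the Until source.

```c
#include <string.h>

/* Read a buffer as a counted string: the first byte is taken as the length. */
static unsigned counted_length(const unsigned char *s)
{
    return s[0];
}

/* Stand-in for a move that takes (addr len 'destination): the caller
   supplies the length, so the source needs no count byte of its own. */
static void dot_move_string(const char *addr, unsigned char len,
                            unsigned char *dest)
{
    dest[0] = len;                  /* store the count byte */
    memcpy(dest + 1, addr, len);    /* then the characters */
}
```

If the null terminated text "c:" were handed to a word expecting a counted string, its first character, 'c', would be read as the count. Passing the address and the length 2 separately avoids that.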

The code for string manipulation seems a little strange at first, but I have found it to be the best string manipulation package for Forth after using it for a short while. When dealing with counted strings, I frequently use the STOIC string words.


C Strings

Until also supports the C string functions. There is a corresponding primitive set up for the most commonly used C string functions. The word arguments correspond to the C function arguments. All C strings are null terminated. Until provides words to convert between null and counted strings.

The C string words are:

atol
Ascii to Long.
strcat
Concatenate the source string to the destination string.
strncat
Concatenate n bytes of the source string to the destination string.
strcmp
Compare two strings.
strncmp
Compare two strings for n bytes.
strcpy
Copy one string to another.
strncpy
Copy n bytes of one string to another.
toupper
Convert a character to upper case.
tolower
Convert a character to lower case.

The C string functions generally return the address of the destination string on success. C programmers ignore the return value 99% of the time. Forth does not allow a return value to be ignored; it stays on the stack until it is dropped. There are two solutions: stay compatible with the C function calls and return the address, or do not return the address at all. I chose the long form for completeness, however painful. The word string_returns sets a flag that switches string word return values on and off.
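For readers less familiar with the C side, the following plain C sketch shows both habits in one place: return values that are silently discarded, and one that is used. The function make_file_name is hypothetical and exists only for this illustration.

```c
#include <string.h>

/* Build "name.ext" in buf. buf must have room for both parts,
   the '.' and the terminating null. */
static char *make_file_name(char *buf, const char *name, const char *ext)
{
    strcpy(buf, name);          /* return value ignored, the usual case */
    strcat(buf, ".");           /* ignored again */
    return strcat(buf, ext);    /* return value used: it is buf itself */
}
```

In C the discarded values cost nothing. On a Forth stack each one would have to be removed with an explicit drop, which is the clutter the flag is meant to avoid.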

The following is the word glossary for the C string words.


"
( --- 'constant-string )
"double-quote". Generate a constant string delimited by ". Character quoting with the '\' character, as in C, is allowed. For example:
    " This is a string\n"

defines a constant string with a <newline> as its last character.


atol
( 'string --- n)
"a-to-l". The Ascii to long function converts a string to a long integer. Characters up to the first non-digit are converted and left on the stack.
        " 12345" atol
leaves 12345 on the stack. Be aware that the C atol() function does not return an error when a non-digit is encountered:
        " 123+" atol
leaves 123 on the stack. This can cause some difficult to track down program bugs, so be careful!
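When silent truncation is a concern, the C library function strtol reports where conversion stopped, which makes a stricter check possible. The sketch below is plain C rather than Until code, and parse_long_strict is a hypothetical helper written for this illustration; the source does not say whether Until exposes strtol.

```c
#include <errno.h>
#include <stdlib.h>

/* Convert text to a long, succeeding only when the whole string is a
   number. Returns 1 and stores the value in *out on success, 0 otherwise. */
static int parse_long_strict(const char *text, long *out)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(text, &end, 10);
    if (end == text)            /* no digits were converted */
        return 0;
    if (*end != '\0')           /* conversion stopped before the end */
        return 0;
    if (errno == ERANGE)        /* value does not fit in a long */
        return 0;
    *out = value;
    return 1;
}
```

With this helper, "12345" converts successfully while "123+" is rejected instead of quietly yielding 123.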
null_strings
( tf --- )
"null strings". Until treats strings as counted by default for ". This word allows the programmer to toggle the operation of " to switch to null terminated representation.
      true  null_strings    \ Null terminated strings
      false null_strings    \ Counted strings

strcat
( 'dest 'source --- 'dest ) - When string_returns is true.
( 'dest 'source --- ) - When string_returns is false.
"s-t-r cat". Concatenate the source string to the destination string.
strcmp
( 'string1 'string2 --- <0|0|>0 )
"s-t-r c-m-p". Compare two strings. Zero means the strings match. If the return value is less than zero, string1 is less than string2. If the return value is positive, string1 is greater than string2.
strcpy
( 'dest 'source --- 'dest ) - When string_returns is true.
( 'dest 'source --- ) - When string_returns is false.
"s-t-r c-p-y". Copy the string whose address is source into the string at dest. No check is made for string overflow.
string_returns
( tf --- )
"string returns". C string functions have the nasty habit of returning values that most C programmers ignore. In Until, explicitly dropping those unused values makes for messy code. This word controls whether the C string words return values to the parameter stack.
      true  string_returns   \ String words return values
      false string_returns   \ String words do not return values
This approach gives the programmer a choice. Going by the stack diagrams in this glossary, the words affected by string_returns are strcat, strcpy, strncat and strncpy.

The default is TRUE.


strncat
( 'dest 'source len --- 'dest ) - When string_returns is true.
( 'dest 'source len --- ) - When string_returns is false.
"s-t-r-n-c-a-t". Concatenate len bytes of the source string to the destination string.
strncmp
( 'string1 'string2 len --- <0|0|>0 )
"s-t-r-n-c-m-p". Compare the first len bytes of two strings. The result has the same meaning as for strcmp.
strncpy
( 'dest 'source len --- 'dest ) - When string_returns is true.
( 'dest 'source len --- ) - When string_returns is false.
"s-t-r-n-c-p-y". Copy len bytes of one string to another.
strret
( --- returnvalue )
"s-t-r-r-e-t". Get the value returned by the last C string word call.
strupr
( 'string --- )
"s-t-r-u-p-r". Convert the null terminated string to all upper case characters. The string is converted in place.
tolower
( CH --- ch )
"to-lower". Convert a character to lower case.
toupper
( ch --- CH )
"to-upper". Convert a character to upper case.

C String Example

This section contains examples of using the C string functions in Until.

false string_returns
true null_strings

100 string dest
100 string source

: .dest
   ." dest  : " dest dup strlen type cr
   ;
: .source
   ." source: " source dup strlen type cr
   ;
: init.strings
   dest   " this is the dest string..." strcpy
   source " The source string" strcpy
   .dest   .source cr
   ;
: s.test1
   ." Calling strlen..." cr
   dest strlen .s cr
   .dest cr
   ;
: s.test2
   ." Calling strcat..." cr
   dest source strcat .s cr
   .dest   .source cr
   ;
: s.test3
   ." Calling strcpy..." cr
   dest source strcpy .s cr
   .dest   .source cr
   ;
: s.test4  
   ." Calling strncpy..." cr
   dest source 5 strncpy .s cr
   .dest   .source cr
   ;
: s.test5
   ." Calling strncat..." cr
   dest source 5 strncat .s cr
   .dest   .source cr
   ;
: s.test6
   ." Calling strcmp..." cr
   dest source strcmp .s drop
   .dest   .source cr

   ." Calling strncmp..." cr
   dest source 5 strncmp .s drop
   .dest   .source cr 
   ;

Execute init.strings, then each individual test to see the results.
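One result worth anticipating is s.test4. Assuming the Until strncpy word passes its arguments straight to the C library function (an assumption; the source does not say so), copying 5 bytes from a longer source does not write a terminating null, so the remainder of the old dest contents stays visible. The plain C sketch below shows a common way to force termination; copy_n_terminated is a hypothetical helper written for this illustration.

```c
#include <string.h>

/* Copy at most n characters of src into dest and always terminate.
   dest must have room for n + 1 bytes. */
static char *copy_n_terminated(char *dest, const char *src, size_t n)
{
    strncpy(dest, src, n);      /* adds no null if src has n or more chars */
    dest[n] = '\0';             /* so terminate explicitly */
    return dest;
}
```

With the strings from init.strings, a bare C strncpy of 5 bytes leaves "The sis the dest string..." in dest, while copy_n_terminated leaves "The s".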
