Over the years, it seems that every Forth implementation has its own string word set. Until is no different. ANS Forth specifies a string word set, I believe. I haven't seen it and so cannot comment on its usefulness. The best string package I have ever used is the one included in VAX STOIC. It proved more than adequate on a very large database publishing application, which is mostly string manipulation.
Until also implements the C string library. C strings are null terminated rather than counted. Care must be taken to be sure that the two types are not mixed.
Until includes a couple of words that give the programmer some flexibility in how Until treats strings. The word " defines a string constant. It can be used both interactively and compiled into words. " returns either a null terminated or a counted string depending on the value in null_strings. C string functions always return a value. A second string flag, string_returns, controls whether the C string return values are placed on the stack or discarded by Until.
The best type of string depends on how the strings are used. Null terminated strings have a definite advantage when the start of the string is being manipulated most. Counted strings are most efficient when the end of the string is being manipulated most often, because the length is already known and the end does not have to be searched for.
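The trade-off can be sketched in C, the language of the library Until borrows its null terminated words from. This is only an illustration: the counted_string type and the three function names are hypothetical and are not part of Until. It shows that a counted string finds its end directly from the length byte, that a null terminated string must be scanned to find its end, and that skipping the front of a null terminated string is just a pointer move.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical counted string: one length byte followed by the characters. */
typedef struct {
    unsigned char len;
    char text[255];
} counted_string;

/* Append to a counted string: the end is found directly from the length. */
int counted_append(counted_string *s, char c)
{
    if (s->len == 255)
        return 0;                  /* no room left in a one-byte count */
    s->text[s->len++] = c;
    return 1;
}

/* Append to a null terminated string: strlen() has to scan for the end. */
int null_append(char *s, size_t capacity, char c)
{
    size_t n = strlen(s);
    if (n + 1 >= capacity)
        return 0;                  /* no room for the character plus the null */
    s[n] = c;
    s[n + 1] = '\0';
    return 1;
}

/* Drop the first character of a null terminated string: move the pointer. */
const char *null_skip_first(const char *s)
{
    return (*s != '\0') ? s + 1 : s;
}
```

Dropping the first character of a counted string, by contrast, means either moving the characters down or writing a new length byte in front of the shortened text.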
The STOIC string word set is implemented in Until via the source file STRING.APP. These words should eventually be rewritten in C for speed. However, they exist and are functional at the present time.
The STOIC string words are:
The structure of strings in STOIC is a max length byte, followed by the current length, then the bytes. This is an extra field beyond Forth strings. The STOIC string primitives test to be sure that strings do not overflow. This safety feature saved me many times. The Until string implementation does not do run-time length checking because of the speed penalty involved. The speed penalty would be especially noticeable in Until's high level implementation of these words.
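As a rough illustration of this layout and of the overflow test, the following C sketch stores the maximum length in byte 0, the current length in byte 1, and the characters after that. The names stoic_init and stoic_append are hypothetical and belong to neither STOIC nor Until. How STOIC reacted to a detected overflow is an assumption here: the sketch simply refuses the append and leaves the string unchanged.

```c
#include <string.h>

/* Assumed layout: byte 0 = max length, byte 1 = current length,
   bytes 2 onward = the characters. */
enum { MAX_OFFSET = 0, LEN_OFFSET = 1, TEXT_OFFSET = 2 };

/* Prepare an empty string that can hold up to max_len characters.
   The buffer must be at least max_len + 2 bytes long. */
void stoic_init(unsigned char *s, unsigned char max_len)
{
    s[MAX_OFFSET] = max_len;
    s[LEN_OFFSET] = 0;
}

/* Append n characters from src. Returns 1 on success, or 0 when the
   result would not fit, in which case the string is left unchanged. */
int stoic_append(unsigned char *s, const char *src, unsigned char n)
{
    unsigned int new_len = (unsigned int)s[LEN_OFFSET] + n;
    if (new_len > s[MAX_OFFSET])
        return 0;                          /* the run-time length check */
    memcpy(s + TEXT_OFFSET + s[LEN_OFFSET], src, n);
    s[LEN_OFFSET] = (unsigned char)new_len;
    return 1;
}
```

The check costs one addition and one comparison per append in C. In a high level Forth definition the same test is several words executed on every call, which is the penalty mentioned above.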
Note that there are two versions of most of the string words, for example move_string and .move_string. The normal version moves a counted string to a counted string. The '.' version takes the source as an address and requires an explicit length argument.
Assume you are building a file name, including path. This example uses many of the STOIC string manipulation words to take the pieces and build up a counted string with the full file name.
\
\ Load string words
include string.unt \ Assumes current directory
\ Variables
80 string full.name \ final destination string
40 string device
40 string directory
40 string name
40 string extension
\ Load up initial values
{ c:} here 2 device .move_string
{ trash} here name null->counted
{ dat} here 3 extension .strap \ Assumes length is 0
{ \junk} here directory null->counted
\ Now put the pieces together
device full.name move_string
directory full.name strap
{ \} here c@ full.name .stab
name full.name strap
ascii . full.name stab
extension count full.name .strap
\ Show the result...
full.name msg
c:\junk\trash.dat
Remember that { ... } leaves the string at here in
null terminated form. That is the reason .move_string must
be used instead of move_string.
The code for string manipulation seems a little strange at first, but I have found it to be the best string manipulation package for Forth after using it for a short while. When dealing with counted strings, I frequently use the STOIC string words.
Until also supports the C string functions. There is a corresponding primitive set up for the most commonly used C string functions. The word arguments correspond to the C function arguments. All C strings are null terminated. Until provides words to convert between null and counted strings.
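What such a conversion involves can be sketched in C. The function names null_to_counted and counted_to_null are hypothetical, and the sketch assumes a counted string with a single length byte in front of the characters. It is not the source of Until's own conversion words.

```c
#include <stddef.h>
#include <string.h>

/* Null terminated -> counted. dest must hold strlen(src) + 1 bytes.
   Returns 0 when src is too long for a one-byte count. */
int null_to_counted(const char *src, unsigned char *dest)
{
    size_t n = strlen(src);
    if (n > 255)
        return 0;
    dest[0] = (unsigned char)n;        /* length byte comes first */
    memcpy(dest + 1, src, n);          /* characters, without the null */
    return 1;
}

/* Counted -> null terminated. dest must hold src[0] + 1 bytes. */
void counted_to_null(const unsigned char *src, char *dest)
{
    memcpy(dest, src + 1, src[0]);
    dest[src[0]] = '\0';               /* add the terminator C expects */
}
```

Passing a counted string straight to a C string word would treat the length byte as the first character and run on until a zero byte happened to turn up, which is why the two types must not be mixed.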
The C string words are:
The C string functions generally return the address of the destination string on success. C programmers ignore the return value 99% of the time. Forth does not allow a return value to be ignored: anything left on the stack must be dropped explicitly. There are two solutions: either be compatible with the C function calls and return the address, or do not return the address at all. I chose the long form for completeness, however painful. The word string_returns has been added to set a flag that toggles string word return values on and off.
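For readers who have not looked at those return values in a while, here is a small C sketch of what is being returned. The two function names are made up for the illustration. strcpy() and strcat() are the standard library calls, and both return their destination argument.

```c
#include <string.h>

/* Returns 1 when strcpy() and strcat() both hand back the
   destination pointer they were given. */
int returns_destination(void)
{
    char dest[32];
    char *r1 = strcpy(dest, "this is ");
    char *r2 = strcat(dest, "a test");
    return r1 == dest && r2 == dest;
}

/* The one common use of the return value: chaining calls.
   dest must have room for at least 8 bytes. */
size_t build_length(char *dest)
{
    return strlen(strcat(strcpy(dest, "c:"), "\\junk"));
}
```

In C the address can be silently discarded by writing the call as a statement. On the Forth stack it stays put until something consumes it, hence the flag.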
The following is the word glossary for the C string words.
" This is a string\n"
defines a constant string with a <newline> as its last character.
" 12345" atol
leaves 12345 on the stack. Be aware that the C atol() function does
not return an error when a non-digit is encountered:
" 123+" atol
leaves 123 on the stack. This can cause some difficult to track down
program bugs, so be careful!
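On the C side, the usual way to catch this is strtol(), which reports where the conversion stopped. The sketch below is an illustration only: parse_long_strict is a hypothetical helper, not an Until word, and the text does not say whether Until exposes strtol().

```c
#include <errno.h>
#include <stdlib.h>

/* Convert text to a long, rejecting input that atol() would quietly
   accept. Returns 1 and stores the value in *out on success, or 0
   when no digits are found, when anything follows the number, or
   when the value is out of range. */
int parse_long_strict(const char *text, long *out)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(text, &end, 10);
    if (end == text)
        return 0;                  /* no digits at all */
    if (*end != '\0')
        return 0;                  /* trailing non-digit, e.g. the + */
    if (errno == ERANGE)
        return 0;                  /* too large or too small for a long */
    *out = value;
    return 1;
}
```

With this helper, "123+" is rejected instead of being read as 123, while a plain "12345" is accepted.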
true null_strings \ Null terminated strings
false null_strings \ Counted strings
true string_returns \ String words return values
false string_returns \ String words do not return values
This approach gives the programmer a choice. The words affected
by string_returns are:
The default is TRUE.
This section contains examples of using the C string functions in Until.
false string_returns
true null_strings
100 string dest
100 string source
: .dest ." dest : " dest dup strlen type cr ;
: .source ." source: " source dup strlen type cr ;
: init.strings
   dest " this is the dest string..." strcpy
   source " The source string" strcpy
   .dest .source cr ;
: s.test1
   ." Calling strlen..." cr
   dest strlen .s cr
   .dest cr ;
: s.test2
   ." Calling strcat..." cr
   dest source strcat .s cr
   .dest .source cr ;
: s.test3
   ." Calling strcpy..." cr
   dest source strcpy .s cr
   .dest .source cr ;
: s.test4
   ." Calling strncpy..." cr
   dest source 5 strncpy .s cr
   .dest .source cr ;
: s.test5
   ." Calling strncat..." cr
   dest source 5 strncat .s cr
   .dest .source cr ;
: s.test6
   ." Calling strcmp..." cr
   dest source strcmp .s drop
   .dest .source cr
   ." Calling strncmp..." cr
   dest source 5 strncmp .s drop
   .dest .source cr ;
Execute init.strings, then each individual test to see the results.