Over the years, it seems that every Forth implementation has its own string word set. Until is no different. ANS Forth specifies a string word set, I believe. I haven't seen it and so cannot comment on its usefulness. The best string package I have ever used is the one included in VAX STOIC. It proved more than adequate on a very large database publishing application, which is mostly string manipulation.
Until also implements the C string library. C strings are null terminated rather than counted. Care must be taken to be sure that the two types are not mixed.
Until includes a couple of words that give the programmer some flexibility in how it treats strings. The word " defines a string constant and can be used both interactively and inside compiled words. It returns either a null terminated or a counted string, depending on the value of null_strings. C string functions always return a value, so a second string flag, string_return, controls whether those return values are placed on the stack or discarded by Until.
The best type of string depends on how the strings are used. Null terminated strings have a definite advantage when the start of the string is manipulated most often, because the start can be moved simply by changing the address. Counted strings are most efficient when the end of the string is manipulated most often, because the end is found from the count without scanning the string.
The STOIC string word set is implemented in Until via the source file STRING.APP. These words should eventually be rewritten in C for speed. However, they exist and are functional at the present time.
The STOIC string words are:
The structure of strings in STOIC is a max length byte, followed by the current length byte, then the bytes of the string. The max length byte is an extra field that ordinary Forth counted strings do not have. The STOIC string primitives use it to test that strings do not overflow. This safety feature saved me many times. The Until string implementation does not do run-time length checking because of the speed penalty involved. The speed penalty would be especially noticeable in Until's high level implementation of these words.
Note that there are two versions of most of the string words, for example move_string and .move_string. The normal version takes a counted string as its source, so move_string moves a counted string to a counted string. The '.' version requires an explicit length argument.
Assume you are building a file name, including path. This example uses many of the STOIC string manipulation words to take the pieces and build up a counted string with the full file name.
    \
    \ Load string words
    include string.unt              \ Assumes current directory

    \ Variables
    80 string full.name             \ final destination string
    40 string device
    40 string directory
    40 string name
    40 string extension

    \ Load up initial values
    { c:} here 2 device .move_string
    { trash} here name null->counted
    { dat} here 3 extension .strap  \ Assumes length is 0
    { \junk} here directory null->counted

    \ Now put the pieces together
    device full.name move_string
    directory full.name strap
    { \} here c@ full.name .stab
    name full.name strap
    ascii . full.name stab
    extension count full.name .strap

    \ Show the result...
    full.name msg
    c:\junk\trash.dat

Remember that { ... } leaves the string at here in null terminated form. That is the reason .move_string must be used instead of move_string.
The code for string manipulation seems a little strange at first, but I have found it to be the best string manipulation package for Forth after using it for a short while. When dealing with counted strings, I frequently use the STOIC string words.
Until also supports the C string functions. There is a corresponding primitive set up for the most commonly used C string functions. The word arguments correspond to the C function arguments. All C strings are null terminated. Until provides words to convert between null and counted strings.
The C string words are:
The C string functions generally return the address of the destination string on success. C programmers ignore the return value 99% of the time. Forth does not allow ignoring a return value; it stays on the stack until something removes it. There are two solutions: be compatible with the C function calls and return the address, or don't return the address at all. I chose the long form for completeness, however painful. The word string_returns has been added to set a flag that toggles string word return values on and off.
The following is the word glossary for the C string words.
" This is a string\n"
defines a constant string with a <newline> as its last character.

    " 12345" atol

leaves 12345 on the stack. Be aware that the C atol() function does not return an error when a non-digit is encountered:

    " 123+" atol

leaves 123 on the stack. This can cause some difficult-to-track-down program bugs, so be careful!
    true null_strings     \ Null terminated strings
    false null_strings    \ Counted strings
    true string_return     \ String words return values
    false string_return    \ String words do not return values

This approach gives the programmer a choice. The words affected by string_returns are:
The default is TRUE.
This section contains examples of using the C string functions in Until.
    false string_returns
    true null_strings

    100 string dest
    100 string source

    : .dest ." dest : " dest dup strlen type cr ;
    : .source ." source: " source dup strlen type cr ;

    : init.strings
       dest " this is the dest string..." strcpy
       source " The source string" strcpy
       .dest .source cr ;

    : s.test1
       ." Calling strlen..." cr
       dest strlen .s cr
       .dest cr ;

    : s.test2
       ." Calling strcat..." cr
       dest source strcat .s cr
       .dest .source cr ;

    : s.test3
       ." Calling strcpy..." cr
       dest source strcpy .s cr
       .dest .source cr ;

    : s.test4
       ." Calling strncpy..." cr
       dest source 5 strncpy .s cr
       .dest .source cr ;

    : s.test5
       ." Calling strncat..." cr
       dest source 5 strncat .s cr
       .dest .source cr ;

    : s.test6
       ." Calling strcmp..." cr
       dest source strcmp .s drop
       .dest .source cr
       ." Calling strncmp..." cr
       dest source 5 strncmp .s drop
       .dest .source cr ;

Execute init.strings, then each individual test to see the results.