value = ReadItem(buffer, size, csource)
D0 D1 D2 D3
LONG ReadItem(STRPTR, LONG, struct CSource *);
ReadItem() reads text from either the current Input() source or the csource buffer parameter. The text read is considered an "item", or, in more modern terms, a "token". The input may contain more than one token and these tokens need to be separated by whitespace characters (' ' and '\t'). A token can take two different forms. It can either be a sequence of non-whitespace characters, or it can be text enclosed in double quote characters.
The following are valid tokens:
Dir
Lab
45678
abc"
abc"def"ghi
>>
?
=
*
"contains blank spaces"
"uses *"double quotes*""
The following are invalid tokens:
The following text is equivalent and will be processed as two consecutive tokens; ReadItem() will first return a quoted token "foo bar" and the next use of ReadItem() will return an unquoted token "baz":
"foo bar"baz
"foo bar" baz
If both tokens are quoted, the following text is equivalent, too, i.e. two consecutive quoted tokens will be returned as "foo bar" and "baz":
"foo bar""baz"
"foo bar" "baz"
The text read will be processed and stored as a NUL-terminated string referenced by the buffer parameter. For quoted text, the text will be stored without the enclosing double quote characters.
ReadItem() will stop reading further data if there is no more data (e.g. reading from Input() indicates that the end of the file or stream has been reached), or if a ';' (semicolon), '\0' (NUL), or '\n' (line feed) character has been read.
If a token is enclosed in double quote characters, it can contain whitespace characters as well as ';' and double quote characters. In order to use double quote characters in a token enclosed in double quotes, the '*' (asterisk) character must be prepended to the double quote character, like so:
"uses *"double quotes*""
The '*' acts as an escape character and either removes the special meaning of the character which follows it, or replaces it with a control character:
*" Becomes "
** Becomes *
*e Becomes '\x1B' (esc character)
*E Becomes '\x1B' (esc character)
*n Becomes '\n' (line feed character)
*N Becomes '\n' (line feed character)
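The escape table above can be illustrated with a small, portable C sketch. It is only a model of the described behaviour, not the dos.library implementation: decode_star_escapes() is a hypothetical helper name, and it works on the text between the enclosing double quotes. Returning -1 when nothing follows a '*' is an assumption made for this sketch.

```c
#include <stddef.h>

/* Hypothetical helper, not part of dos.library: decodes the body of a
 * quoted token (the text between the enclosing double quotes) using the
 * '*' escape table above. Returns the decoded length, or -1 if the text
 * ends right after a '*' or the output buffer is too small. */
long decode_star_escapes(const char *in, char *out, size_t out_size)
{
    size_t n = 0;

    if (out_size == 0)
        return -1;                      /* no room even for the NUL */

    while (*in != '\0') {
        char c = *in++;

        if (c == '*') {
            c = *in++;                  /* character following the '*' */
            switch (c) {
            case '\0':
                return -1;              /* nothing follows the '*' */
            case 'e': case 'E':
                c = '\x1B';             /* esc character */
                break;
            case 'n': case 'N':
                c = '\n';               /* line feed character */
                break;
            default:
                break;                  /* *" gives ", ** gives *, others kept as is */
            }
        }
        if (n + 1 >= out_size)
            return -1;                  /* no room for c plus the NUL */
        out[n++] = c;
    }
    out[n] = '\0';
    return (long)n;
}
```

Called with the text uses *"double quotes*" and a sufficiently large buffer, the sketch stores uses "double quotes" in the output buffer.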
ReadItem() may unread the last character read before it returns, depending on the circumstances. For example, if ReadItem() reads a '\n' (line feed) or ';' (semicolon) character, it will unread this character before it returns. The client can then check which character made ReadItem() stop and return.
ReadItem() is not generally useful outside its primary field of use, which is the shell (CLI) and a handful of shell commands such as "Execute", "Skip" or "Lab". ReadItem() is not a simpler alternative to ReadArgs(). Use ReadArgs() if you can, which requires more setup effort, but it spares you from having to deal with the handful of conditions under which ReadItem() will unread the last character read.
Initializing the CSource data is straightforward. You need to fill in a pointer to the string to be processed, the length of the string and set the current string read position to 0, like so:
struct CSource cs;
cs.CS_Buffer = "one two three\n";
cs.CS_Length = strlen(cs.CS_Buffer);
cs.CS_CurChr = 0;
If you use the CSource data parameter, you should check after each ReadItem() call if the buffer has been completely read. This will be the case if cs.CS_CurChr >= cs.CS_Length, i.e. subsequent calls to ReadItem() will return no more tokens.
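The exhaustion check can be written as a one-line helper. The sketch below uses a portable stand-in structure so that it compiles outside AmigaOS; struct CSourceModel, csource_init() and csource_exhausted() are hypothetical names introduced for illustration only. On AmigaOS you would use the real struct CSource (declared, as far as this sketch assumes, in <dos/rdargs.h>).

```c
#include <string.h>

/* Portable stand-in for struct CSource, for illustration only. */
struct CSourceModel {
    const char *CS_Buffer;  /* text to be processed */
    long        CS_Length;  /* number of characters in CS_Buffer */
    long        CS_CurChr;  /* current read position */
};

/* Hypothetical helper: fills in the three members as described above. */
void csource_init(struct CSourceModel *cs, const char *text)
{
    cs->CS_Buffer = text;
    cs->CS_Length = (long)strlen(text);
    cs->CS_CurChr = 0;
}

/* Hypothetical helper: nonzero once the buffer has been completely
 * read, i.e. further ReadItem() calls would return no more tokens. */
int csource_exhausted(const struct CSourceModel *cs)
{
    return cs->CS_CurChr >= cs->CS_Length;
}
```

A caller would test csource_exhausted() after each ReadItem() call and leave its loop as soon as the result is nonzero.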
If you let ReadItem() make use of Input(), it will read one character at a time through FGetC(Input()) until it reaches the end of the file or stream.
When it finds a character that makes it stop, ReadItem() will unread that character. For example, a '\n' character indicates that the entire text for this line has been read. This "stop character" will then be unread, for you to look at it, or for the next ReadItem() call to make use of it. If any of the following result values are returned by ReadItem(), the stop character may have been unread:
- ITEM_NOTHING: see next.
- ITEM_UNQUOTED: The text read consists of whitespace characters, followed by a ';' (semicolon character), or consists of a single ';' character.
- ITEM_ERROR: ReadItem() could read no further data (reached the end of the CSource buffer, or the end of the file/stream), or after having read a '*' in text introduced by a double quote character, failed to read further data.
Caution: The ReadItem() behaviour with regard to unreading the last character read is inconsistent.
If ReadItem() returned ITEM_NOTHING, you should try to read the next character because it might just be a '\n' (line feed) which the next call to ReadItem() will read, unread and then return ITEM_NOTHING again, leading to an endless loop. If you found a '\n', discard it. If you found a different character, unread it and call ReadItem() again, if necessary. For a file/stream, unreading the last character involves calling UnGetC(file, -1).
If you make use of the CSource data parameter, ReadItem() will "unread" the last character read by decrementing the CSource.CS_CurChr value.
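For the CSource case, the ITEM_NOTHING handling described above might look like the following sketch. It again uses a portable stand-in structure; struct CSourceModel and csource_discard_line_feed() are hypothetical names, and the real struct CSource would take their place on AmigaOS.

```c
/* Portable stand-in for struct CSource, for illustration only. */
struct CSourceModel {
    const char *CS_Buffer;  /* text to be processed */
    long        CS_Length;  /* number of characters in CS_Buffer */
    long        CS_CurChr;  /* current read position */
};

/* Hypothetical helper, intended to be called after ReadItem() returned
 * ITEM_NOTHING. Reads the next character; a '\n' is discarded, any
 * other character is unread by decrementing CS_CurChr.
 * Returns 1 if a line feed was discarded, 0 otherwise. */
int csource_discard_line_feed(struct CSourceModel *cs)
{
    char c;

    if (cs->CS_CurChr >= cs->CS_Length)
        return 0;                       /* buffer completely read */

    c = cs->CS_Buffer[cs->CS_CurChr++]; /* read the next character */
    if (c == '\n')
        return 1;                       /* line feed found: discard it */

    cs->CS_CurChr--;                    /* unread the character */
    return 0;
}
```

Discarding the line feed this way moves the read position past it, so the next ReadItem() call starts on the following line instead of stopping at the same '\n' again.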
It is, generally, a good idea to stop calling ReadItem() if it returned ITEM_ERROR. Before you can call ReadItem() again, proceed by reading every single remaining character until you find a '\n' (line feed) character, discarding the remainder of the line. Stop if you reach the end of the buffer or file/stream.
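A minimal sketch of this recovery step for the CSource case follows. struct CSourceModel is a portable stand-in for struct CSource and csource_skip_line() is a hypothetical helper; for a file or stream the same loop would read characters through FGetC() instead.

```c
/* Portable stand-in for struct CSource, for illustration only. */
struct CSourceModel {
    const char *CS_Buffer;  /* text to be processed */
    long        CS_Length;  /* number of characters in CS_Buffer */
    long        CS_CurChr;  /* current read position */
};

/* Hypothetical helper, intended to be called after ReadItem() returned
 * ITEM_ERROR. Reads and discards characters up to and including the
 * next '\n'. Returns 1 if a line feed was found, or 0 if the end of
 * the buffer was reached first. */
int csource_skip_line(struct CSourceModel *cs)
{
    while (cs->CS_CurChr < cs->CS_Length) {
        if (cs->CS_Buffer[cs->CS_CurChr++] == '\n')
            return 1;                   /* remainder of the line discarded */
    }
    return 0;                           /* end of buffer: stop processing */
}
```

If the helper returns 0 there is nothing left to process and ReadItem() should not be called again.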
Compare the return value of the ReadItem() function against ITEM_ERROR first, rather than assuming that a negative value indicates an error condition. ITEM_EQUAL is a negative value, too, but does not indicate an error.
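The sketch below shows the comparison order suggested above. The fallback definitions are assumptions that mirror the values believed to be in the AmigaOS include files (they are only used if the real headers are not available), ITEM_QUOTED is assumed to be the result for quoted tokens, and describe_item() is a hypothetical helper.

```c
/* Fallback values, assumed to match the AmigaOS include files; they are
 * skipped if the real definitions have already been included. */
#ifndef ITEM_ERROR
#define ITEM_EQUAL    (-2)
#define ITEM_ERROR    (-1)
#define ITEM_NOTHING  0
#define ITEM_UNQUOTED 1
#define ITEM_QUOTED   2
#endif

/* Hypothetical helper: classifies a ReadItem() result. ITEM_ERROR is
 * tested first and explicitly; testing for "result < 0" instead would
 * wrongly treat ITEM_EQUAL as an error. */
const char *describe_item(long result)
{
    if (result == ITEM_ERROR)
        return "error";

    switch (result) {
    case ITEM_EQUAL:    return "equal sign";
    case ITEM_NOTHING:  return "nothing";
    case ITEM_UNQUOTED: return "unquoted token";
    case ITEM_QUOTED:   return "quoted token";
    default:            return "unexpected value";
    }
}
```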
Keep in mind that ReadItem() is foremost a helper function which is used by the shell to read input entered by the user, as well as instructions from a script file. This is why it stops reading from a line when it finds a ';' (semicolon) character, which for the shell indicates that the remainder of the line contains a comment which should be ignored.
ReadItem() expects its input to be in a certain format, which so happens to match the general syntax of the AmigaDOS shell and its batch script files. It is not a general token processing function. It actually constitutes the fundamental behaviour of the shell.
The line terminator is the '\n' (line feed) character. A '\r' (carriage return) character preceding the line feed character will be retained as is. If you expect to process lines terminated by "\r\n", you may want to remove the '\r' before you submit the line to the ReadItem() function.
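Removing the carriage return can be done in place before the line is handed to ReadItem(). The following is a minimal sketch in portable C; strip_cr_before_lf() is a hypothetical helper name, and it only handles a "\r\n" pair at the very end of a NUL-terminated line.

```c
#include <string.h>

/* Hypothetical helper: if the line ends in "\r\n", removes the '\r' so
 * that the line ends in a plain '\n'. Works in place and returns the
 * new length of the line. */
size_t strip_cr_before_lf(char *line)
{
    size_t len = strlen(line);

    if (len >= 2 && line[len - 2] == '\r' && line[len - 1] == '\n') {
        line[len - 2] = '\n';           /* move the line feed forward */
        line[len - 1] = '\0';           /* and shorten the string */
        len--;
    }
    return len;
}
```

The returned length can be stored directly in the CS_Length member when the line is subsequently processed through the csource parameter.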
If the caller is a Task and either the csource parameter is NULL, or the csource parameter CS_Buffer member is NULL, then ReadItem() will call FGetC(Input()) and may subsequently crash.