PdfParserProject | RecentChanges | Preferences

A class which, when instantiated with a stream, returns tokens through calls to a next() method. When no more tokens remain in the stream, a call to next() returns nil. Each token is a String object, and client classes can also query its token type, which is one of #TYPE_NUMBER, #TYPE_WORD, or #TYPE_SYMBOL. Symbols are held in an instance variable (the symbol table) and define the special characters that should be returned as individual tokens. For example, PdfParser stores '[' and ']' in the symbol table so that arrays can be identified.
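The interface described above can be sketched roughly as follows. This is a Python stand-in, not the actual class: the name StreamTokenizerSketch, the token_type attribute, the string constants, and the number pattern are all assumptions made for illustration. It only splits on whitespace, so it expects symbols to be surrounded by delimiters.

```python
import io
import re

# Hypothetical stand-ins for #TYPE_NUMBER, #TYPE_WORD and #TYPE_SYMBOL.
TYPE_NUMBER = "TYPE_NUMBER"
TYPE_WORD = "TYPE_WORD"
TYPE_SYMBOL = "TYPE_SYMBOL"

# Assumed number syntax: optional sign, digits, optional decimal point.
NUMBER_PATTERN = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)")


class StreamTokenizerSketch:
    """Illustrative tokenizer: whitespace-delimited tokens with a type."""

    def __init__(self, stream, symbols=("[", "]")):
        self.stream = stream
        self.symbols = set(symbols)  # characters returned as individual tokens
        self.token_type = None       # type of the most recently returned token

    def next(self):
        """Return the next token as a string, or None at end of stream."""
        ch = self.stream.read(1)
        while ch and ch.isspace():   # skip leading whitespace
            ch = self.stream.read(1)
        if not ch:                   # stream exhausted
            self.token_type = None
            return None
        token = ch
        ch = self.stream.read(1)
        while ch and not ch.isspace():  # read up to the next whitespace
            token += ch
            ch = self.stream.read(1)
        self.token_type = self._classify(token)
        return token

    def _classify(self, token):
        if token in self.symbols:
            return TYPE_SYMBOL
        if NUMBER_PATTERN.fullmatch(token):
            return TYPE_NUMBER
        return TYPE_WORD


def tokens_with_types(text):
    """Collect (token, type) pairs until next() returns None."""
    tokenizer = StreamTokenizerSketch(io.StringIO(text))
    pairs = []
    token = tokenizer.next()
    while token is not None:
        pairs.append((token, tokenizer.token_type))
        token = tokenizer.next()
    return pairs
```

A client would call next() in a loop until it returns None, reading token_type after each call, as tokens_with_types does.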

I noticed last night (11/19/00) that I was making an assumption in this class that delimiters would always separate individual tokens, so for example, I was presuming that an embedded array would look like this:

 [ [1 2 3] ]
What I didn't plan on was something where delimiters did not exist between tokens. Example:
 [[1 2 3]]
This is a simple fix in the stream tokenizer; I'll file it out tonight and create a test case for it. 11/20/00 -- Ivan
This class now supports formats with missing leading and trailing whitespace delimiters by 'peeking' ahead at the next element of the stream to see whether it is in the symbol table. If so, the tokenizer treats it as a sort of delimiter but does not advance the stream's position. Without this, the PdfStreamTokenizer would advance all the way to the next whitespace delimiter and try to create a Number from "3]]", effectively stripping off the delimiters.

 [[1 2 3]]
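The peeking behaviour can be sketched like this. It is a Python illustration only, not the filed-out fix: the helper names peek, next_token, and tokenize are hypothetical, and the symbol table is assumed to contain just '[' and ']'.

```python
import io

SYMBOLS = {"[", "]"}  # assumed symbol table contents


def peek(stream):
    """Return the next character without advancing the stream position."""
    position = stream.tell()
    ch = stream.read(1)
    stream.seek(position)
    return ch


def next_token(stream, symbols=SYMBOLS):
    """Return the next token, or None when the stream is exhausted."""
    ch = stream.read(1)
    while ch and ch.isspace():  # skip leading whitespace
        ch = stream.read(1)
    if not ch:
        return None
    if ch in symbols:           # a symbol is always a token by itself
        return ch
    token = ch
    while True:
        upcoming = peek(stream)
        # Stop at end of stream, at whitespace, or at a symbol. The symbol
        # acts as a delimiter but stays in the stream for the next call.
        if not upcoming or upcoming.isspace() or upcoming in symbols:
            break
        token += stream.read(1)
    return token


def tokenize(text):
    """Collect every token from a string, for demonstration."""
    stream = io.StringIO(text)
    tokens = []
    token = next_token(stream)
    while token is not None:
        tokens.append(token)
        token = next_token(stream)
    return tokens
```

With this approach, tokenize("[[1 2 3]]") and tokenize("[ [1 2 3] ]") both yield the same seven tokens, and "3" is read cleanly instead of "3]]".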

Because PdfStrings and PdfArrays have this in common, both print without terminating whitespace delimiters. This would alter the appearance of the PdfString in the actual document. -- Patty
This page is read-only (last edited December 14, 2000 9:38 pm)