Thea
A simple style tokenizer for reading text files.
#include <TextInputStream.hpp>
Classes
class | BadMSVCSpecial
Thrown while parsing a number of the form 1.
struct | Settings
Tokenizer configuration options.
class | TokenException
Thrown when a token cannot be read.
class | WrongString
String read from input did not match expected string.
class | WrongSymbol
Thrown by the read methods if a symbol string does not match the expected string.
class | WrongTokenType
Thrown by the read methods if a token is not of the expected type.
Public Types
enum | FS
A flag indicating the source of a stream.
Public Member Functions
char const * | getName () const
Get the name of the object.
std::string | getPath () const
Get the path to the file from which this input is drawn, or the first few characters of the string if created from a string.
bool | hasMore ()
Returns true while there are tokens remaining.
Token | peek ()
Get a copy of the next token in the input stream, but don't remove it from the input stream.
int | peekCharacterNumber ()
Get the character number (relative to the line) for the next token in the input stream.
int | peekLineNumber ()
Get the line number for the next token.
void | push (Token const &t)
Take a previously read token and push it back at the front of the input stream.
Token | read ()
Read the next token (which will be the Type::END token if !hasMore()).
bool | readBoolean ()
Read a boolean.
std::string | readComment ()
Like readCommentToken(), but returns the token's string.
void | readComment (std::string const &s)
Read a specific comment token.
Token | readCommentToken ()
Read a comment token and return it.
std::string | readLine ()
Read from the beginning of the next token until the following newline and return the result as a string, ignoring all parsing in between.
std::string | readNewline ()
Like readNewlineToken(), but returns the token's string.
void | readNewline (std::string const &s)
Read a specific newline token.
Token | readNewlineToken ()
Read a newline token and return it.
double | readNumber ()
Read one token (or possibly two) as a number.
Token | readSignificant ()
Calls read() until the result is not a newline or comment.
std::string | readString ()
Like readStringToken(), but returns the token's string.
void | readString (std::string const &s)
Read a specific string token.
Token | readStringToken ()
Read a string token and return it.
std::string | readSymbol ()
Like readSymbolToken(), but returns the token's string.
void | readSymbol (std::string const &symbol)
Read a specific symbol token.
void | readSymbols (std::string const &s1, std::string const &s2)
Read a series of two specific symbols.
void | readSymbols (std::string const &s1, std::string const &s2, std::string const &s3)
Read a series of three specific symbols.
void | readSymbols (std::string const &s1, std::string const &s2, std::string const &s3, std::string const &s4)
Read a series of four specific symbols.
Token | readSymbolToken ()
Read a symbol token and return it.
int8 | setName (char const *s)
Set the name of the object from a C-style string.
virtual int8 | setName (std::string const &s)
Set the name of the object from a std::string.
TextInputStream (std::string const &path_, Settings const &settings=Settings::defaults())
Open a file for reading formatted text input.
TextInputStream (FS fs, std::string const &str, Settings const &settings=Settings::defaults())
Creates input directly from a string.
Static Public Member Functions
static bool | parseBoolean (std::string const &_string)
Extract a boolean value from a string.
static double | parseNumber (std::string const &_string)
Extract a number from a string.
Protected Member Functions
std::string const & | getNameStr () const
Access the name string directly, for efficiency.
A simple style tokenizer for reading text files.
TextInputStream handles a superset of C++, Java, Matlab, and Bash code text, including single-line comments, block comments, quoted strings with escape sequences, and operators. It recognizes several categories of tokens (symbols, quoted strings, numbers, booleans, comments, and newlines), which are separated by white space, quotation marks, or the end of a recognized operator.
The special ".." and "..." tokens are always recognized in addition to normal C++ operators. Additional tokens can be made available by changing the Settings.
Negative numbers are handled specially because of the ambiguity between unary minus and negative numbers; see the note for TextInputStream::read().
Inside quoted strings, escape sequences are converted. Thus the string token for ["a\nb"] is 'a', followed by a newline, followed by 'b'. Outside of quoted strings, escape sequences are not converted, so the token sequence for [a\nb] is symbol 'a', symbol '\', symbol 'nb' (this matches what a C++ parser would do). The exception is that a specified TextInputStream::Settings::otherCommentCharacter preceded by a backslash is assumed to be an escaped comment character and is returned as a symbol token instead of being parsed as a comment (this is what a LaTeX or VRML parser would do).
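The conversion of escape sequences inside quoted strings can be sketched in isolation. The helper below is a minimal, self-contained illustration and is not part of TextInputStream: the name unescapeQuoted and the particular set of escapes it handles are assumptions made for this example.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of TextInputStream): converts backslash
// escapes found in the body of a quoted string. Only a few common escapes
// are handled; any other escaped character is kept as-is.
std::string unescapeQuoted(std::string const & body)
{
  std::string out;
  for (std::size_t i = 0; i < body.size(); ++i)
  {
    char c = body[i];
    if (c == '\\' && i + 1 < body.size())
    {
      char next = body[++i];  // the character following the backslash
      switch (next)
      {
        case 'n':  out += '\n'; break;
        case 't':  out += '\t'; break;
        case 'r':  out += '\r'; break;
        default:   out += next; break;  // covers \\, \", \' and unknown escapes
      }
    }
    else
      out += c;  // ordinary character, or a lone trailing backslash
  }
  return out;
}

// Usage sketch: the quoted body a, backslash, n, b becomes 'a', newline, 'b'.
void exampleUnescape()
{
  assert(unescapeQuoted("a\\nb") == "a\nb");
}
```

Outside of quoted strings no such conversion would be applied, which is why the unquoted input a\nb yields three separate symbol tokens.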
Assumes that the file is not modified once opened.
Derived from the G3D library: http://g3d.sourceforge.net
Examples
TextInputStream ti(TextInputStream::FROM_STRING, "name = 'Max', height = 6");
Token t;
t = ti.read();  // symbol "name"
assert(t.type == Token::Type::SYMBOL);
assert(t.sval == "name");
t = ti.read();  // symbol "="
assert(t.type == Token::Type::SYMBOL);
assert(t.sval == "=");
std::string name = ti.read().sval;  // value of the quoted string token
ti.read();  // consume the "," symbol

TextInputStream ti(TextInputStream::FROM_STRING, "name = 'Max', height = 6");
ti.readSymbols("name", "=");
std::string name = ti.readString();
ti.readSymbols(",", "height", "=");
double height = ti.readNumber();
Definition at line 273 of file TextInputStream.hpp.
enum FS
A flag indicating the source of a stream.
Definition at line 590 of file TextInputStream.hpp.
TextInputStream (std::string const &path_, Settings const &settings=Settings::defaults()) [explicit]
Open a file for reading formatted text input.
Definition at line 1308 of file TextInputStream.cpp.
TextInputStream (FS fs, std::string const &str, Settings const &settings=Settings::defaults())
Creates input directly from a string.
The first argument must be TextInputStream::FROM_STRING.
Definition at line 1323 of file TextInputStream.cpp.
char const * getName () const [virtual, inherited]
Get the name of the object.
std::string const & getNameStr () const [protected, inherited]
Access the name string directly, for efficiency.
Definition at line 98 of file NamedObject.hpp.
std::string getPath () const
Get the path to the file from which this input is drawn, or the first few characters of the string if created from a string.
Definition at line 604 of file TextInputStream.hpp.
bool hasMore ()
Returns true while there are tokens remaining.
Definition at line 250 of file TextInputStream.cpp.
bool parseBoolean (std::string const &_string) [static]
Extract a boolean value from a string.
Definition at line 71 of file TextInputStream.cpp.
double parseNumber (std::string const &_string) [static]
Extract a number from a string.
Includes parsing of MSVC specials.
Definition at line 77 of file TextInputStream.cpp.
Token peek ()
Get a copy of the next token in the input stream, but don't remove it from the input stream.
Definition at line 138 of file TextInputStream.cpp.
int peekCharacterNumber ()
Get the character number (relative to the line) for the next token in the input stream.
Definition at line 156 of file TextInputStream.cpp.
int peekLineNumber ()
Get the line number for the next token.
Definition at line 150 of file TextInputStream.cpp.
void push (Token const &t)
Take a previously read token and push it back at the front of the input stream.
Can be used in the case where more than one token of read-ahead is needed (i.e. when peek doesn't suffice).
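The read-ahead pattern described here can be sketched with a toy stream. ToyStream and nextTwoAre below are hypothetical stand-ins written for this illustration; they model tokens as plain strings and are not the TextInputStream API.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <utility>

// Hypothetical stand-in for a token stream: tokens are plain strings.
class ToyStream
{
  public:
    explicit ToyStream(std::deque<std::string> tokens) : tokens_(std::move(tokens)) {}

    bool hasMore() const { return !tokens_.empty(); }

    // Copy of the next token; the stream is unchanged.
    std::string peek() const { assert(hasMore()); return tokens_.front(); }

    // Remove and return the next token.
    std::string read()
    {
      assert(hasMore());
      std::string t = tokens_.front();
      tokens_.pop_front();
      return t;
    }

    // Put a previously read token back at the front of the stream.
    void push(std::string const & t) { tokens_.push_front(t); }

  private:
    std::deque<std::string> tokens_;
};

// Two-token read-ahead: true if the next two tokens are `first` then `second`.
// The stream is left exactly as it was found.
bool nextTwoAre(ToyStream & in, std::string const & first, std::string const & second)
{
  if (!in.hasMore()) return false;
  std::string a = in.read();  // consume one token to see past it
  bool match = (a == first) && in.hasMore() && (in.peek() == second);
  in.push(a);                 // restore the consumed token
  return match;
}
```

A single peek() only exposes one token; reading one token, peeking at the next, and pushing the first back gives two tokens of look-ahead without changing the stream.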
Definition at line 244 of file TextInputStream.cpp.
Token read ()
Read the next token (which will be the Type::END token if !hasMore()).
Signed numbers can be handled in one of two modes. If the option TextInputStream::Settings::signedNumbers is true, a '+' or '-' immediately before a number is prepended onto that number; if there is intervening whitespace, the sign is read as a separate symbol. If TextInputStream::Settings::signedNumbers is false, read() does not distinguish between a plus or minus symbol next to a number and a positive/negative number itself. For example, "x - 1" and "x -1" will be parsed the same way by read(). In both cases, readNumber() will contract a leading "-" or "+" onto a number.
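The difference between the two modes can be illustrated with a toy tokenizer. The function toyTokenize below is a simplified sketch written for this page, not Thea's implementation: it only knows identifiers, unsigned integer literals, and one-character symbols, and its signedNumbers parameter merely mimics the setting of the same name.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Toy tokenizer for illustration only.
std::vector<std::string> toyTokenize(std::string const & text, bool signedNumbers)
{
  std::vector<std::string> tokens;
  std::size_t i = 0;
  while (i < text.size())
  {
    unsigned char c = static_cast<unsigned char>(text[i]);
    if (std::isspace(c)) { ++i; continue; }

    std::size_t start = i;
    bool signThenDigit = (c == '+' || c == '-') && i + 1 < text.size()
                      && std::isdigit(static_cast<unsigned char>(text[i + 1]));

    if (std::isdigit(c) || (signedNumbers && signThenDigit))
    {
      ++i;  // the first digit, or the sign that is glued onto the number
      while (i < text.size() && std::isdigit(static_cast<unsigned char>(text[i]))) ++i;
    }
    else if (std::isalpha(c))
    {
      while (i < text.size() && std::isalnum(static_cast<unsigned char>(text[i]))) ++i;
    }
    else
      ++i;  // any other character is a one-character symbol

    tokens.push_back(text.substr(start, i - start));
  }
  return tokens;
}

// Usage sketch: only the signed mode tells the two spellings apart.
void exampleSignedNumbers()
{
  assert(toyTokenize("x - 1", false) == toyTokenize("x -1", false));
  assert(toyTokenize("x - 1", true)  != toyTokenize("x -1", true));
}
```

In this sketch "x -1" yields the tokens x and -1 when signedNumbers is true, and x, -, 1 when it is false; "x - 1" yields x, -, 1 in both modes.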
Definition at line 162 of file TextInputStream.cpp.
bool readBoolean ()
Read a boolean.
If the next input token is not a boolean, throws WrongTokenType.
Definition at line 1103 of file TextInputStream.cpp.
std::string readComment ()
Like readCommentToken(), but returns the token's string.
Use this method (rather than readCommentToken) if you want the token's value but don't really care about its location in the input. Use of readCommentToken is encouraged for better error reporting.
Definition at line 1216 of file TextInputStream.cpp.
void readComment (std::string const &s)
Read a specific comment token.
If the next token in the input is a comment matching s, it will be consumed. Use this method if you want to match a specific comment from the input. In that case, typically error reporting related to the token is only going to occur because of a mismatch, so no location information is needed by the caller.
WrongTokenType will be thrown if the next token in the input stream is not a comment. WrongString will be thrown if the next token in the input stream is a comment but does not match the s parameter. When an exception is thrown, no tokens are consumed.
Definition at line 1222 of file TextInputStream.cpp.
Token readCommentToken ()
Read a comment token and return it.
Use this method (rather than readComment) if you want the token's location as well as its value.
WrongTokenType will be thrown if the next token in the input stream is not a comment. When an exception is thrown, no tokens are consumed.
Definition at line 1201 of file TextInputStream.cpp.
std::string readLine ()
Read from the beginning of the next token until the following newline and return the result as a string, ignoring all parsing in between.
The newline is not returned in the string, and the following token read will be a newline or end of file token (if they are enabled for parsing).
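The behaviour described above can be approximated on a plain character buffer. The function toyReadLine below is a hypothetical sketch, not the library's code: it assumes the "beginning of the next token" can be found by skipping spaces and tabs, and it tracks the read position with an explicit index.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch of a readLine-style operation on a raw buffer.
// `pos` is the current read position and is advanced by the call.
std::string toyReadLine(std::string const & text, std::size_t & pos)
{
  // Skip blanks to reach the start of the next token. Newlines are not
  // skipped, so an immediately following newline gives an empty line.
  while (pos < text.size() && (text[pos] == ' ' || text[pos] == '\t')) ++pos;

  std::size_t end = text.find('\n', pos);
  if (end == std::string::npos) end = text.size();

  std::string line = text.substr(pos, end - pos);
  pos = end;  // stop on the newline so it is the next thing to be read
  return line;
}

// Usage sketch: the returned string excludes the newline itself.
void exampleReadLine()
{
  std::string text = "  height = 6\nnext";
  std::size_t pos = 0;
  assert(toyReadLine(text, pos) == "height = 6");
  assert(text[pos] == '\n');
}
```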
Definition at line 177 of file TextInputStream.cpp.
std::string readNewline ()
Like readNewlineToken(), but returns the token's string.
Use this method (rather than readNewlineToken) if you want the token's value but don't really care about its location in the input. Use of readNewlineToken() is encouraged for better error reporting.
Definition at line 1252 of file TextInputStream.cpp.
void readNewline (std::string const &s)
Read a specific newline token.
If the next token in the input is a newline matching s, it will be consumed. Use this method if you want to match a specific newline from the input. In that case, typically error reporting related to the token is only going to occur because of a mismatch, so no location information is needed by the caller.
WrongTokenType will be thrown if the next token in the input stream is not a newline. WrongString will be thrown if the next token in the input stream is a newline but does not match the s parameter. When an exception is thrown, no tokens are consumed.
Definition at line 1258 of file TextInputStream.cpp.
Token readNewlineToken ()
Read a newline token and return it.
Use this method (rather than readNewline) if you want the token's location as well as its value. WrongTokenType will be thrown if the next token in the input stream is not a newline. When an exception is thrown, no tokens are consumed.
Definition at line 1237 of file TextInputStream.cpp.
double readNumber ()
Read one token (or possibly two) as a number.
If the next token in the input is a number, its value is returned directly. If TextInputStream::Settings::signedNumbers is false and the input stream contains a '+' or '-' symbol token immediately followed by a number token, both tokens will be consumed and returned by this method as a single signed number.
WrongTokenType will be thrown if one of the input conditions described above is not satisfied. When an exception is thrown, no tokens are consumed.
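The "one token (or possibly two)" rule, together with the guarantee that nothing is consumed on failure, can be sketched as follows. NumToken and toyReadNumber are hypothetical names introduced for this example, and std::runtime_error stands in for WrongTokenType; none of this is the library's own code.

```cpp
#include <cassert>
#include <cstdlib>
#include <deque>
#include <stdexcept>
#include <string>

// Hypothetical token model for illustration; not Thea's Token class.
struct NumToken
{
  enum Kind { SYMBOL, NUMBER } kind;
  std::string text;
};

// Accept a NUMBER token, or a '+'/'-' SYMBOL directly followed by a NUMBER.
// All checks happen before any token is removed.
double toyReadNumber(std::deque<NumToken> & in)
{
  if (!in.empty() && in[0].kind == NumToken::NUMBER)
  {
    double v = std::strtod(in[0].text.c_str(), nullptr);
    in.pop_front();
    return v;
  }

  bool signedPair = in.size() >= 2
                 && in[0].kind == NumToken::SYMBOL
                 && (in[0].text == "+" || in[0].text == "-")
                 && in[1].kind == NumToken::NUMBER;
  if (!signedPair)
    throw std::runtime_error("expected a number");  // stands in for WrongTokenType

  double v = std::strtod(in[1].text.c_str(), nullptr);
  if (in[0].text == "-") v = -v;
  in.pop_front();  // the sign symbol
  in.pop_front();  // the number
  return v;
}

// Usage sketch: the tokens "-" and "6" are contracted into -6.
void exampleReadNumber()
{
  std::deque<NumToken> in;
  in.push_back(NumToken{NumToken::SYMBOL, "-"});
  in.push_back(NumToken{NumToken::NUMBER, "6"});
  assert(toyReadNumber(in) == -6.0);
  assert(in.empty());
}
```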
Definition at line 1121 of file TextInputStream.cpp.
Token readSignificant ()
Calls read() until the result is not a newline or comment.
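The loop this describes is short enough to sketch directly. SigToken, toyRead, and toyReadSignificant below are hypothetical names used only for this illustration; the sketch assumes that reading past the end yields an END token, as read() is documented to do.

```cpp
#include <cassert>
#include <deque>
#include <string>

// Hypothetical token model for illustration; not Thea's Token class.
struct SigToken
{
  enum Kind { SYMBOL, NEWLINE, COMMENT, END } kind;
  std::string text;
};

// Toy read(): pops the next token, or returns END when the input is exhausted.
SigToken toyRead(std::deque<SigToken> & in)
{
  if (in.empty()) return SigToken{SigToken::END, ""};
  SigToken t = in.front();
  in.pop_front();
  return t;
}

// Keep reading until the token is neither a newline nor a comment.
// The loop terminates because END is neither of those.
SigToken toyReadSignificant(std::deque<SigToken> & in)
{
  SigToken t = toyRead(in);
  while (t.kind == SigToken::NEWLINE || t.kind == SigToken::COMMENT)
    t = toyRead(in);
  return t;
}

// Usage sketch: a comment and a newline are skipped before the symbol.
void exampleReadSignificant()
{
  std::deque<SigToken> in;
  in.push_back(SigToken{SigToken::COMMENT, "note"});
  in.push_back(SigToken{SigToken::NEWLINE, "\n"});
  in.push_back(SigToken{SigToken::SYMBOL, "name"});
  assert(toyReadSignificant(in).text == "name");
}
```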
Definition at line 44 of file TextInputStream.cpp.
std::string readString ()
Like readStringToken, but returns the token's string.
Use this method (rather than readStringToken) if you want the token's value but don't really care about its location in the input. Use of readStringToken is encouraged for better error reporting.
Definition at line 1180 of file TextInputStream.cpp.
void readString (std::string const &s)
Read a specific string token.
If the next token in the input is a string matching s, it will be consumed. Use this method if you want to match a specific string from the input. In that case, typically error reporting related to the token is only going to occur because of a mismatch, so no location information is needed by the caller.
WrongTokenType will be thrown if the next token in the input stream is not a string. WrongString will be thrown if the next token in the input stream is a string but does not match the s parameter. When an exception is thrown, no tokens are consumed.
Definition at line 1186 of file TextInputStream.cpp.
Token readStringToken ()
Read a string token and return it.
Use this method (rather than readString) if you want the token's location as well as its value.
WrongTokenType will be thrown if the next token in the input stream is not a string. When an exception is thrown, no tokens are consumed.
Definition at line 1165 of file TextInputStream.cpp.
std::string readSymbol ()
Like readSymbolToken(), but returns the token's string.
Use this method (rather than readSymbolToken) if you want the token's value but don't really care about its location in the input. Use of readSymbolToken() is encouraged for better error reporting.
Definition at line 1288 of file TextInputStream.cpp.
void readSymbol (std::string const &symbol)
Read a specific symbol token.
If the next token in the input is a symbol matching symbol, it will be consumed. Use this method if you want to match a specific symbol from the input. In that case, typically error reporting related to the token is only going to occur because of a mismatch, so no location information is needed by the caller.
WrongTokenType will be thrown if the next token in the input stream is not a symbol. WrongSymbol will be thrown if the next token in the input stream is a symbol but does not match the symbol parameter. When an exception is thrown, no tokens are consumed.
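The two failure cases and the "no tokens are consumed" guarantee follow a check-then-consume pattern, sketched below. All names here (SymToken, ToyWrongTokenType, ToyWrongSymbol, toyReadSymbol, toyReadSymbols) are hypothetical stand-ins for this example. Implementing the multi-symbol variant as consecutive single matches is likewise an assumption of the sketch.

```cpp
#include <cassert>
#include <deque>
#include <stdexcept>
#include <string>

// Hypothetical token model for illustration; not Thea's Token class.
struct SymToken
{
  enum Kind { SYMBOL, STRING } kind;
  std::string text;
};

// Stand-ins for the WrongTokenType and WrongSymbol exceptions.
struct ToyWrongTokenType : std::runtime_error { using std::runtime_error::runtime_error; };
struct ToyWrongSymbol    : std::runtime_error { using std::runtime_error::runtime_error; };

void toyReadSymbol(std::deque<SymToken> & in, std::string const & symbol)
{
  // Validate against the front token first, so a failure leaves the input untouched.
  if (in.empty() || in.front().kind != SymToken::SYMBOL)
    throw ToyWrongTokenType("expected a symbol token");
  if (in.front().text != symbol)
    throw ToyWrongSymbol("expected symbol '" + symbol + "'");
  in.pop_front();  // consume only after both checks pass
}

// A series of symbols as consecutive single matches. In this sketch, if the
// second match fails, the first symbol has already been consumed.
void toyReadSymbols(std::deque<SymToken> & in, std::string const & s1, std::string const & s2)
{
  toyReadSymbol(in, s1);
  toyReadSymbol(in, s2);
}

// Usage sketch: a matching symbol is consumed.
void exampleReadSymbol()
{
  std::deque<SymToken> in;
  in.push_back(SymToken{SymToken::SYMBOL, "="});
  toyReadSymbol(in, "=");
  assert(in.empty());
}
```

Keeping the type mismatch and the text mismatch as separate exception types lets a caller distinguish "not a symbol at all" from "a symbol, but not the expected one".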
Definition at line 1294 of file TextInputStream.cpp.
void readSymbols (std::string const &s1, std::string const &s2)
Read a series of two specific symbols.
See readSymbol().
Definition at line 756 of file TextInputStream.hpp.
void readSymbols (std::string const &s1, std::string const &s2, std::string const &s3)
Read a series of three specific symbols.
See readSymbol().
Definition at line 763 of file TextInputStream.hpp.
void readSymbols (std::string const &s1, std::string const &s2, std::string const &s3, std::string const &s4)
Read a series of four specific symbols.
See readSymbol().
Definition at line 774 of file TextInputStream.hpp.
Token readSymbolToken ()
Read a symbol token and return it.
Use this method (rather than readSymbol) if you want the token's location as well as its value.
WrongTokenType will be thrown if the next token in the input stream is not a symbol. When an exception is thrown, no tokens are consumed.
Definition at line 1273 of file TextInputStream.cpp.
int8 setName (char const *s) [virtual, inherited]
Set the name of the object from a C-style string.
Implements INamedObject.
Definition at line 86 of file NamedObject.hpp.
int8 setName (std::string const &s) [virtual, inherited]
Set the name of the object from a std::string.
Definition at line 94 of file NamedObject.hpp.