Lexer (the logic that splits the input stream into tokens) is working
The lexer is now done.

The lexer is implemented as a single stateless function.
Input: IEnumerable<char>
Output: the length of the next token; zero means 'continue', and a negative value means invalid input.
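Below is a minimal Python sketch of a function with this contract, assuming a toy token set: identifiers, numbers, double-quoted strings and a few operators. `Iterable[str]` stands in for C#'s `IEnumerable<char>`. The name `lex` and the operator tables are hypothetical. The page does not say exactly what 'continue' means, so here zero is assumed to mean "no token starts here (whitespace), skip one character".

```python
from typing import Iterable

# Hypothetical operator tables, for illustration only.
TWO_CHAR_OPS = {"==", "!=", "<=", ">=", "&&", "||", "=>", "++", "--"}
ONE_CHAR_OPS = set("(){}[];,.=+-*/<>!&|")


def lex(chars: Iterable[str]) -> int:
    """Return the length of the token at the start of `chars`.

    > 0 : length of the token
    0   : 'continue' (assumed here: whitespace, caller skips one char)
    < 0 : invalid input
    """
    it = iter(chars)
    first = next(it, None)
    if first is None or first.isspace():
        return 0
    if first.isalpha() or first == "_":
        # Identifier or keyword: letters, digits, underscores.
        length = 1
        for c in it:
            if not (c.isalnum() or c == "_"):
                break
            length += 1
        return length
    if first.isdigit():
        # Lightweight number scan; the value itself is parsed later, on demand.
        length = 1
        for c in it:
            if not (c.isdigit() or c == "."):
                break
            length += 1
        return length
    if first == '"':
        # String literal: scan to the closing quote, honouring backslash escapes.
        length = 1
        escaped = False
        for c in it:
            length += 1
            if escaped:
                escaped = False
            elif c == "\\":
                escaped = True
            elif c == '"':
                return length
        return -length  # unterminated literal: invalid input
    second = next(it, None)
    if second is not None and first + second in TWO_CHAR_OPS:
        return 2
    if first in ONE_CHAR_OPS:
        return 1
    return -1  # unknown character: invalid input
```

Because the function keeps no state between calls, the caller only has to track the current position and pass in the remaining input.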


The aim is to keep the design simple and minimal, and to reap as many benefits from that as possible.

Unlike the conventional approach, the lexer essentially returns strings (strictly speaking, positions in the input stream, but these are meant to be used as strings).
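To illustrate returning positions instead of token objects, here is a hypothetical driver. It calls any lexer with the contract above and records `(start, length)` pairs, turning them into strings only when they are consumed. The names `token_spans`, `token_strings` and the demo lexer `toy_lex` are illustrative. A zero result is again assumed to mean "skip one character".

```python
from typing import Callable, Iterable, List, Tuple

# Lexer contract from the description above: > 0 token length,
# 0 = 'continue' (assumed: skip one character), < 0 = invalid input.
LexFn = Callable[[Iterable[str]], int]


def token_spans(text: str, lex: LexFn) -> List[Tuple[int, int]]:
    """Run `lex` over `text`, collecting (start, length) positions instead of token objects."""
    spans: List[Tuple[int, int]] = []
    pos = 0
    while pos < len(text):
        # Feed the remaining input as a lazy character sequence.
        n = lex(text[i] for i in range(pos, len(text)))
        if n < 0:
            raise ValueError(f"invalid input at offset {pos}")
        if n == 0:
            pos += 1
            continue
        spans.append((pos, n))
        pos += n
    return spans


def token_strings(text: str, spans: List[Tuple[int, int]]) -> List[str]:
    """Turn positions into plain strings, which is how consumers use them."""
    return [text[start:start + length] for start, length in spans]


def toy_lex(chars: Iterable[str]) -> int:
    """Demo-only lexer: a token is any run of non-whitespace characters."""
    length = 0
    for c in chars:
        if c.isspace():
            break
        length += 1
    return length


text = "if (x) return y;"
print(token_strings(text, token_spans(text, toy_lex)))  # ['if', '(x)', 'return', 'y;']
```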

The reasoning is that a token is one of two things. It is either a special sequence or keyword (a string that can be compared by reference), or one of a few easily distinguishable variable-length chunks: identifiers, numbers and string literals. So it makes sense not to wrap tokens in an abstraction. Instead, the consumer does a little extra 'recognition' work later, when the tokens are used, to tell one kind from another. Performance really matters for keywords, special tokens (operators, brackets, etc.) and identifiers, but not for literals. Literals need additional custom parsing anyway. It makes little difference if the lexer does a simple, lightweight scan of a literal and the full parse is repeated later, when the actual value behind the literal is needed.
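The 'extra recognition work later' could look roughly like the following sketch, under a few assumptions. Keywords, operators and identifiers are interned so that equal tokens are the same object; Python's `sys.intern` plays the role that .NET's `String.Intern` would. A token is classified by its first character only when it is consumed, and a literal's value is parsed only on demand. `ast.literal_eval` is a stand-in that uses Python literal syntax, not the C# grammar. The `SPECIAL` table and the function names are hypothetical.

```python
import ast
import sys
from typing import Union

# Hypothetical table of keywords and special tokens, interned up front.
SPECIAL = {sys.intern(s) for s in ("if", "else", "return", "while", "(", ")", "{", "}", ";", "=")}
IF = sys.intern("if")


def intern_token(token: str) -> str:
    """Intern a token so equal tokens become the same object and `is` works."""
    return sys.intern(token)


def classify(token: str) -> str:
    """Cheap re-recognition, done by the consumer rather than the lexer."""
    if token in SPECIAL:
        return "special"
    first = token[0]
    if first.isdigit():
        return "number"
    if first == '"':
        return "string"
    if first.isalpha() or first == "_":
        return "identifier"
    return "unknown"


def literal_value(token: str) -> Union[int, float, str]:
    """Full literal parsing, performed only when the value is actually needed.

    ast.literal_eval follows Python literal syntax; a C# lexer would need its own rules.
    """
    return ast.literal_eval(token)
```

A string built at runtime, such as `"".join(["i", "f"])`, is a separate object until it is interned. After `intern_token`, it is the very same object as `IF`, so the hot path (keywords, operators, identifiers) can compare by reference. Literals pay the full parsing cost only when `literal_value` is called.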

Anyway, the initial part is working (though it may still have bugs). There is one complication: the preprocessor. I tried to implement it as a separate stage, but it looks like it really belongs in the lexer.
Last edited May 19 2009 at 10:41 PM by mihailik, version 6