KeyValuePairParser.java

A hand-written stream parser that reads key=value pairs from a java.io.Reader, one character at a time. It uses no regex and no third-party parsing library, only character-by-character stateful parsing with a 5-element pushback buffer. Code like this was common before libraries such as Apache Commons CSV became standard, and it is still written when an author wants exact control over error messages and edge cases without the overhead of a dependency.
The parser supports: quoted values (with "), escaped quotes inside quoted strings ("" → a literal "), configurable separator char (default from KeyValuePairWriter.DEFAULT_SEPARATOR_CHAR), line/column tracking for error messages, and whitespace skipping. Input format: key1=value1,key2="quoted value, with comma",key3=123.
301 lines. The parser has the classical two-phase structure of a tokenizer followed by a parser:
- Tokenizer (read() / nextToken() / unread()): reads one character at a time from the Reader, tracks line/column numbers, handles \r\n (Windows) and \n (Unix) newlines, and classifies each token as EOF, EOL, or CHAR. The nextToken() method is the tokenizer.
- Parser (parseKey() / parseValue()): consumes tokens to build key and value strings. parse() is the entry point; it calls parseKey() and parseValue() in a loop until EOF or EOL.
This is the same architecture as a compiler front end (tokenizer plus parser), applied to key=value data instead of a programming language.
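The tokenizer phase described above can be sketched roughly as follows. This is a minimal illustration, not the original source: the class name TokenizerSketch, the Type enum, the heldBack field, and the tokenize() helper are all hypothetical, and treating a lone \r as a line end is an assumption the description does not confirm.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the tokenizer phase; not the original class. */
class TokenizerSketch {
    enum Type { EOF, EOL, CHAR }

    private static final int NOTHING = -2;  // sentinel: no character held back (-1 is EOF)

    private final Reader reader;
    private int heldBack = NOTHING;  // one-character lookahead used for \r\n detection
    int line = 1;                    // line number of the next token
    int col = 0;                     // column of the most recent CHAR token
    char value;                      // payload of the most recent CHAR token

    TokenizerSketch(Reader reader) {
        this.reader = reader;
    }

    private int read() throws IOException {
        if (heldBack != NOTHING) {
            int c = heldBack;
            heldBack = NOTHING;
            return c;
        }
        return reader.read();
    }

    /** Classifies the next character as EOF, EOL, or CHAR and updates line/col. */
    Type nextToken() throws IOException {
        int c = read();
        if (c == -1) {
            return Type.EOF;
        }
        if (c == '\r') {
            int next = read();
            if (next != '\n') {
                heldBack = next;     // lone \r: keep the following character for later
            }
            c = '\n';                // assumption: \r\n and a lone \r each count as one EOL
        }
        if (c == '\n') {
            line++;
            col = 0;
            return Type.EOL;
        }
        col++;
        value = (char) c;
        return Type.CHAR;
    }

    /** Convenience for experiments: tokenizes a whole string into readable labels. */
    static List<String> tokenize(String input) {
        TokenizerSketch t = new TokenizerSketch(new StringReader(input));
        List<String> out = new ArrayList<>();
        try {
            Type type;
            do {
                type = t.nextToken();
                out.add(type == Type.CHAR ? "CHAR(" + t.value + ")" : type.name());
            } while (type != Type.EOF);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }
}
```

With this sketch, TokenizerSketch.tokenize("a=\r\nb") returns [CHAR(a), CHAR(=), EOL, CHAR(b), EOF]: the Windows line ending collapses into a single EOL token, so the parser phase never has to look at raw newline characters.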
Pushback is handled by a 5-element int[] pushbackBuffer with the index pushbackIndex. When the parser reads one character too far (e.g. in skipWhitespaces()), it pushes the character back via unread(int b). Pushing back a newline also adjusts the line/column counters, which is critical for accurate error messages. Five elements are more than enough for this parser, whose maximum lookahead is 1-2 characters. This is the standard way to implement short lookahead on a stream: the parser never needs to rewind further than the few characters it has just pushed back.
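A pushback buffer of this kind might look like the sketch below. Only the names pushbackBuffer, pushbackIndex, unread(), skipWhitespaces(), lineno, and colno come from the description; the class name PushbackSketch, the previousLineEnd field, and the firstNonBlank() helper are hypothetical, and the exact counter bookkeeping in the original may differ.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical sketch of a small pushback buffer; not the original class. */
class PushbackSketch {
    private final Reader reader;
    private final int[] pushbackBuffer = new int[5];
    private int pushbackIndex = 0;    // number of characters currently pushed back
    private int previousLineEnd = 0;  // column reached before the last newline was read
    int lineno = 1;
    int colno = 0;

    PushbackSketch(Reader reader) {
        this.reader = reader;
    }

    /** Returns the next character, preferring pushed-back ones, and updates the counters. */
    int read() throws IOException {
        int c = pushbackIndex > 0 ? pushbackBuffer[--pushbackIndex] : reader.read();
        if (c == '\n') {
            previousLineEnd = colno;
            lineno++;
            colno = 0;
        } else if (c != -1) {
            colno++;
        }
        return c;
    }

    /** Pushes one character back and undoes the counter update that read() made for it. */
    void unread(int c) {
        if (pushbackIndex == pushbackBuffer.length) {
            throw new IllegalStateException("pushback buffer full");
        }
        pushbackBuffer[pushbackIndex++] = c;  // last pushed is first re-read (stack order)
        if (c == '\n') {
            lineno--;
            colno = previousLineEnd;
        } else if (c != -1) {
            colno--;
        }
    }

    /** Skips spaces and tabs, then pushes back the first character that is not whitespace. */
    void skipWhitespaces() throws IOException {
        int c;
        do {
            c = read();
        } while (c == ' ' || c == '\t');
        unread(c);
    }

    /** Convenience for experiments: first non-blank character of a non-empty input, with its position. */
    static String firstNonBlank(String input) {
        PushbackSketch p = new PushbackSketch(new StringReader(input));
        try {
            p.skipWhitespaces();
            int c = p.read();
            return (char) c + "@" + p.lineno + ":" + p.colno;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch, PushbackSketch.firstNonBlank("   k=v") returns "k@1:4": skipWhitespaces() over-reads the k, pushes it back, and the column counter still reports the correct position when the k is read again.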
parseValue() (lines 141-189) implements a CSV-style quoted-field parser:
- A leading " enters quoted mode (quoted=true).
- "" → escaped quote (appends one " to the value).
- " followed by the separator, or by whitespace and then the separator → end of value.
- " followed by anything else → error (UNEXPECTED_CHARACTER_AFTER_QUOTATION_MARK).
- In unquoted mode, " → error (UNEXPECTED_QUOTATIONMARK; quotes are only allowed in quoted mode).
- A quoted value with no closing " → error (QUOTATIONMARK_MISSED_AT_END_OF_CELL).
This follows the RFC 4180 CSV conventions for quoted fields in simplified form: no multi-line quoted fields, no header detection, no type inference.
Every error message includes line number and column number — e.g. "Unexpected quotation mark... Error in line: 3 (14)". This is critical for debugging config files. The createMessage() static method formats errors with the pattern: msg + " Error in line: " + line + " (" + col + ")" + (s != null ? ": " + s : ""). The lineno and colno counters are maintained by read() (increments line on \n) and nextToken() (increments column on each char).
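The message pattern quoted above is easy to reproduce in isolation. In this sketch only the concatenation expression is taken from the description; the class name ErrorMessageSketch and the parameter order of createMessage() are assumptions.

```java
/** Hypothetical sketch of the error message format; parameter order is assumed. */
class ErrorMessageSketch {

    /**
     * Builds a message of the form "msg Error in line: LINE (COL)", followed by
     * ": s" when an optional detail string s is supplied.
     */
    static String createMessage(String msg, String s, int line, int col) {
        return msg + " Error in line: " + line + " (" + col + ")"
                + (s != null ? ": " + s : "");
    }
}
```

For example, ErrorMessageSketch.createMessage("Unexpected quotation mark.", null, 3, 14) yields "Unexpected quotation mark. Error in line: 3 (14)". Keeping the position in one fixed format makes it simple to search logs for the line that failed.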
This parser is used to read configuration files and script parameter inputs: places where users or admins write key=value data in simple text files. Example of a file it can parse:

    name=ProjectForge
    version="8.0.0"
    description="ERP system with timesheet, calendar, and fibu"
    maxUsers=100
The parser reads line by line (EOL stops each parse() call), producing a Map<String, String>. The companion class KeyValuePairWriter does the reverse — writes maps to key=value text format.