#737: KeyValuePairParser.java

projectforge-business/src/main/java/org/projectforge/framework/utils/KeyValuePairParser.java
Lines: 301 · Author: Kai Reinhard · Type: Java hand-written lexer/parser, zero dependencies
301 lines · 229 code · 34 comments · 38 blank

Purpose

A hand-written stream parser that reads key=value pairs from a java.io.Reader, one character at a time. It uses no regex and no third-party parsing library, only stateful character-by-character parsing with a 5-element pushback buffer. Code like this typically either predates the wide adoption of libraries such as Apache Commons CSV or is written because the author wants exact control over error messages and edge cases without adding a dependency.

The parser supports: quoted values (with "), escaped quotes inside quoted strings ("" → a literal "), configurable separator char (default from KeyValuePairWriter.DEFAULT_SEPARATOR_CHAR), line/column tracking for error messages, and whitespace skipping. Input format: key1=value1,key2="quoted value, with comma",key3=123.

Architecture: a miniature lexer

The 301 lines are organized in the classical two layers of a tokenizer and a parser:

  1. Character reader (read() / nextToken() / unread()): reads one character from the Reader, tracks line/column numbers, handles \r\n (Windows) and \n (Unix) newlines, and classifies the token as EOF, EOL, or CHAR. The nextToken() method is the tokenizer.
  2. Parser (parseKey() / parseValue()): consumes tokens to build key and value strings. parse() is the entry point; it calls parseKey() and parseValue() in a loop until EOF/EOL.

This is the same architecture as a compiler front-end — tokenizer + parser — just for key=value data instead of programming languages.
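To make the tokenizer layer concrete, here is a minimal sketch of a character classifier with line/column tracking. It is not the original code: the class name TokenizerSketch, the Type enum, and the accessor names are hypothetical, and the handling of a lone \r is a simplification chosen for this example.

```java
import java.io.IOException;
import java.io.Reader;

/** Hypothetical sketch: classifies each character as EOF, EOL, or CHAR. */
class TokenizerSketch {
    enum Type { EOF, EOL, CHAR }

    private final Reader reader;
    private int lineno = 1; // 1-based line of the next character
    private int colno = 0;  // column of the last CHAR returned on this line
    char value;             // valid only when the last token was CHAR

    TokenizerSketch(Reader reader) {
        this.reader = reader;
    }

    Type nextToken() throws IOException {
        int c = reader.read();
        if (c == '\r') {
            // Assumption: \r is the first half of a Windows \r\n pair.
            // A lone \r is silently dropped in this sketch.
            c = reader.read();
        }
        if (c == -1) {
            return Type.EOF;
        }
        if (c == '\n') {
            lineno++;
            colno = 0;
            return Type.EOL;
        }
        colno++;
        value = (char) c;
        return Type.CHAR;
    }

    int line() { return lineno; }

    int column() { return colno; }
}
```

A parser layer built on top of this would call nextToken() in a loop and only ever look at the token type and, for CHAR, the value field.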

Pushback buffer

A 5-element int[] pushbackBuffer with index pushbackIndex. When the parser reads one character too far (e.g. in skipWhitespaces()), it pushes the character back via unread(int b). unread() also adjusts the line/column counters when a newline is pushed back, which keeps error positions accurate. Five slots are more than enough, since the maximum lookahead this parser needs is one or two characters. A small pushback buffer is the standard way to implement such short, fixed lookahead, the simplest form of backtracking.
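The pushback mechanism can be sketched as follows. The field names pushbackBuffer and pushbackIndex and the methods read(), unread(int) and skipWhitespaces() come from the description above; the class name PushbackSketch and the method bodies are assumptions, and the line/column bookkeeping is left out to keep the example short.

```java
import java.io.IOException;
import java.io.Reader;

/** Hypothetical sketch of a reader with a small LIFO pushback buffer. */
class PushbackSketch {
    private final Reader reader;
    private final int[] pushbackBuffer = new int[5];
    private int pushbackIndex = 0; // number of characters currently pushed back

    PushbackSketch(Reader reader) {
        this.reader = reader;
    }

    /** Returns the most recently pushed-back character, else reads from the stream. */
    int read() throws IOException {
        if (pushbackIndex > 0) {
            return pushbackBuffer[--pushbackIndex];
        }
        return reader.read();
    }

    /** Pushes one character (or -1 for EOF) back so the next read() returns it. */
    void unread(int b) {
        if (pushbackIndex == pushbackBuffer.length) {
            throw new IllegalStateException("pushback buffer full");
        }
        pushbackBuffer[pushbackIndex++] = b;
    }

    /** Skips spaces and tabs, then pushes back the first other character. */
    void skipWhitespaces() throws IOException {
        int c;
        do {
            c = read();
        } while (c == ' ' || c == '\t');
        unread(c);
    }
}
```

Because the buffer is last-in, first-out, characters pushed back in the order x, y are read again as y, x; a caller that un-reads several characters has to push them in reverse.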

Quoted value handling — the most complex logic

parseValue() (lines 141-189) implements a full CSV-style quoted field parser:

  1. Skips whitespace, reads first character
  2. If the first char is ", enters quoted mode (quoted=true)
  3. In quoted mode:
    • "" → escaped quote (appends one " to the value)
    • " followed by separator or whitespace+separator → end of value
    • " followed by anything else → error (UNEXPECTED_CHARACTER_AFTER_QUOTATION_MARK)
    • Any other char → appended to value buffer
  4. In unquoted mode:
    • Separator char → end of value
    • " → error (UNEXPECTED_QUOTATIONMARK — quotes only allowed in quoted mode)
    • Any other char → appended
  5. If EOF/EOL reached while quoted → error (QUOTATIONMARK_MISSED_AT_END_OF_CELL)

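The quoting rules match those of RFC 4180 for CSV fields (values wrapped in ", with "" as the escape for a literal quote), in simplified form: RFC 4180 allows line breaks inside a quoted field, whereas here a quoted value must end on the line where it started. Unlike full CSV tooling, there is also no header detection and no type inference.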
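The five steps listed above can be sketched on a plain String instead of a Reader. This is an illustration of the rules, not the original parseValue(): the class name QuotedValueSketch, the int[] position cursor, the exception type, and the message texts are all assumptions, and only the space character is treated as whitespace.

```java
/** Hypothetical sketch of the quoted/unquoted value rules, operating on a String. */
class QuotedValueSketch {

    /** Parses one value starting at pos[0]; afterwards pos[0] is past the separator. */
    static String parseValue(String in, int[] pos, char sep) {
        int i = pos[0];
        while (i < in.length() && in.charAt(i) == ' ') {
            i++; // step 1: skip leading whitespace
        }
        StringBuilder buf = new StringBuilder();
        boolean quoted = i < in.length() && in.charAt(i) == '"'; // step 2
        if (quoted) {
            i++;
        }
        while (true) {
            if (i >= in.length() || in.charAt(i) == '\n') {
                if (quoted) { // step 5: input ended inside a quoted value
                    throw new IllegalArgumentException("Quotation mark missed at end of cell (" + i + ")");
                }
                break;
            }
            char c = in.charAt(i++);
            if (quoted) { // step 3
                if (c != '"') {
                    buf.append(c);
                    continue;
                }
                if (i < in.length() && in.charAt(i) == '"') {
                    buf.append('"'); // "" is an escaped quote
                    i++;
                    continue;
                }
                // Closing quote: only whitespace, then separator or end of line, may follow.
                while (i < in.length() && in.charAt(i) == ' ') {
                    i++;
                }
                if (i < in.length() && in.charAt(i) != sep && in.charAt(i) != '\n') {
                    throw new IllegalArgumentException("Unexpected character after quotation mark (" + i + ")");
                }
                if (i < in.length() && in.charAt(i) == sep) {
                    i++; // consume the separator
                }
                break;
            }
            // step 4: unquoted mode
            if (c == sep) {
                break;
            }
            if (c == '"') {
                throw new IllegalArgumentException("Unexpected quotation mark (" + i + ")");
            }
            buf.append(c);
        }
        pos[0] = i;
        return buf.toString();
    }
}
```

Calling parseValue repeatedly with the same pos array walks through the values of one line; a separator inside a quoted value is kept as ordinary text.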

Error handling — precise location info

Every error message includes line number and column number — e.g. "Unexpected quotation mark... Error in line: 3 (14)". This is critical for debugging config files. The createMessage() static method formats errors with the pattern: msg + " Error in line: " + line + " (" + col + ")" + (s != null ? ": " + s : ""). The lineno and colno counters are maintained by read() (increments line on \n) and nextToken() (increments column on each char).
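The message pattern quoted above is small enough to reproduce directly. In this sketch the concatenation is taken from the text, while the class name ErrorMessageSketch and the parameter order are assumptions.

```java
/** Hypothetical sketch of the error message format described above. */
class ErrorMessageSketch {

    /**
     * @param msg  the error description
     * @param s    optional detail, e.g. the offending text; may be null
     * @param line line number of the error
     * @param col  column number of the error
     */
    static String createMessage(String msg, String s, int line, int col) {
        return msg + " Error in line: " + line + " (" + col + ")" + (s != null ? ": " + s : "");
    }
}
```

With these assumptions, createMessage("Unexpected quotation mark.", null, 3, 14) yields "Unexpected quotation mark. Error in line: 3 (14)", and a non-null detail string is appended after a colon.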

Usage context

This parser is used to read configuration files and script parameter inputs — places where users or admins write key=value data in simple text files. Example file parsed by this:

name=ProjectForge
version="8.0.0"
description="ERP system with timesheet, calendar, and fibu"
maxUsers=100

The parser reads line by line (EOL stops each parse() call), producing a Map<String, String>. The companion class KeyValuePairWriter does the reverse — writes maps to key=value text format.
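To show what result a caller would expect for the example file, here is a simplified stand-in that does not use KeyValuePairParser at all, because its public method signatures are not shown here. It splits each line at the first = and strips surrounding quotes; the class name ConfigReadSketch and the method readAll are hypothetical, and none of the error checks described earlier are performed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in: one key=value pair per line, collected into a Map. */
class ConfigReadSketch {

    static Map<String, String> readAll(String text) throws IOException {
        Map<String, String> result = new LinkedHashMap<>(); // keeps file order
        BufferedReader in = new BufferedReader(new StringReader(text));
        String line;
        while ((line = in.readLine()) != null) {
            int eq = line.indexOf('=');
            if (eq < 0) {
                continue; // this sketch skips blank or malformed lines
            }
            String key = line.substring(0, eq).trim();
            String value = line.substring(eq + 1).trim();
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                // Strip the surrounding quotes and unescape doubled quotes.
                value = value.substring(1, value.length() - 1).replace("\"\"", "\"");
            }
            result.put(key, value);
        }
        return result;
    }
}
```

All values stay strings, so maxUsers=100 ends up as the text "100"; converting it to a number is left to the caller.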