CSVParser.java

CSVParser reads from a java.io.Reader and tokenizes CSV data one character at a time. It supports quoted fields, embedded newlines within quoted cells, escaped double-quotes (the "" convention), configurable field separators, header column name mapping, and UTF-8 BOM detection. Written by Kai Reinhard and H. Spiewok (2005), it predates, and avoids depending on, external CSV libraries.

Dependencies:

- java.io.IOException, java.io.Reader: character stream input
- java.util.ArrayList, HashMap, List, Map: result storage and header column indexing
- org.apache.commons.lang3.StringUtils: string blank-check in error messages
- org.slf4j.Logger: error logging

Rather than using regular expressions or a parser generator, CSVParser implements a character-by-character lexer with a pushback buffer. This design prioritizes control over error handling and performance for the specific subset of CSV formatting used by ProjectForge.
enum Type { EOF, EOL, CHAR }
Token types: End-Of-File, End-Of-Line, or character data. This drives the parser state machine.
The lexer keeps an int[] (pushbackBuffer) with index tracking, which enables lookahead and backtracking without relying on Reader support for mark()/reset().

cval holds the value of character tokens. The sequence \r\n (Windows CRLF) is handled as a single EOL token.

skipBOM() is called during construction to detect and skip a byte order mark at the start of the file. A UTF-8 BOM appears in the decoded character stream as U+FEFF (\uFEFF). If no BOM is present, the first character is pushed back (unread). This enables correct parsing of CSV files exported from Microsoft Excel, which writes a BOM at the start of UTF-8 files.
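The pushback buffer, CRLF handling, and BOM skipping described above can be sketched as follows. This is a minimal illustration, not the ProjectForge source: the class name PushbackLexerSketch, the buffer capacity, the describe() helper, and the treatment of a lone \r as EOL are all assumptions made for the example.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Illustrative sketch; class name, buffer size and helper names are assumptions.
final class PushbackLexerSketch {
    enum Type { EOF, EOL, CHAR }

    private final Reader reader;
    private final int[] pushbackBuffer = new int[4]; // hypothetical capacity
    private int pushbackIndex = 0;                   // number of buffered characters
    Type type;  // type of the most recent token
    char cval;  // character value, valid only when type == Type.CHAR

    PushbackLexerSketch(Reader reader) throws IOException {
        this.reader = reader;
        skipBOM();
    }

    // Read the next raw character, preferring anything that was pushed back.
    private int read() throws IOException {
        return pushbackIndex > 0 ? pushbackBuffer[--pushbackIndex] : reader.read();
    }

    // Return a character to the input so the next read() sees it again.
    private void unread(int c) {
        pushbackBuffer[pushbackIndex++] = c;
    }

    // Drop a leading U+FEFF; otherwise push the first character back.
    private void skipBOM() throws IOException {
        int c = read();
        if (c != '\uFEFF') {
            unread(c);
        }
    }

    // Produce the next token. "\r\n" yields a single EOL; treating a lone
    // "\r" as EOL as well is an assumption of this sketch.
    Type nextToken() throws IOException {
        int c = read();
        if (c == -1) {
            type = Type.EOF;
        } else if (c == '\n') {
            type = Type.EOL;
        } else if (c == '\r') {
            int next = read();
            if (next != '\n') {
                unread(next); // not part of a CRLF pair, keep it for later
            }
            type = Type.EOL;
        } else {
            type = Type.CHAR;
            cval = (char) c;
        }
        return type;
    }

    // Hypothetical helper: render the token stream as text for inspection.
    static String describe(String input) {
        try {
            PushbackLexerSketch lexer = new PushbackLexerSketch(new StringReader(input));
            StringBuilder out = new StringBuilder();
            while (lexer.nextToken() != Type.EOF) {
                out.append(lexer.type == Type.EOL ? "<EOL>" : String.valueOf(lexer.cval));
            }
            return out.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Keeping the pushback state in the lexer itself means any Reader works as input, including ones that do not support mark()/reset().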
The core CSV parsing logic handles these cases:
| Case | Behavior |
|---|---|
| Unquoted cell | Characters are accumulated until separator or EOL |
| Quoted cell ("...") | Characters inside quotes are accumulated; quotes must be properly closed |
| Escaped quote ("") | Two consecutive double-quotes inside a quoted cell represent one literal quote character |
| Embedded newline | Newlines within quoted cells are preserved (multiline cell values) |
| Trailing whitespace | Whitespace after closing quote is skipped; expects separator or EOL next |
| Unterminated quote | Throws RuntimeException with descriptive error message including line/column number |
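The cases in the table can be sketched as a small state machine over a string. This is a simplified illustration under stated assumptions, not the actual CSVParser code: parseRecord() and QuotedCellSketch are hypothetical names, the input is a String rather than a Reader, only spaces and tabs count as skippable whitespace, and the error messages omit the line and column numbers that the real parser adds.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch; names and simplifications are assumptions.
final class QuotedCellSketch {
    // Parse the first record of `input` into cells.
    static List<String> parseRecord(String input, char separator) {
        List<String> cells = new ArrayList<>();
        int pos = 0;
        while (true) {
            StringBuilder cell = new StringBuilder();
            if (pos < input.length() && input.charAt(pos) == '"') {
                pos++; // consume the opening quote
                boolean closed = false;
                while (pos < input.length()) {
                    char c = input.charAt(pos++);
                    if (c != '"') {
                        cell.append(c); // includes embedded newlines
                    } else if (pos < input.length() && input.charAt(pos) == '"') {
                        cell.append('"'); // "" stands for one literal quote
                        pos++;
                    } else {
                        closed = true; // closing quote
                        break;
                    }
                }
                if (!closed) {
                    throw new RuntimeException("Quotation \" missed at the end of cell.");
                }
                // Skip blanks after the closing quote (but never the separator itself).
                while (pos < input.length() && input.charAt(pos) != separator
                        && (input.charAt(pos) == ' ' || input.charAt(pos) == '\t')) {
                    pos++;
                }
                if (pos < input.length() && input.charAt(pos) != separator
                        && input.charAt(pos) != '\n' && input.charAt(pos) != '\r') {
                    throw new RuntimeException(
                        "Delimiter or new line expected after quotation mark.");
                }
            } else {
                while (pos < input.length()) {
                    char c = input.charAt(pos);
                    if (c == separator || c == '\n' || c == '\r') {
                        break; // end of the unquoted cell
                    }
                    if (c == '"') {
                        throw new RuntimeException(
                            "Unexpected quotation mark \" (only allowed in quoted cells).");
                    }
                    cell.append(c);
                    pos++;
                }
            }
            cells.add(cell.toString());
            if (pos < input.length() && input.charAt(pos) == separator) {
                pos++; // another cell follows
            } else {
                return cells; // EOL or end of input closes the record
            }
        }
    }
}
```

For example, parseRecord("a;\"x \"\"y\"\"\" ;z", ';') walks through an unquoted cell, a quoted cell with an escaped quote pair and trailing whitespace, and a final unquoted cell.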
parseLine() reads cells until EOL or EOF, collecting them into a List<String>. It returns null at EOF rather than an empty list, so callers can distinguish end-of-file from empty lines.
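The null-at-EOF contract leads to a familiar read loop. The sketch below uses a hypothetical stand-in class (ParseLineLoopSketch) instead of the real parser: its parseLine() splits on ';' without any quoting support, and returning an empty list for an empty line is an assumption made to illustrate the distinction.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative stand-in, not the ProjectForge CSVParser.
final class ParseLineLoopSketch {
    private final BufferedReader in;

    ParseLineLoopSketch(String csv) {
        this.in = new BufferedReader(new StringReader(csv));
    }

    // Returns null at EOF and an empty list for an empty line (assumed behavior).
    List<String> parseLine() {
        try {
            String line = in.readLine();
            if (line == null) {
                return null; // end of input
            }
            if (line.isEmpty()) {
                return new ArrayList<>(); // empty line, not EOF
            }
            return new ArrayList<>(Arrays.asList(line.split(";", -1)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Typical caller: loop until parseLine() signals EOF with null.
    static List<List<String>> readAll(String csv) {
        ParseLineLoopSketch parser = new ParseLineLoopSketch(csv);
        List<List<String>> rows = new ArrayList<>();
        List<String> cells;
        while ((cells = parser.parseLine()) != null) {
            rows.add(cells);
        }
        return rows;
    }
}
```

Because an empty line produces a non-null result, the loop keeps going past blank lines and stops only at the real end of input.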
For CSV files with a header row, parseHeadCols() reads the first line and builds a colMap: Map<String, Integer> mapping column names to their positional index. Subsequent getCell(List<String>, colname) calls retrieve values by column name rather than position. This enables Excel-like named column access.
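A minimal sketch of the name-to-index mapping follows. It is an illustration only: HeaderMapSketch is a hypothetical class, its parseHeadCols() takes the header cells as an argument instead of reading them from the stream, and returning null for an unknown column or a short row is an assumption.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch; signatures differ from the real CSVParser.
final class HeaderMapSketch {
    private final Map<String, Integer> colMap = new HashMap<>();

    // Record the positional index of every header name.
    void parseHeadCols(List<String> headerCells) {
        for (int i = 0; i < headerCells.size(); i++) {
            colMap.put(headerCells.get(i), i);
        }
    }

    // Look a cell up by column name; null if the column is unknown
    // or the row is shorter than the header (assumed behavior).
    String getCell(List<String> cells, String colname) {
        Integer index = colMap.get(colname);
        if (index == null || index >= cells.size()) {
            return null;
        }
        return cells.get(index);
    }
}
```

With the map in place, calling code stays valid when columns are reordered in the file, as long as the header names are unchanged.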
Four distinct error constants provide specific diagnostics:
- ERROR_UNEXPECTED_QUOTATIONMARK = "Unexpected quotation mark \" (only allowed in quoted cells)."
- ERROR_QUOTATIONMARK_MISSED_AT_END_OF_CELL = "Quotation \" missed at the end of cell."
- ERROR_DELIMITER_OR_NEW_LINE_EXPECTED_AFTER_QUOTATION_MARK = "Delimiter or new line expected after quotation mark."
- ERROR_UNEXPECTED_CHARACTER_AFTER_QUOTATION_MARK = "Unexpected character after quotation mark."
Each message is augmented with line and column numbers via createMessage().
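How a message might be combined with a position can be sketched as below. The exact output format of the real createMessage() is not given here, so the layout "(line N, column M)", the three-argument signature, and the locate() helper are assumptions for illustration.

```java
// Illustrative sketch; message layout and helper names are assumptions.
final class ErrorMessageSketch {
    static final String ERROR_QUOTATIONMARK_MISSED_AT_END_OF_CELL =
        "Quotation \" missed at the end of cell.";

    // Compute the 1-based {line, column} of a character offset in the input.
    static int[] locate(String input, int offset) {
        int line = 1;
        int col = 1;
        for (int i = 0; i < offset && i < input.length(); i++) {
            if (input.charAt(i) == '\n') {
                line++;
                col = 1; // a newline starts a fresh line
            } else {
                col++;
            }
        }
        return new int[] { line, col };
    }

    // Append the position to the base message (hypothetical format).
    static String createMessage(String message, int line, int col) {
        return message + " (line " + line + ", column " + col + ")";
    }
}
```

In a character-by-character lexer the line and column counters would normally be updated as each character is read, so no second pass over the input is needed; locate() recomputes them here only to keep the example self-contained.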
CSVParser uses CSVWriter.DEFAULT_CSV_SEPARATOR_CHAR (';', a semicolon) as its default separator. This follows the convention common in European locales where the comma serves as the decimal separator (Microsoft Excel in German locales, for example, writes semicolon-delimited CSV). The separator is configurable via setCsvSeparatorChar().
parseLine() reads one line and returns all of its cells, which is suitable for moderate-sized files but not for very large (multi-GB) CSVs.

Commit history:

- 868d6abb7 2025 -> 2026
- 161d71602 WIP: CSVParser: BOM chars.
- dfb2378df WIP: CSVParser: multilines etc.
- 63081666f Source file headers: 2024-> 2025.
- a73905c14 Fix typos in projectforge*/ directories Found via codespell
- a72903e36 *.java, *.kt: StringBuffer -> StringBuilder.
- b6092df09 Copyright 2023 -> 2024
- ab45d51fa Copyright 2001-2022 -> 2001-2023.