#733: `GZIPHelper.java`

Purpose

Compresses and decompresses strings using GZIP + Base64 encoding. Solves two problems: (1) some database fields have limited character length (e.g. VARCHAR(4000)), and (2) storing large JSON blobs in database columns wastes space. This utility compresses the string with GZIP, then Base64-encodes the binary result so it can be stored as a plain string in a database VARCHAR or TEXT column. The decompression reverses the process: Base64 decode → GZIP decompress → original string.

Why Base64 after GZIP? GZIP produces arbitrary binary bytes, which are not safe to store in text-based database columns: the column's character encoding (e.g. UTF-8) may reject byte sequences that do not form valid characters, or interpret certain byte values as control characters. Base64 converts the binary data into a pure ASCII string using 64 safe characters (A-Z, a-z, 0-9, +, /) plus = for padding, which any text column can store unchanged.
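The difference between the raw GZIP bytes and their Base64 form can be made visible with a small sketch. It uses only the JDK (java.util.zip and java.util.Base64), not the project's Base64Helper; the class name GzipBase64Demo and the sample JSON are made up for illustration, and UTF-8 is an assumption.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPOutputStream;

/** Illustrative only: raw GZIP bytes versus their Base64 form. */
final class GzipBase64Demo {

    /** GZIP-compresses the UTF-8 bytes of the given text. */
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the GZIP stream (via try-with-resources) writes the trailer.
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return buffer.toByteArray();
    }

    /** Encodes binary data with the standard Base64 alphabet, including = padding. */
    static String toBase64(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = gzip("{\"calendar\":\"team\"}");
        // Every GZIP stream starts with the magic bytes 0x1f 0x8b (RFC 1952).
        // The byte 0x8b in that position is not valid UTF-8, so the raw bytes
        // cannot be stored as UTF-8 text.
        System.out.printf("first bytes: %02x %02x%n", raw[0] & 0xff, raw[1] & 0xff);
        System.out.println("as Base64:   " + toBase64(raw));
    }
}
```

The Base64 line consists only of characters from the Base64 alphabet, so it survives storage in a VARCHAR or TEXT column regardless of the column's character encoding.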

API — two methods

`compress(String str) → String`

Takes any string and is null-safe: returns null for null or empty input. Used to pack user preference XML, JSON configuration blobs, and large text fields for database storage. Algorithm:

1. str.getBytes(): converts the string to bytes using the platform-default charset. Warning: this is the JVM's default charset (typically UTF-8 on modern systems, but possibly different on legacy setups). If the string contains non-ASCII characters, JVMs with different default charsets produce different bytes and therefore different compressed results.
2. Wraps a ByteArrayOutputStream in a GZIPOutputStream, the Java standard library GZIP implementation (RFC 1952), and writes the bytes to it.
3. Closes the GZIP stream, which flushes the remaining compressed data to the underlying ByteArrayOutputStream.
4. Encodes the compressed bytes via Base64Helper.encodeObject() and returns the resulting Base64 string.

Error handling: On IOException, logs the error at ERROR level and returns null. The caller must handle a null return; most callers treat it the same as an empty string.
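The compress steps can be sketched roughly as follows. This is not the ProjectForge source: the class name CompressSketch is hypothetical, java.util.Base64 stands in for Base64Helper.encodeObject() (whose exact output format is not shown here), the charset is pinned to UTF-8 instead of the platform default, and stderr replaces the real logger.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPOutputStream;

/** Hypothetical stand-in for the compress() steps; not the ProjectForge code. */
final class CompressSketch {

    /** Returns the GZIP + Base64 form of str, or null for null/empty input or on failure. */
    static String compress(String str) {
        if (str == null || str.isEmpty()) {
            return null; // mirrors the documented null/empty behaviour
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            // Assumption: explicit UTF-8 rather than the platform-default charset.
            gzip.write(str.getBytes(StandardCharsets.UTF_8));
        } catch (IOException ex) {
            // The documented method logs at ERROR level; stderr keeps this sketch dependency-free.
            System.err.println("compress failed: " + ex);
            return null;
        }
        // The try-with-resources block has already closed the GZIP stream,
        // so the buffer now holds the complete compressed data.
        return Base64.getEncoder().encodeToString(out.toByteArray());
    }
}
```

A caller following the convention described above might write `String stored = CompressSketch.compress(xml);` and treat a null result like an empty string.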

`uncompress(String base64ByteArray) → String`

Reverses compress(). Takes a Base64-encoded compressed string, returns the original. Algorithm:

1. Base64Helper.decodeObject(base64ByteArray): decodes the Base64 string to raw bytes. The method returns Object (for historical XStream compatibility), so the result is cast to byte[]; the cast is only verified at runtime and would throw ClassCastException if the decoded object were not a byte array.
2. Wraps the bytes in a GZIPInputStream, which decompresses the GZIP data.
3. Uses IOUtils.copy(gzip, out) (Apache Commons IO) to stream the decompressed data into a ByteArrayOutputStream.
4. out.toString(): converts the bytes back to a string using the platform-default charset (the same charset assumption as compress).

Error handling: Catches both IOException and ClassNotFoundException. The latter comes from Base64Helper.decodeObject(), which uses Java object deserialization under the hood, a potential security issue if malicious Base64 data is passed. Returns null when either exception is caught.
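For comparison, here is a minimal sketch of the same decode path that avoids object deserialization. It is an alternative under stated assumptions, not the existing implementation: UncompressSketch is a hypothetical name, java.util.Base64 replaces Base64Helper.decodeObject(), a plain read loop replaces IOUtils.copy, and the bytes are assumed to be UTF-8. It can only read data whose Base64 layer was produced by a plain Base64 encoder, which may not match what Base64Helper writes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;

/** Hypothetical alternative to the uncompress() steps; not the ProjectForge code. */
final class UncompressSketch {

    /** Returns the original string, or null for null/empty or undecodable input. */
    static String uncompress(String base64) {
        if (base64 == null || base64.isEmpty()) {
            return null;
        }
        try {
            // Plain Base64 decoding: yields bytes only, never instantiates objects.
            byte[] packed = Base64.getDecoder().decode(base64);
            try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(packed))) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] chunk = new byte[4096];
                int read;
                while ((read = gzip.read(chunk)) != -1) {
                    out.write(chunk, 0, read);
                }
                // Assumption: the data was compressed from UTF-8 bytes.
                return new String(out.toByteArray(), StandardCharsets.UTF_8);
            }
        } catch (IOException | IllegalArgumentException ex) {
            // IllegalArgumentException: invalid Base64; IOException: not GZIP or truncated.
            System.err.println("uncompress failed: " + ex);
            return null;
        }
    }
}
```

Because the decoder produces only a byte array, there is no ClassNotFoundException to catch and no cast from Object.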

Usage in ProjectForge

Used by two key components:

User preferences (UserXmlPreferencesService): User preference XML can become quite large (especially calendar view configurations with multiple calendars, custom colors, and saved filters). Before storing in the database, the XML is compressed via GZIPHelper.compress() and stored as a Base64 string in the UserPrefDO entity. On load, uncompress() restores the original XML
Audit trail / event log: The external_access_logs column on t_plugin_datatransfer_area (see #105) stores a JSON-serialized list of access events. With frequent external access, this string could exceed the VARCHAR(10000) limit — GZIP compression reduces its size significantly

Notably, this is not used for the license key BLOB storage (LicenseDO) — binary files use direct JDBC BLOB columns, not text compression.

Key design decisions and gotchas

Platform-default charset: Both methods use str.getBytes() and out.toString() without specifying a charset. The round-trip is therefore only guaranteed when compress and uncompress run on JVMs with the same default charset. If compressed data is moved to a server with a different default charset, non-ASCII characters in the decompressed string can be corrupted. Since Java 18 (JEP 400) the default charset is UTF-8; on older Java versions it depends on the OS locale and the file.encoding setting, so the risk is mainly with legacy JVMs
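Base64 overhead: GZIP typically compresses text to 20-30% of its original size, but Base64 encoding adds ~33% overhead (3 binary bytes → 4 Base64 characters). Net result: compressed+Base64 is typically 27-40% of the original size (0.20 × 4/3 ≈ 0.27, 0.30 × 4/3 = 0.40). GZIP also adds a fixed header and trailer of at least 18 bytes, so very short strings can grow instead of shrink; the utility pays off for strings over ~2KB, where the compression clearly dominates the overhead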
Null handling: Both methods return null for null input — callers don't need to pre-check. But callers do need to handle null returns (when compression/decompression fails) — treating null as empty string is typical
No thread safety concerns: All I/O is through local ByteArrayInputStream/OutputStream objects — no shared state, fully thread-safe
ClassNotFoundException in uncompress: This is a code smell — Base64Helper.decodeObject() uses Java's ObjectInputStream, which can deserialize arbitrary Java objects (a known security vulnerability). If an attacker can inject Base64 data that gets passed to uncompress(), they could potentially execute arbitrary code. In practice, the data only comes from the database and is generated by compress(), so the attack surface is limited to database compromise scenarios
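The Base64 overhead point can be checked empirically with a small measurement sketch. It uses the JDK's own GZIP and Base64 classes rather than GZIPHelper; the class name PackedSizeDemo and the sample JSON fragment are invented for illustration, and the actual ratios depend heavily on how repetitive the input is.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPOutputStream;

/** Illustrative only: compares original length with GZIP + Base64 length. */
final class PackedSizeDemo {

    /** Base64 characters needed for n binary bytes: 4 * ceil(n / 3). */
    static int base64Length(int n) {
        return 4 * ((n + 2) / 3);
    }

    /** Length in characters of the GZIP + Base64 form of text (UTF-8 assumed). */
    static int packedLength(String text) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return Base64.getEncoder().encodeToString(buffer.toByteArray()).length();
    }

    public static void main(String[] args) throws IOException {
        String tiny = "ok";
        StringBuilder big = new StringBuilder();
        for (int i = 0; i < 500; i++) {
            // Made-up, highly repetitive sample data.
            big.append("{\"calendarId\":").append(i).append(",\"color\":\"#ff0000\"},");
        }
        // The tiny string grows because of the fixed GZIP header and trailer.
        System.out.println("tiny: " + tiny.length() + " -> " + packedLength(tiny));
        System.out.println("big:  " + big.length() + " -> " + packedLength(big.toString()));
    }
}
```

Running it on real preference XML or access-log JSON would give a more honest picture of the savings than the synthetic sample used here.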
