EN · DE · RU · FR · ES

#1873: DiffMatchPatch.java

projectforge-common/src/main/java/name/fraser/neil/plaintext/DiffMatchPatch.java Third-party library (embedded) — Google diff-match-patch, projectforge-common/src/main/java/name/fraser/neil/plaintext/DiffMatchPatch.java 2464 lines · 1638 code · 688 comments · 138 blank
Google's diff-match-patch algorithm library by Neil Fraser, embedded directly in ProjectForge under the original package name.fraser.neil.plaintext. Licensed under Apache License 2.0. Provides synchronous text comparison (diff), fuzzy matching (match), and patch application (patch) — the same algorithms that power Google Docs, Etherpad, and Wikipedia's revision comparison. Used by ProjectForge for text change tracking and version comparison.

Architecture

Origin and License

Algorithm Overview

The library implements three related algorithms, all operating on plain text:

1. Diff — Myers' O(ND) Difference Algorithm

Computes the minimal set of edits to transform text1 into text2. The core algorithm is Myers' diff algorithm (1986), which finds the shortest edit script using an O(ND) time/space approach where N is the sum of string lengths and D is the number of differences.

2. Match — Bitap Algorithm (Shift-Or)

Locates the best approximate match for a pattern within a larger text. Uses the Bitap algorithm (also known as shift-or or Baeza-Yates–Gonnet algorithm) which:

3. Patch — Unified Diff Format + Fuzzy Application

Creates patches (text representations of diffs) that can be serialized, transmitted, and applied to other texts:

Key Configuration Parameters

FieldDefaultPurpose
Diff_Timeout1.0 secondsMaximum seconds for a diff computation before returning partial result (0 = infinite)
Diff_EditCost4Cost of an empty edit operation — higher values produce cleaner, less fragmented diffs
Match_Threshold0.5Fuzzy match threshold (0.0 = exact, 1.0 = very loose)
Match_Distance1000Locality bonus — distance in characters from expected match location
Patch_DeleteThreshold0.5How closely delete blocks must match expected content to be accepted
Patch_Margin4Context character count around patch hunks
Match_MaxBits32Bit width for the Bitap algorithm (matches Java int size)

Data Structure: Diff (Inner Class)

Diffs are represented as a LinkedList<Diff> where each Diff is an inner static class with:

Internal Helper: LinesToCharsResult

An optimization for large texts: converts lines to unique Unicode characters, performs the diff on the compressed representation, then reconstructs line-level output. This dramatically reduces the comparison space for line-oriented text (e.g., source code).

Imports

ProjectForge Usage

ProjectForge embeds this library rather than depending on it as an external JAR. This was likely done because:

  1. The library is stable and rarely changes (last significant update was the Java port circa 2010)
  2. Embedding avoids dependency management issues
  3. The package name (name.fraser.neil.plaintext) is distinct and non-conflicting
  4. It may have custom modifications for ProjectForge-specific needs (though the code appears to be the standard port)
The DiffMatchPatch is used for tracking changes to text fields in ProjectForge's data objects — showing what changed between versions of an address, project description, or other text content.

The library has been ported to multiple languages (Python, JavaScript, Java, C++, C#, Lua, Dart). The Java version uses LinkedList for diff storage because diffs are primarily traversed sequentially (not random-access) and the algorithm frequently prepends/appends elements.

Git History

48a93dedb Colored console log. UserGroupCache export for debugging and comparing work now. CollectionUtil improved. KotlinStringExtension.shortenMiddle() added.