oss-sec mailing list archives

CVE-2026-40682: Apache OpenNLP: XXE via Dictionary Parsing in DictionaryEntryPersistor


From: Richard Zowalla <rzo1 () apache org>
Date: Fri, 01 May 2026 17:54:46 +0000

Severity: moderate 

Affected versions:

- Apache OpenNLP (org.apache.opennlp:opennlp-tools) before 2.5.9
- Apache OpenNLP (org.apache.opennlp:opennlp-tools) 3.0 before 3.0.0-M3

Description:

XML External Entity (XXE) via Unsanitized Dictionary Parsing in Apache OpenNLP DictionaryEntryPersistor


Versions Affected: before 2.5.9, before 3.0.0-M3


Description: The DictionaryEntryPersistor class initializes a static SAXParserFactory at class-load time without 
enabling FEATURE_SECURE_PROCESSING or disabling DTD processing. When create(InputStream, EntryInserter) is invoked, the 
only feature set on the XMLReader is namespace support; external entity resolution and DOCTYPE declarations remain 
fully enabled. An attacker who can supply a crafted dictionary file (e.g., a stop-word list or domain dictionary) 
containing a malicious DOCTYPE declaration can trigger local file disclosure via file:// entity references or 
server-side request forgery via http:// entity references during SAX parsing, before the application processes a single 
dictionary entry. This is inconsistent with the project's own XmlUtil.createSaxParser() helper, which correctly sets 
FEATURE_SECURE_PROCESSING and disallow-doctype-decl and is used by all other XML parsing paths in the codebase. The 
public Dictionary(InputStream) constructor delegates directly to this method and is the documented API for loading 
user-supplied dictionaries, making untrusted input a realistic scenario.
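
For illustration, the hardening that the description says is missing can be sketched with standard JAXP calls. This is 
a minimal sketch, not the actual OpenNLP patch and not the source of XmlUtil.createSaxParser(); the class name 
HardenedSax and the method newFactory() are hypothetical, and the disallow-doctype-decl feature URI assumes a 
Xerces-based parser such as the one bundled with the JDK.

```java
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;

// Hypothetical helper, for illustration only; not part of OpenNLP.
final class HardenedSax {

    private HardenedSax() {
    }

    // Builds a SAXParserFactory that keeps namespace support but refuses
    // any document carrying a DOCTYPE declaration.
    static SAXParserFactory newFactory()
            throws ParserConfigurationException, SAXException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        // Enables the JAXP secure-processing limits.
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        // Xerces feature: a DOCTYPE in the input becomes a fatal parse error,
        // so external entities are never declared, let alone resolved.
        factory.setFeature(
                "http://apache.org/xml/features/disallow-doctype-decl", true);
        return factory;
    }
}
```

With a factory configured this way, a dictionary that begins with a DOCTYPE declaration should fail with a 
SAXParseException before any entity reference is resolved.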


Mitigation: 2.x users should upgrade to 2.5.9. 3.x users should upgrade to 3.0.0-M3. Users who cannot upgrade 
immediately should ensure that all dictionary files are sourced from trusted origins and should consider wrapping the 
Dictionary(InputStream) constructor with input validation that rejects any XML containing a DOCTYPE declaration before 
it reaches the parser.
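
One way to approximate the suggested wrapper, for users who cannot upgrade yet, is sketched below. It buffers the 
untrusted stream, runs a throwaway parse with DOCTYPE declarations disallowed, and only then returns a fresh stream 
over the same bytes. The class name DoctypeGuard and the method requireNoDoctype() are hypothetical; the sketch assumes 
Java 9 or later (for InputStream.readAllBytes) and a Xerces-based JDK parser, and it holds the whole file in memory, so 
a size limit may be advisable for large inputs.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical pre-validation wrapper, for illustration only; not part of OpenNLP.
final class DoctypeGuard {

    private DoctypeGuard() {
    }

    // Returns a replayable copy of the input, or throws IOException if the
    // XML contains a DOCTYPE declaration or is not well-formed.
    static InputStream requireNoDoctype(InputStream untrusted) throws IOException {
        byte[] bytes = untrusted.readAllBytes();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            // A DOCTYPE becomes a fatal error, so no entity is ever resolved.
            factory.setFeature(
                    "http://apache.org/xml/features/disallow-doctype-decl", true);
            // Validation pass only: the no-op handler discards all content.
            factory.newSAXParser()
                    .parse(new ByteArrayInputStream(bytes), new DefaultHandler());
        } catch (ParserConfigurationException | SAXException e) {
            throw new IOException(
                    "Dictionary rejected: DOCTYPE present or XML malformed", e);
        }
        // Hand back the validated bytes for the real parse.
        return new ByteArrayInputStream(bytes);
    }
}
```

A caller would then pass the returned stream to the constructor named above, for example 
new Dictionary(DoctypeGuard.requireNoDoctype(in)). Parsing with a real XML parser, rather than searching the raw bytes 
for the string "<!DOCTYPE", avoids being bypassed by alternate encodings such as UTF-16.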

Credit:

Subramanian S (finder)

References:

https://opennlp.apache.org/
https://www.cve.org/CVERecord?id=CVE-2026-40682

