First things first: what is coreference resolution?
Coreference means that multiple expressions in a sentence or document refer to the same thing. OpenNLP contains a “linker” that analyzes the tokens of a sentence to identify which chunks of text refer to the same things (e.g., people, organizations, events).
Take, for example, the text “John drove to Judy’s house. He made her dinner.” Here both “John” and “He” refer to one entity (John), while “Judy” and “her” refer to a second entity (Judy). Don’t expect OpenNLP to get this 100% correct. Even a simple example like this is a difficult problem.
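To make the goal concrete, here is a minimal Java sketch of the result we are after for the John and Judy example: each entity mapped to the mentions that refer to it. The chains are written out by hand, and the names `CorefExample` and `exampleChains` are made up for illustration. Nothing here calls OpenNLP; it only shows the shape of the answer a coreference linker is trying to produce.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CorefExample {

    // Hypothetical helper: hand-built coreference chains for
    // "John drove to Judy's house. He made her dinner."
    // A real linker would have to discover these groupings itself.
    public static Map<String, List<String>> exampleChains() {
        // LinkedHashMap keeps the entities in insertion order.
        Map<String, List<String>> chains = new LinkedHashMap<>();
        chains.put("John", Arrays.asList("John", "He"));
        chains.put("Judy", Arrays.asList("Judy", "her"));
        return chains;
    }

    public static void main(String[] args) {
        // Print each entity followed by the mentions that refer to it.
        for (Map.Entry<String, List<String>> chain : exampleChains().entrySet()) {
            System.out.println(chain.getKey() + " <- " + chain.getValue());
        }
    }
}
```

The hard part, of course, is deciding that “He” belongs with John rather than with Judy, and that is exactly the decision the linker has to make.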
Picking up where I left off once upon a time (and finally wrapping up this series), here are links to the old material:
- Getting started with OpenNLP – Sentence Detection and Tokenizing
- Part-of-Speech (POS) Tagging with OpenNLP 1.5.0