Making Coreference Resolution your bitch with OpenNLP 1.5.0

First thing’s first–what is coreference resolution?

Co-reference means that multiple expressions in a sentence or document refer to the same thing. OpenNLP contains a “linker” that analyzes the tokens of a sentences to identify which chunks of text refer to the same things (e.g., people, organizations, events).

Take, for example, the sentence “John drove to Judy’s house. He made her dinner.” In this example both John and He refer to the same entity (John); and Judy and her refer to the same, different entity (Judy). Don’t expect OpenNLP to get this 100% correct. Even a simple example like this is a difficult problem.

Picking up where I left off once upon a time (and finally wrapping up this series), here are links to the old material:

  • How to use the OpenNLP 1.5.0 Parser
  • Making Coreference Resolution your bitch with OpenNLP 1.5.0 (you’re reading it!)
  • (more…)

    How to use the OpenNLP 1.5.0 Parser

    After a brief (*cough*cough*) delay, I’m back to figure out how in the world to use this Open NLP Parser. First, a quick refresher:

  • How to use the OpenNLP 1.5.0 Parser (surprise, you’re reading it)
  • Making Coreference Resolution your bitch with OpenNLP 1.5.0
  • Getting Started

    I’m only going to warn you once: this is a long post. Go grab a beer or a glass of wine or some coffee before starting. It’s long. Now I’ve warned you twice.

    (more…)

    Part-of-Speech (POS) Tagging with OpenNLP 1.5.0

    Continuing from where I left off, I’m going to quickly touch on part-of-speech tagging before moving on. It’s actually pretty straightforward once you’re set up to run OpenNLP. This all assumes that you’ve already done sentence detection and tokenization. If you haven’t, go back to the beginning. Here are the links to the rest of my posts:

  • How to use the OpenNLP 1.5.0 Parser
  • Making Coreference Resolution your bitch with OpenNLP 1.5.0
  • Getting Started

    model files

    Only one additional model file is needed for part-of-speech tagging.

    (more…)

    Getting started with OpenNLP 1.5.0 – Sentence Detection and Tokenizing

    OpenNLP is a poorly-documented pain in the ass to figure out.  There are various scattered resources you can find on the internet, none of which are particularly thorough, accurate, or up to date.

    The most useful that I’ve found is a blog post called Getting started with OpenNLP (Natural Language Processing), but it is over 4 years old and refers to version 1.4.3 (1.5.x is what I’ll discuss here).  That post is quite helpful, but still required digging into the source code to figure out the beast that is coreference resolution.

    Here’s to hoping that I can add a few posts to the conversation and help both myself and, perhaps, others…

  • How to use the OpenNLP 1.5.0 Parser
  • Making Coreference Resolution your bitch with OpenNLP 1.5.0
  • Most (if not all) of the more advanced OpenNLP components rely on text that is broken into sentences and/or tokens, so I’m starting with those…

    (more…)