Getting started with OpenNLP 1.5.0 – Sentence Detection and Tokenizing

OpenNLP is a poorly-documented pain in the ass to figure out.  There are various scattered resources you can find on the internet, none of which are particularly thorough, accurate, or up to date.

The most useful one I’ve found is a blog post called “Getting started with OpenNLP (Natural Language Processing)”, but it is over 4 years old and covers version 1.4.3 (1.5.x is what I’ll discuss here).  That post is quite helpful, but I still had to dig into the source code to figure out the beast that is coreference resolution.

Here’s hoping I can add a few posts to the conversation and help both myself and, perhaps, others:

  • How to use the OpenNLP 1.5.0 Parser
  • Making Coreference Resolution your bitch with OpenNLP 1.5.0

Most (if not all) of the more advanced OpenNLP components rely on text that is broken into sentences and/or tokens, so I’m starting with those…


    Iterating over a PriorityQueue is NOT ordered

    I stumbled across a little Java gotcha recently when trying to order a collection of Java objects.  For some reason I decided to use a PriorityQueue because I wanted to maintain duplicates, so TreeSet and TreeMap were ruled out.

    Unfortunately, a PriorityQueue is only ordered when you remove(), peek(), or poll() elements from it, NOT when you iterate over it. This fun fact is stated in the PriorityQueue JavaDocs:

    The Iterator provided in method iterator() is not guaranteed to traverse the elements of the priority queue in any particular order.

    Unfortunately, that note is easy to miss even if you call iterator() explicitly, and it is even less obvious in a for-each loop, which calls iterator() behind the scenes.

    While it makes perfect sense in retrospect (due to the internals of the underlying heap), it was only by chance that I caught my mistake and made the fix:

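    final PriorityQueue<MyObject> ordered = ...;
    // Drain the queue instead of iterating: remove() always returns the current head.
    while (!ordered.isEmpty()) {
       final MyObject next = ordered.remove();
       // ... use next ...
    }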
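    To make the difference concrete, here is a minimal, self-contained sketch. It uses Integer instead of MyObject, and the class and method names (PriorityQueueOrderDemo, iterationOrder, drainInOrder) are made up for illustration. The iterated order is whatever the underlying heap happens to hold, so no particular output is promised for it; only the drained order is guaranteed to be sorted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical demo class; not part of the original post.
class PriorityQueueOrderDemo {

    // Copies the elements in iteration order. The JavaDocs give no ordering guarantee here.
    static List<Integer> iterationOrder(PriorityQueue<Integer> queue) {
        List<Integer> seen = new ArrayList<>();
        for (Integer value : queue) {
            seen.add(value);
        }
        return seen;
    }

    // Empties the queue by polling the head repeatedly, which yields ascending order.
    static List<Integer> drainInOrder(PriorityQueue<Integer> queue) {
        List<Integer> sorted = new ArrayList<>();
        while (!queue.isEmpty()) {
            sorted.add(queue.poll());
        }
        return sorted;
    }

    public static void main(String[] args) {
        // Duplicates (the two 1s) are kept, unlike in a TreeSet.
        PriorityQueue<Integer> queue = new PriorityQueue<>(Arrays.asList(5, 1, 4, 1, 3));
        System.out.println("iterated: " + iterationOrder(queue));
        System.out.println("drained:  " + drainInOrder(queue)); // [1, 1, 3, 4, 5]
    }
}
```

    Note that draining empties the queue. If you need to keep the contents, the JavaDocs suggest sorting a copy with Arrays.sort(pq.toArray()) instead.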

    Android GPS Emulator

    I was looking for a better way to set the emulator’s GPS coordinates than using geo fix and manually determining the specific latitude and longitude coordinates.
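    For reference, the manual approach looks roughly like the sketch below: open a connection to the emulator console and send a geo fix command by hand. The class name EmulatorGeoFix, the host and port (localhost:5554, the usual console port for the first emulator instance), and the sample coordinates are assumptions for illustration. Newer emulator builds may also require an auth step before accepting commands, which this sketch does not handle.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not the tool described in this post.
class EmulatorGeoFix {

    // Builds the console command. Note the order: longitude first, then latitude.
    static String command(double longitude, double latitude) {
        return "geo fix " + longitude + " " + latitude;
    }

    // Sends the command to an emulator console at the given host and port.
    static void send(String host, int port, double longitude, double latitude) throws IOException {
        try (Socket socket = new Socket(host, port)) {
            OutputStream out = socket.getOutputStream();
            out.write((command(longitude, latitude) + "\r\n").getBytes(StandardCharsets.US_ASCII));
            out.flush();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed values: first emulator instance, arbitrary sample coordinates.
        send("localhost", 5554, -122.084, 37.422);
    }
}
```

    Having to look up the coordinates yourself and remember that longitude comes first is exactly the tedium a map-based tool avoids.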

    Unable to find anything, I put together a little program that uses GWT and the Google Maps API to launch a browser-based map tool to set the GPS location in the emulator: