Part-of-Speech (POS) Tagging with OpenNLP 1.5.0

Continuing from where I left off, I’m going to quickly touch on part-of-speech tagging before moving on. It’s actually pretty straightforward once you’re set up to run OpenNLP. This all assumes that you’ve already done sentence detection and tokenization. If you haven’t, go back to the beginning. Here are the links to the rest of my posts:

Getting Started

Model Files

Only one additional model file is needed for part-of-speech tagging.

  • en-pos-maxent.bin

As with all of the model files, it can be found at http://opennlp.sourceforge.net/models-1.5/, where the models are identified by language and component. I’m using the Maxent model (maximum entropy) instead of the Perceptron model because, frankly, I’m not familiar with what the Perceptron model is. If you care, you can read about OpenNLP Maxent here.

I use Maven, so these files go into src/main/resources and are loaded with getResourceAsStream, as you’ll see below.
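
One thing worth knowing about getResourceAsStream is that it returns null, rather than throwing, when the resource isn't on the classpath. Handing that null to a model constructor is what produces an "in must not be null!" error. Here's a minimal sketch of a guard for that case; the class name ModelResources and the method open are hypothetical names for illustration and are not part of OpenNLP.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;

public class ModelResources {

    /**
     * Opens a classpath resource, failing with a readable message when it is missing.
     * getResourceAsStream returns null (it does not throw) if the resource is not found.
     */
    public static InputStream open(final String resourcePath) throws FileNotFoundException {
        final InputStream in = ModelResources.class.getResourceAsStream(resourcePath);
        if (in == null) {
            throw new FileNotFoundException("Model not found on classpath: " + resourcePath);
        }
        return in;
    }

    public static void main(final String[] args) throws IOException {
        // The leading slash means "relative to the classpath root",
        // which is where Maven puts the contents of src/main/resources.
        try (InputStream in = open("/en-pos-maxent.bin")) {
            System.out.println("Found model, first byte: " + in.read());
        } catch (final FileNotFoundException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The returned stream can be passed to a model constructor in place of a raw getResourceAsStream call, so a missing file shows up as a message naming the path instead of a null argument error.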

Part-of-Speech Tagging

The description from the OpenNLP SourceForge wiki:

The Part of Speech Tagger marks tokens with their corresponding word type based on the token itself and the context of the token. A token can have multiple pos tags depending on the token and the context. The OpenNLP POS Tagger uses a probability model to guess the correct pos tag out of the tag set. To limit the possible tags for a token a tag dictionary can be used which increases the tagging and runtime performance of the tagger.

Code time:

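// Imports needed by this snippet
import java.io.IOException;
import java.io.InputStream;
import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTagger;
import opennlp.tools.postag.POSTaggerME;

POSTagger _posTagger = null;

InputStream modelIn = null;
try {
   // Loading part-of-speech model from the classpath (src/main/resources)
   modelIn = getClass().getResourceAsStream("/en-pos-maxent.bin");
   final POSModel posModel = new POSModel(modelIn);

   // The tagger is built from the loaded model
   _posTagger = new POSTaggerME(posModel);

} catch (final IOException ioe) {
   ioe.printStackTrace();
} finally {
   // The stream is closed here whether or not loading succeeded
   if (modelIn != null) {
      try {
         modelIn.close();
      } catch (final IOException e) {} // oh well!
   }
}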

And then to use it:

_posTagger.tag(tokens);

Although the tokenizer only returns an array of string tokens, the tag method is overloaded to accept a list of strings, an array of strings, or a single string.

Example

Here are the expected results for an example taken from the above wiki page (and corrected).

Tokens:

  1. [Pierre] [Vinken] [,] [61] [years] [old] [,] [will] [join] [the] [board] [as] [a] [nonexecutive] [director] [Nov.] [29] [.]
  2. [Mr.] [Vinken] [is] [chairman] [of] [Elsevier] [N.V.] [,] [the] [Dutch] [publishing] [group] [.]

Part-of-Speech Tags:

  1. [NNP] [NNP] [,] [CD] [NNS] [JJ] [,] [MD] [VB] [DT] [NN] [IN] [DT] [JJ] [NN] [NNP] [CD] [.]
  2. [NNP] [NNP] [VBZ] [NN] [IN] [NNP] [NNP] [,] [DT] [JJ] [NN] [NN] [.]

Update: These are Penn English Treebank POS tags.
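
For reference, the tags that show up in the two example sentences can be spelled out with a small lookup table. This is only a sketch: the map covers just the tags in this example and not the full Penn Treebank tag set, and TagLegend, describe and pair are hypothetical helper names rather than anything from OpenNLP. The pair method shows one way to line up the token array with the tag array.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TagLegend {

    // Penn Treebank tags that occur in the two example sentences (not the full tag set).
    static final Map<String, String> LEGEND = new LinkedHashMap<String, String>();
    static {
        LEGEND.put("NNP", "proper noun, singular");
        LEGEND.put("NNS", "noun, plural");
        LEGEND.put("NN", "noun, singular or mass");
        LEGEND.put("CD", "cardinal number");
        LEGEND.put("JJ", "adjective");
        LEGEND.put("MD", "modal");
        LEGEND.put("VB", "verb, base form");
        LEGEND.put("VBZ", "verb, 3rd person singular present");
        LEGEND.put("DT", "determiner");
        LEGEND.put("IN", "preposition or subordinating conjunction");
    }

    /** Describes a tag; tags not in the map (including punctuation) are returned as-is. */
    public static String describe(final String tag) {
        final String description = LEGEND.get(tag);
        return description != null ? description : tag;
    }

    /** Joins parallel token and tag arrays into space-separated "token/TAG" pairs. */
    public static String pair(final String[] tokens, final String[] tags) {
        if (tokens.length != tags.length) {
            throw new IllegalArgumentException("tokens and tags must have the same length");
        }
        final StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(tokens[i]).append('/').append(tags[i]);
        }
        return sb.toString();
    }

    public static void main(final String[] args) {
        // Second example sentence and its tags, copied from the lists above.
        final String[] tokens = {"Mr.", "Vinken", "is", "chairman", "of", "Elsevier", "N.V.", ",",
                "the", "Dutch", "publishing", "group", "."};
        final String[] tags = {"NNP", "NNP", "VBZ", "NN", "IN", "NNP", "NNP", ",",
                "DT", "JJ", "NN", "NN", "."};
        System.out.println(pair(tokens, tags));
        for (final String tag : tags) {
            System.out.println(tag + " = " + describe(tag));
        }
    }
}
```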

Next Step: How to use the OpenNLP 1.5.0 Parser

My source code and test cases can be found at https://github.com/dpdearing/nlp


Comments

  1. Your posts are awesome. I am working on a project where I need to use an NLP library, and your posts made my life easier.
    thanks man

  2. Can you post a legend of what the POS tags stand for? At the moment I’m working on a project where I use this, and I don’t know how many tags there are or what e.g. “JJ”, “IN” and the rest of them mean. This would be very helpful.

    Also thanks a lot for your tutorials :-)!

  3. Thanks a lot for your tutorial, discovering openNLP is easier thanks to you

  4. I’m new to OpenNLP and I’m trying to use your code just to test the POS tagger. I have tried your code exactly, but the following exception keeps coming up. Can you help me with this? Your help will be greatly appreciated, thank you.

    Exception in thread "AWT-EventQueue-0" java.lang.IllegalArgumentException: in must not be null!
    at opennlp.tools.util.model.BaseModel.<init>(BaseModel.java:114)
    at opennlp.tools.sentdetect.SentenceModel.<init>(SentenceModel.java:77)
    at wordprocessor2.WordnTextProcessing.jButton3ActionPerformed(WordnTextProcessing.java:252)
    at wordprocessor2.WordnTextProcessing.access$200(WordnTextProcessing.java:28)
    at wordprocessor2.WordnTextProcessing$3.actionPerformed(WordnTextProcessing.java:92)
    at javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:2018)
    at javax.swing.AbstractButton$Handler.actionPerformed(AbstractButton.java:2341)
    at javax.swing.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:402)
    at javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:259)
    at javax.swing.plaf.basic.BasicButtonListener.mouseReleased(BasicButtonListener.java:252)
    at java.awt.Component.processMouseEvent(Component.java:6504)
    at javax.swing.JComponent.processMouseEvent(JComponent.java:3321)
    at java.awt.Component.processEvent(Component.java:6269)
    at java.awt.Container.processEvent(Container.java:2229)
    at java.awt.Component.dispatchEventImpl(Component.java:4860)
    at java.awt.Container.dispatchEventImpl(Container.java:2287)
    at java.awt.Component.dispatchEvent(Component.java:4686)
    at java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4832)
    at java.awt.LightweightDispatcher.processMouseEvent(Container.java:4492)
    at java.awt.LightweightDispatcher.dispatchEvent(Container.java:4422)
    at java.awt.Container.dispatchEventImpl(Container.java:2273)
    at java.awt.Window.dispatchEventImpl(Window.java:2713)
    at java.awt.Component.dispatchEvent(Component.java:4686)
    at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:707)
    at java.awt.EventQueue.access$000(EventQueue.java:101)
    at java.awt.EventQueue$3.run(EventQueue.java:666)
    at java.awt.EventQueue$3.run(EventQueue.java:664)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.security.ProtectionDomain$1.doIntersectionPrivilege(ProtectionDomain.java:76)
    at java.security.ProtectionDomain$1.doIntersectionPrivilege(ProtectionDomain.java:87)
    at java.awt.EventQueue$4.run(EventQueue.java:680)
    at java.awt.EventQueue$4.run(EventQueue.java:678)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.security.ProtectionDomain$1.doIntersectionPrivilege(ProtectionDomain.java:76)
    at java.awt.EventQueue.dispatchEvent(EventQueue.java:677)
    at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:211)
    at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:128)
    at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:117)
    at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:113)
    at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:105)
    at java.awt.EventDispatchThread.run(EventDispatchThread.java:90)

    • Wow, that’s a lot of exception code to paste! My guess is that your problem is with loading the model files: “in must not be null” means the InputStream handed to the model constructor was null, which is what getResourceAsStream returns when it can’t find the file. Make sure that you downloaded them (see the Getting Started section, above) and put them somewhere your application can load them from. Loading resource files is a pretty standard thing that you should be able to find help with on Google if you’re having trouble.

  5. Hi thank you for the post. Is it possible to find the accuracy of the tagging from the program? Currently the only way I know is through the command line.

