Some Java NLP Solutions

While I haven’t had time to check out any of these solutions for doing Natural Language Processing (not Neuro-Linguistic Programming), I have noticed that every one of them except TreeTagger seems to use a maximum entropy model: essentially, it scores candidate tags for each word in a sentence and uses a beam search to find the sequence of tags that scores highest. Yes, that is a very simplified statement of the algorithm, but you can read the Wikipedia entry on maximum entropy models if you are really curious.
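To make that simplified description a little more concrete, here is a minimal sketch of beam-search tagging over a maximum entropy style (log-linear) scorer. It is not taken from any of the libraries listed in this post: the class name `BeamTaggerSketch`, the three-tag tag set, the feature names, and the hand-set weights are all assumptions made up for illustration. A real tagger learns its weights from annotated text and uses far richer features.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class BeamTaggerSketch {
    // Toy tag set (an assumption; real taggers use dozens of tags).
    static final String[] TAGS = {"DET", "NOUN", "VERB"};

    // Hypothetical hand-set feature weights. A real maxent tagger
    // estimates these from a tagged training corpus.
    static final Map<String, Double> WEIGHTS = Map.of(
        "word=the|tag=DET", 3.0,
        "word=dog|tag=NOUN", 2.0,
        "word=barks|tag=VERB", 1.5,
        "word=barks|tag=NOUN", 1.0,
        "prev=DET|tag=NOUN", 1.0,
        "prev=NOUN|tag=VERB", 1.0,
        "prev=NOUN|tag=NOUN", 0.2
    );

    // Sum of the weights of the features that fire for this decision.
    static double rawScore(String word, String prev, String tag) {
        return WEIGHTS.getOrDefault("word=" + word + "|tag=" + tag, 0.0)
             + WEIGHTS.getOrDefault("prev=" + prev + "|tag=" + tag, 0.0);
    }

    // Log-linear (maxent) form: log of exp(score) normalized over all tags.
    static double logProb(String word, String prev, String tag) {
        double z = 0.0;
        for (String t : TAGS) {
            z += Math.exp(rawScore(word, prev, t));
        }
        return rawScore(word, prev, tag) - Math.log(z);
    }

    // A partial tag sequence plus its accumulated log-probability.
    static final class Hypothesis {
        final List<String> tags;
        final double logProb;

        Hypothesis(List<String> tags, double logProb) {
            this.tags = tags;
            this.logProb = logProb;
        }
    }

    // Tags the sentence left to right, keeping only the best
    // beamWidth partial sequences after each word.
    static List<String> tag(List<String> words, int beamWidth) {
        int width = Math.max(1, beamWidth); // guard against an empty beam
        List<Hypothesis> beam = new ArrayList<>();
        beam.add(new Hypothesis(new ArrayList<>(), 0.0));

        for (String word : words) {
            List<Hypothesis> expanded = new ArrayList<>();
            for (Hypothesis h : beam) {
                String prev = h.tags.isEmpty() ? "START" : h.tags.get(h.tags.size() - 1);
                for (String t : TAGS) {
                    List<String> extended = new ArrayList<>(h.tags);
                    extended.add(t);
                    expanded.add(new Hypothesis(extended, h.logProb + logProb(word, prev, t)));
                }
            }
            // Highest-scoring hypotheses first, then prune to the beam width.
            expanded.sort((a, b) -> Double.compare(b.logProb, a.logProb));
            beam = new ArrayList<>(expanded.subList(0, Math.min(width, expanded.size())));
        }
        return beam.get(0).tags;
    }

    public static void main(String[] args) {
        System.out.println(tag(List.of("the", "dog", "barks"), 2));
    }
}
```

With these toy weights, running `main` prints `[DET, NOUN, VERB]`. The beam width is the knob to play with: a width of 1 is plain greedy tagging, while a wider beam keeps more alternatives alive at the cost of more scoring work per word.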

Fascinating stuff which I would really love to spend more time on. Ironically, my time is being taken up as of late by computer language programming in Scala using its parsing.combinator API (which, if you haven’t had time to check out, is worth a day or two spent building a small external DSL; it’s not bad at all).

Regardless, here is a list of Java NLP code bases. One final comment: MorphAdorner looks absolutely fascinating. It may only be the packaging or presentation, but I would love an excuse to work with their tool set. Enough delay; the list!

  1. MorphAdorner – adorn text with information about it.
  2. Monk – targets humanities researchers, offering pattern-finding tools for collections of texts.
  3. TreeTagger – fast, just fast (at least according to a post by Matthew Wilkins).
  4. Log-Linear Part of Speech Tagger (Stanford)
  5. LingPipe – looks interesting, especially with some of the details/features given attention like re-learning on the fly.
  6. OpenNLP