
I've used the Stanford NLP library extensively for NER, including heavy use in my senior thesis project.

It's pretty straightforward to use their library to read a document and output an XML file containing NER data (and lots of other fun stuff).

For instance, from the sentence:

> World War II, or the Second World War (often abbreviated as WWII or WW2), was a global military conflict lasting from 1939 to 1945, which involved most of the world's nations, including all of the great powers, eventually forming two opposing military alliances, the Allies and Axis.

Stanford NLP NER will output the following entities:

"World War II" - MISC
"Second World War" - MISC
"1939 to 1945" - DATE - NORMALIZED 1939/1945
"Axis" - MISC
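A minimal sketch of how an entity list like this could be pulled out of the XML file, assuming each `<token>` element carries `<word>`, `<NER>` and (for dates) `<NormalizedNER>` children. The sample XML is hand-written and shortened, not real CoreNLP output, and `extract_entities` is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET

# Hand-written sample imitating the assumed token layout; not real CoreNLP output.
SAMPLE_XML = """
<root><document><sentences><sentence id="1"><tokens>
  <token id="1"><word>World</word><NER>MISC</NER></token>
  <token id="2"><word>War</word><NER>MISC</NER></token>
  <token id="3"><word>II</word><NER>MISC</NER></token>
  <token id="4"><word>lasted</word><NER>O</NER></token>
  <token id="5"><word>from</word><NER>O</NER></token>
  <token id="6"><word>1939</word><NER>DATE</NER><NormalizedNER>1939/1945</NormalizedNER></token>
  <token id="7"><word>to</word><NER>DATE</NER><NormalizedNER>1939/1945</NormalizedNER></token>
  <token id="8"><word>1945</word><NER>DATE</NER><NormalizedNER>1939/1945</NormalizedNER></token>
</tokens></sentence></sentences></document></root>
"""


def extract_entities(xml_text):
    """Group consecutive tokens sharing a non-"O" NER tag into entities."""
    entities = []
    root = ET.fromstring(xml_text.strip())
    for sentence in root.iter("sentence"):
        current = None  # entity currently being extended, if any
        for token in sentence.iter("token"):
            word = token.findtext("word")
            tag = token.findtext("NER", default="O")
            if tag == "O":
                current = None  # an untagged token ends the running entity
                continue
            if current is not None and current["tag"] == tag:
                current["words"].append(word)
            else:
                current = {
                    "words": [word],
                    "tag": tag,
                    "normalized": token.findtext("NormalizedNER"),
                }
                entities.append(current)
    return [(" ".join(e["words"]), e["tag"], e["normalized"]) for e in entities]


for text, tag, normalized in extract_entities(SAMPLE_XML):
    print(text, tag, normalized)
```

Merging on "same tag, adjacent tokens" is a simplification: two distinct entities of the same type that sit next to each other would be glued together.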

You can view the output of Stanford's CoreNLP library (NER + dependency grammar + coreference resolution + some other stuff) for the Wikipedia article on World War II in my GitHub repo:

https://raw.github.com/ryantanner/thesis/master/data/ww2samp...

edit: I should add that the real fun (for me) came from combining NER with dependency grammars and coreference resolution. It makes it very easy to turn Stanford NLP's output into a knowledge graph combining a large number of documents.
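A rough sketch of that idea, not the thesis code: it assumes the relations and coreference links have already been extracted from the parser output into plain Python structures. The input layout, the relation names and the `build_graph` helper are all hypothetical, and the data is hand-written for illustration.

```python
from collections import defaultdict

# Hand-written stand-ins for per-document extraction results.
# "relations" are (sentence_index, subject, relation, object) tuples,
# "coref" maps a (sentence_index, mention) pair to its representative mention.
documents = [
    {
        # "Stanford University is located in California. It is a great university."
        "relations": [
            (0, "Stanford University", "located_in", "California"),
            (1, "It", "is_a", "great university"),
        ],
        "coref": {(1, "It"): "Stanford University"},
    },
    {
        # A second, made-up document mentioning one of the same entities.
        "relations": [(0, "California", "part_of", "United States")],
        "coref": {},
    },
]


def build_graph(docs):
    """Merge relations from all documents into one entity -> edges mapping."""
    graph = defaultdict(set)
    for doc in docs:
        coref = doc["coref"]
        for sent_idx, subj, rel, obj in doc["relations"]:
            # Replace pronouns and other mentions with their representative.
            subj = coref.get((sent_idx, subj), subj)
            obj = coref.get((sent_idx, obj), obj)
            graph[subj].add((rel, obj))
    return graph


graph = build_graph(documents)
for entity in sorted(graph):
    for rel, obj in sorted(graph[entity]):
        print(entity, rel, obj)
```

Keying nodes by the representative mention is what lets facts from different sentences and documents land on the same node; a real system would also need to normalize entity names across documents.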



For those who want to play around with dynamic output: http://nlp.stanford.edu:8080/parser/

This is a bit more human friendly.


Here is the whole Stanford CoreNLP suite with visualisation output: http://nlp.stanford.edu:8080/corenlp/ It helps me greatly when it comes to interpreting the dependency structure.

If you need an example sentence: "Stanford University is located in California. It is a great university."

Microsoft Research also has an online demo of its NLP tools: http://msrsplatdemo.cloudapp.net/ (Silverlight required). I don't think you can download the tools, but they do offer an API token for calling their cloud-hosted service.

Potential conflict of interest: I wrote parts of the CoreNLP visualiser.

Edit: Added example sentence.


Or this one, with JSON API output: http://www.nerily.com/#demo


I experimented with NLTK, CoreNLP, OpenNLP, etc., and when it came to named entity extraction I felt NLTK did the better job. None of them were anywhere close to perfect or dependable, but NLTK had a lot more dictionaries to choose from and was better overall. We use a highly customized/overhauled NLTK for our apps Iris (Siri for Android) and Friday for Android.


NLTK may very well be better. I only compared OpenNLP and Stanford because I was implementing my thesis in Scala and I wanted a library running on the JVM.



