Powerset: NLP-based information extraction and navigation on the web
Powerset released their NLP-based search engine/information extraction and gisting engine/information navigation service yesterday. I know, it’s a mouthful, but that is the point of this post. They are trying to do a lot of things and succeeding better at some than others. Here is a partial list:
- Query flexibility: Powerset does NLP on your queries so you can ask questions like, “Who are the actors in Pulp Fiction?” This would be a negative feature if it were required, but you can also type, “actors pulp fiction.” I haven’t been able to tell if they are doing any synonym checking on the queries.
- Search history: Powerset mines your history for past searches and displays close matches as you type into the search box. This is virtually useless. What are the chances that I want to search for the exact same thing again? Google makes suggestions based on what everyone searches for, and I have come to rely on it as a sort of query tuning. Wouldn’t it be nice to include truly similar queries, taking into account semantics and synonyms? The Google experience could be improved, but Powerset chose to step backwards.
- Auto-tag Cloud: Here Powerset made some interesting improvements to the user experience. First, they build the tag cloud from terms found in the document rather than relying on spotty user-generated tags. They also separate the nouns and verbs referenced in the document into separate clouds. This has some utility, but they currently show too many words and use them only as a way to navigate the information in the document, as opposed to information on the web in general.
- Gisting: This is where Powerset fails to live up to their hype. Gisting long documents to produce something that is easily skimmable is a powerful idea, but they make so many mistakes that the implementation is distracting and of marginal use. Hopefully they will improve this with better-tuned NLP n-gram extraction.
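To make the query flexibility point concrete, here is a minimal sketch of how a full question and a bare keyword query can be reduced to the same search terms. This is not Powerset's method, which the preview does not expose; the stopword list and the `query_terms` helper are assumptions made up for illustration, and real NLP query parsing would go well beyond stopword removal.

```python
import re

# Hypothetical stopword list, chosen only for this illustration.
STOPWORDS = {"who", "what", "are", "is", "the", "in", "of", "a", "an"}


def query_terms(query: str) -> set:
    """Lowercase the query, split it into alphanumeric words, drop stopwords."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    return {w for w in words if w not in STOPWORDS}


question = query_terms("Who are the actors in Pulp Fiction?")
keywords = query_terms("actors pulp fiction")

# Both forms of the query reduce to the same set of terms.
print(question == keywords)
```

A sketch like this treats the two queries as identical, which is also why it cannot tell whether any synonym checking is happening: that would need an additional lookup step on each term.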
Currently Powerset only extracts information from Wikipedia. At some level I wonder why we need that, but if you look at the techniques they are using and *imagine* them working across the entire web, it would be nice. What disappoints me is that it does no better at finding things and cross-referencing them. I could find very few examples of cross-document references in the preview.
These days we hear of many applications of NLP in creating semantic data from unstructured text. This has some great applications, but when it comes to finding stuff on the web I don’t need a service that reads single articles for me; I’d rather have a service that finds related information. That is what I spend most of my time doing while researching things on the web. When planning a trip to Turkey I need information on tickets, hotels, weather, history, and news, and not just history in general but the history of the Byzantine Empire, the Ottomans, Greece, Rome, etc. Why doesn’t a service mine the web for these connections, ones based on related concepts? A service like this would draw perhaps more from categorization technology than raw NLP.
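As a rough sketch of what "connections based on related concepts" could mean in practice, the snippet below ranks topics by how many category labels they share, using Jaccard similarity. Everything here is an assumption for illustration: the `CATEGORIES` table is invented toy data, and `jaccard` and `related` are hypothetical helpers, not part of any existing service.

```python
# Hypothetical topic -> category labels, invented purely for illustration.
CATEGORIES = {
    "Byzantine Empire": {"history", "turkey", "empire", "rome"},
    "Ottoman Empire": {"history", "turkey", "empire"},
    "Istanbul hotels": {"travel", "turkey", "lodging"},
    "Python tutorials": {"programming", "software"},
}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def related(topic: str, min_score: float = 0.1) -> list:
    """Return (topic, score) pairs sharing categories with `topic`, best first."""
    base = CATEGORIES[topic]
    scored = [
        (other, jaccard(base, cats))
        for other, cats in CATEGORIES.items()
        if other != topic
    ]
    matches = [pair for pair in scored if pair[1] >= min_score]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


for name, score in related("Ottoman Empire"):
    print(f"{name}: {score:.2f}")
```

With this toy data, a search on the Ottomans surfaces both the Byzantine Empire and Istanbul hotels while ignoring unrelated topics, which is the kind of cross-topic link a trip-planning search would need. The hard part, assigning good categories to web pages in the first place, is left out entirely.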