Powerset: NLP based information extraction and navigation on the web
Powerset released their NLP-based search engine / information extraction and gisting engine / information navigation service yesterday. I know, it’s a mouthful, but that is the point of this post. They are trying to do a lot of things and succeeding better at some than others. Here is a partial list:
- Query flexibility: Powerset does NLP on your queries so you can ask questions like, “Who are the actors in Pulp Fiction?” This would be a negative feature if it were required, but you can also type, “actors pulp fiction.” I haven’t been able to tell if they are doing any synonym checking on the queries.
- Search history: Powerset mines your history for past searches and displays close matches as you type into the search box. This is virtually useless. What are the chances that I want to search for the exact same thing again? Google makes suggestions based on what everyone searches for, and I have come to rely on it as a sort of query tuning. Wouldn’t it be nice to include truly similar queries, taking into account semantics and synonyms? The Google experience could be improved, but Powerset chose to step backwards.
- Auto-tag Cloud: Here Powerset made some interesting improvements to the user experience. First, they use terms found in a document in a tag cloud rather than relying on spotty user-generated tags. They also separate nouns and verbs referenced in the document into separate clouds. This has some utility, but they currently show too many words and use them only as a way to navigate the information in the document as opposed to information on the web in general.
- Gisting: This is where Powerset fails to live up to their hype. The idea of gisting long documents to produce something that is easily skimmable is a powerful one, but they make so many mistakes that the implementation is distracting and of marginal use. Hopefully they will improve this with better-tuned NLP n-gram extraction.
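The auto-tag cloud idea in the list above, building tags from a document’s own terms rather than from user-supplied tags, can be sketched in a few lines. This is only an illustration and not Powerset’s method: the function name `tag_cloud`, the tiny stop-word list, and the `max_tags` cap are all my own assumptions, and splitting nouns from verbs would need a part-of-speech tagger that is left out here.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real system would use a larger one.
STOP_WORDS = {"the", "a", "an", "and", "of", "in", "to", "is", "was", "by", "it", "that"}

def tag_cloud(text, max_tags=5):
    """Return the most frequent non-stop-word terms as (term, count) pairs."""
    words = re.findall(r"[a-z']+", text.lower())
    # Skip stop words and very short tokens, then count what is left.
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    # Capping the number of tags addresses the "too many words" complaint.
    return counts.most_common(max_tags)

sample = "The empire fell. The empire rose. Rome was an empire."
top_tags = tag_cloud(sample, max_tags=1)
print(top_tags)
```

Keeping `max_tags` small is the point: the cloud is only useful for skimming if it shows a handful of terms.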
Currently Powerset only extracts information from Wikipedia. At some level I wonder why we need that but if you look at the techniques they are using and *imagine* it working across the entire web it would be nice. What disappoints me is that it does no better at finding things and cross-referencing stuff. I could find very few examples of cross-document references in the preview.
These days we hear of many applications of NLP in creating semantic data from unstructured text. This has some great applications, but when it comes to finding stuff on the web I don’t need a service that reads single articles for me; I’d rather have a service that finds related information. That is what I spend most of my time doing while researching things on the web. When planning a trip to Turkey I need information on tickets, hotels, weather, history, news, and not just history but the history of the Byzantine Empire, the Ottomans, Greece, Rome, etc. Why doesn’t a service mine the web for these connections, ones based on related concepts? A service like this would draw perhaps more from categorization technology than raw NLP.
Net Neutrality Redux
It looks like another go-around in the net neutrality debate. Comcast announced today “a 54 percent rise in fourth-quarter net profit to $602 million, or 20 cents per share, from $390 million, or 13 cents per share, a year earlier.” At the same time they defend their practice of restricting the bandwidth used for certain practices that they find questionable [1]. This comes at a time when Congress is debating a new try at legislating net neutrality.
Is there anything wrong with Comcast throttling BitTorrent traffic? After all, it is traffic that primarily steals revenue from Comcast itself. I mean, if you download a movie and watch it rather than pay the PPV fee to Comcast it would be like, oh I don’t know, like the phone company (DSL) letting people use Skype. The fact that in both cases the pipe owners are getting paid regardless of the use of their pipe makes the question interesting. Do we have the right to demand unrestricted access using their pipes? Of course we do, and this is my concern. The real issue here is that consumers have the right to know what they are paying for and to have a choice. This is the essence of the free market. If we know that Comcast is choking traffic for their own reasons we might just choose DSL. The legislation being considered would make it illegal for Comcast to throttle based on traffic type. That would keep a level playing field but seems counter to the principle of allowing companies to choose the features of their own products.
If we assume that capitalism is basically sound, then how do we put market forces to work to solve this problem? First we would need to make the restrictions on access transparent to users so they can exercise their choice. Then we would need to have a choice. Fortunately, in most places in the US we can get high-speed internet access from either cable or DSL. We can vote with our money. Congress should protect our interests until such time as market forces take over and we can choose what is in our best interest for ourselves. Maybe we should be looking at breaking up the duopoly of one cable company and one phone company rather than regulating internet access. I know, that seems a bit much to ask for, but it does make sense, doesn’t it?
LimeWire: or the Return of Napster
The first version of Napster allowed anyone on the internet to share music with anyone else. The music industry killed Napster because, they said, it violated their copyrights on digital music. Digital Rights Management (DRM) was conceived to protect copyrights to digital media. iTunes, Napster V2, and others were created to fill the need for music downloads that protect the rights of the music publishers.
But the demand for open file sharing didn’t go away. In some ways it has only strengthened over the years since the victory of the music industry over Napster. Witness the growth of programs like LimeWire. While LimeWire accommodates DRM, one has only to look at the content available to see that most of it is in MP3 format and lacks license information.
To get a quick overview of LimeWire, follow this trail I found on Trailfire.
Net Neutrality
The term Net Neutrality is being used to describe the idea that packets on a network should be delivered with equal speed. In other words, data comes from OccamsMachete.com just as fast as it comes from Google.com. This has been the norm for the entire history of the internet. As bandwidth increases it is available to all content providers equally. Now some network providers have noticed that they have virtual monopolies on access to the internet. Almost everyone gets high-speed internet access through their telco (DSL) or cable company. This duopoly has hit on the bright idea to charge extra for the speed of access to some content.
One way they can do this is to charge you, the consumer, for fast access to certain sites. Think $10 for basic internet plus $10 for premium internet with “faster” Google and MySpace. I don’t object to this so much because it is at least visible to the consumer. I know what I am paying for. But the other, more insidious model for a two-tiered internet is where the content provider pays to get access to the consumer. In this case the consumer pays $10 for high-speed internet, but what they don’t know is that some content is being slowed down because the content provider is not paying the network provider for their top speed. This makes a lie of the network providers’ speed claims, and the consumer has very little way of knowing this.
Regulation has been proposed and debated in Congress to guarantee Net Neutrality, but I am skeptical of regulation in general. Regulation often just distorts and convolutes, making it more difficult to find a natural balance. Two things are needed to make this work for network provider, consumer, and content provider:
- Speed fees must be clear to those making purchasing decisions, i.e. the consumer.
- The consumer must have a real choice in what network provider they use. This is a big one. The telecom deregulation act in the 90s didn’t really give us a competitive landscape. Until this is fixed you will have only one choice: telco or cable. This is too great a limitation on competition.
If the duopoly is allowed to pass on fees for speed to content providers, in the opaque way they are proposing, consumers will have no way to know what they are really getting.