Training Software

Posted by pat Mon, 07 Jul 2008 15:52:00 GMT

Lately I’ve been thinking about my job, the fate of AI, and a recent article in Wired.  I am the CTO at a startup company that is using text and data mining techniques and natural language processing to help people manage the information they touch.  We are trying to automatically make connections between things that you have to manage yourself with today’s software. 

AI started with great hype back in the ‘80s, trying to model human thought in its algorithms. Natural Language Processing, for instance, tried to encode human grammar directly in the NLP algorithm. This technique had limited success, and the failure of this type of approach led to the long night of AI. More recently, new algorithms have emerged that must be trained on a large corpus of documents. Such an algorithm has two stages: the training phase, where you feed in tagged documents (called a gold standard), and the extraction phase, where the algorithm uses the model generated in the training phase to tag untagged documents. So if you have lots of email in which every mention of a “person” is tagged, the algorithm will be able to pick out people from untagged email. It has learned language without having any model of grammar. In fact, in the above example the language was not important; it would work in any textual language regardless of the grammar.
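To make the two phases concrete, here is a minimal sketch of a train-then-extract tagger. It is only an illustration: it uses a tiny naive Bayes classifier over surface features (the word, the previous word, capitalization), and the function names, the feature set, and the four-sentence gold standard are all assumptions made up for this example, not the algorithms we actually use. Note that nothing in it encodes grammar.

```python
import math
from collections import Counter, defaultdict

def features(tokens, i):
    """Surface features only: no grammar rules, no dictionary (illustrative choice)."""
    word = tokens[i]
    prev = tokens[i - 1].lower() if i > 0 else "<s>"
    return ["word=" + word.lower(), "prev=" + prev, "cap=" + str(word[:1].isupper())]

def train(gold):
    """Training phase: count how often each feature occurs with each tag."""
    tag_counts = Counter()
    feat_counts = defaultdict(Counter)
    for tokens, tags in gold:
        for i, tag in enumerate(tags):
            tag_counts[tag] += 1
            for f in features(tokens, i):
                feat_counts[tag][f] += 1
    vocab = {f for counts in feat_counts.values() for f in counts}
    return tag_counts, feat_counts, len(vocab)

def extract(model, tokens):
    """Extraction phase: tag untagged tokens using the trained counts."""
    tag_counts, feat_counts, vocab_size = model
    total = sum(tag_counts.values())
    result = []
    for i in range(len(tokens)):
        def score(tag):
            # Log prior plus add-one smoothed log likelihood of each feature.
            s = math.log(tag_counts[tag] / total)
            denom = sum(feat_counts[tag].values()) + vocab_size
            for f in features(tokens, i):
                s += math.log((feat_counts[tag][f] + 1) / denom)
            return s
        result.append(max(tag_counts, key=score))
    return result

# A toy "gold standard": tokens paired with hand-assigned tags (made-up data).
GOLD = [
    ("Please ask Alice about the budget".split(), ["O", "O", "PERSON", "O", "O", "O"]),
    ("I met with Bob on Monday".split(), ["O", "O", "O", "PERSON", "O", "O"]),
    ("Tell Carol the meeting moved".split(), ["O", "PERSON", "O", "O", "O"]),
    ("Lunch with Dave was great".split(), ["O", "O", "PERSON", "O", "O"]),
]

model = train(GOLD)
print(extract(model, "Please ask Erin about lunch".split()))
```

With this toy data the tagger marks "Erin" as a PERSON even though that name never appears in the training set; it is picked out purely from context and capitalization. A real system would need far more data and richer features.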

AI research has led us to a new flowering of the technology through machine learning.  We still make no claim that massively dimensional vector spaces are the way humans learn but they work for the digital processors we have today and they can learn things that are useful about the human experience.  This brings me to the article by Chris Anderson of Wired.  Chris falls prey to the mainstream journalist’s cliché by exaggerating for attention but his article, “The End of Theory,” does make you think.  We are entering a time when the cool new algorithm is being replaced in importance with the awesome new data set.  It also occurred to me that there is one entity in the universe with just about all the data—Google.

Which brings me to my job:  Train some algorithms with a new data set before Google thinks of it and apply it to a unique customer problem.  Not exactly what I had in mind when I started my career but it does add a detecting angle that you have to like.

 

Desktop Linux: Ubuntu Pronounced, "Pretty Cool"

Posted by pat Sun, 03 Feb 2008 03:28:00 GMT

My wife has a laptop that is getting a little old. It runs Windows XP and she uses it mostly for browsing and email. Lately it has gotten really slow. Some disk access silliness is killing the OS. I could troubleshoot the thing or…

A year ago I had an old laptop that had been running its native Windows XP for several years until it got too slow. I tried defragging the disk and other tricks to no avail. I probably could have sussed it out but, you know, I have better things to do. Like installing Ubuntu 7.04 Feisty Fawn on it, which was fun, in a perverse sort of way, and gave me a screaming fast machine with most of the 40 GB drive free. On the other hand, the wifi was intermittent and tended to be hard to reestablish once it dropped. It was also difficult to get my bluetooth mouse to pair automatically. I hacked the scripts for it but, you know, I have better things to do, so even though I could get it to work it wasn’t very reliable.

To make a long story longer, along came Ubuntu 7.10 Gutsy Gibbon. One day I saw the “Upgrade” in the desktop version of the software updater. I should say that one of the greatest things about Ubuntu is its use of the old Debian apt-get mechanism to keep the system up to date. When security or bug fixes come along they are automatically made available to all users. The download and installation are automatic too. This even works for upgrades of the OS, so the Upgrade Manager was offering me the nifty new 7.10 (they try to do an upgrade twice a year).

I did the upgrade with hopes that it would smooth some of the rough edges of 7.04. It took many hours of downloading and configuring. The upgrade asks you to edit conflicting config files, which would probably scare a casual user, but I got through these OK. When I was done the wifi worked flawlessly and my bluetooth mouse paired automatically and connected instantly when I turned it on (even faster than on my MacBook Pro).

I pimped it out with a nice theme for the new Compiz-Fusion window manager complete with some slick Mac-ish icons and had a pretty sweet machine with plenty of free disk and all the bells and whistles. Back to my wife who had finally given up using her laptop. One night she sees me playing with my old machine and says, “That looks pretty cool.” I set up Thunderbird to get mail from her account and she hasn’t used her old machine since.

I guess I have to install Ubuntu on her old machine now since mine has been taken over…

Usability Be Damned: Apple takes us beyond usability.

Posted by pat Thu, 12 Jul 2007 02:19:00 GMT

I have to pay more attention to things.  Every once in a while the world changes when I’m not looking.  Sure, I noticed that Apple was on a roll.  I own three Apple machines and covet another.  That is up from zero a couple years ago.  I know a lot of you are experiencing something similar but I’m in the industry and am therefore supposed to notice such things or better yet predict them.  OK, in my own defense I did see this coming but until the iPhone I didn’t really get why it was happening and why it changes everything.

Apple has long held itself up as an example of style and usability. Some years this was justified and others it was not. Apple inserted the iPod into the world of fashion. It was the first Apple purchase I had made in many years. But there is something even more significant happening at Apple, perhaps a more lasting insight than simply turning hardware into a fashion accessory. Apple has also long held itself above others with its vaunted usability. When I bought my first iPod I looked at it long and hard to see where the usability innovations were. I was rather disappointed; it was kind of hard to use, especially in my car. But I had to admit it was fun. And this, I believe, is their big idea. Products you use for fun should be fun to use. You should desire to pull the object out because it feels good in your hand, it makes you look good, it is fun to fiddle with, it sucks you in. Apple took usability one better. In a day when more and more of our discretionary spending is targeted towards music, TV, entertainment, and fashion (in a word, fun), anyone who doesn’t deliver fun is going to be left out. Usability isn’t good enough anymore; only productivity products can rest at being usable.

Usability be damned—products you buy for fun should be fun to use.

Update: I’m collecting iPhone stories in this trail.

Update:

I have owned my iPhone now for three months and am still quite happy with it. I find it interesting that a device with such serious drawbacks is so likeable. Apple has pulled various fast ones on their users, like requiring that you plug their earbuds into the thing. My nice Shure phones need to be carved down to fit into the jackhole. Also my car connector, the one I use for my older iPod, doesn’t work on my iPhone. I’m not sure whose fault this is but ultimately Apple must take responsibility. Also, why can’t I sync via bluetooth or wifi? I hate the silly cords you have to carry around everywhere. I assume this last will be solved by future software upgrades but I’m impatient.

Don’t get me wrong. I still love the thing. Like a newborn it has its crappy moments even though it is beautiful and shows such potential.

Click One Button Twice

Posted by pat Wed, 11 Jul 2007 02:17:00 GMT

Recently I’ve been putting a lot of thought into making a complex thing simple. I work at Trailfire, where we are doing something that is new to most people, namely letting people annotate and cross-reference the Web. I have posted before about the power of the idea, but this time I’d like to talk about how focused we need to be on simplicity and elegance. The problem is to deliver power and a rich feature set in a way that users will be able to digest. Several principles of modern UI design apply.

  • Deliver the most commonly used functionality in a very streamlined way. Count the number of clicks, make sure the graphics are self-explanatory, and simplify, simplify, simplify.
  • Do not present hard-to-understand features in a way that makes users think they must master them. It is easy to make the mistake of trying to explain everything to a user all at once, forcing them to drink from the fire hose.
  • Present power features to be discovered at the leisure of the user.
  • Use defaults to give the user the most likely configuration. Configuring a rich application can be very complicated. Why not give the user reasonable defaults to start with and let them fine-tune things later when they are more comfortable?
  • Do as many things for the user as possible with the smallest amount of interaction.

You use the Trailfire browser extension to create trails of marked pages, something like bookmarks on a related subject. Once you have installed the extension in IE or Firefox you simply find a page by browsing or searching and click the Mark button. Something like a sticky note appears on the page, where you can simply click Save and be done. Then the next time you find a page you’d like to mark, just hit that button again.

By clicking one button twice you have created a trail. This is literally all you have to do. But what we did for the user is far from simple:

  • We put the marks in a sidebar so they can be found easily.
  • We put links in each mark to reference the other so the two marks form a trail.
  • We created a Trail Summary page back on the Trailfire site to act as a quick reference to the entire trail.
  • We crawled the trail to see what keywords apply to the marked pages and put them on the trail summary page. We call these auto tags since they deliver much of the benefit of tags without forcing the user to spend time categorizing things.
  • We put the auto tags in a cloud ranked by frequency of use and linked to other trails with those terms.
  • We store large thumbnails of the pages that we put on the trail summary page.
  • We searched the entire database of trails to find similar ones and put them on the summary page. This is done using a sophisticated algorithm based on all of the key auto tags from the marked pages, not just the comments left in the marks, so even the simple trail example above works.
  • We made the trail summary page indexable by Google so literally anyone can benefit from your research.
  • We made the trail and marks available for comment by other users.
  • We put your trail in the similar trails list on other summary pages.
  • We notified anyone on Trailfire who has made you a contact that you have created a trail.

    The user probably has no idea that most of this was done. All they know is that they marked some pages and they can find those again easily. Everything else we did they will find as they discover more about the system. Most of the choices we made (like making the trail public) can easily be changed, as can the defaults. They will discover how to do this as they find a reason to. They probably don’t know that their trail was integrated into a growing user-generated knowledge base. These will be pleasant discoveries, and all because the user…

    … clicked one button twice.
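For the curious, two of the behind-the-scenes steps listed above (auto tags ranked by frequency, and finding similar trails) can be sketched roughly as follows. This is a hypothetical illustration, not Trailfire's actual algorithm: the stopword list, the function names, and the use of simple Jaccard overlap as the similarity measure are all assumptions made for the example.

```python
import re
from collections import Counter

# A deliberately tiny stopword list (assumption; a real one would be much longer).
STOPWORDS = frozenset("a an and are for in is of on the to with".split())

def auto_tags(pages, top_n=5):
    """Return the top_n most frequent non-stopword terms across a trail's pages."""
    counts = Counter()
    for text in pages:
        for word in re.findall(r"[a-z]+", text.lower()):
            if word not in STOPWORDS and len(word) > 2:
                counts[word] += 1
    # Rank by descending frequency, then alphabetically so ties are stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]

def similarity(tags_a, tags_b):
    """Jaccard overlap of two tag sets: 0.0 means disjoint, 1.0 means identical."""
    a, b = set(tags_a), set(tags_b)
    return len(a & b) / len(a | b) if (a | b) else 0.0

def similar_trails(new_tags, existing, limit=3):
    """Names of existing trails that share tags with the new one, best match first."""
    scored = [(similarity(new_tags, tags), name) for name, tags in existing.items()]
    scored = [(s, name) for s, name in scored if s > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:limit]]

# Made-up example data: the text of two marked pages and three existing trails.
pages = ["Ubuntu install guide for old laptops", "Ubuntu wifi setup on laptops"]
tags = auto_tags(pages, top_n=3)
existing = {
    "linux-tips": {"ubuntu", "debian"},
    "cooking": {"pasta"},
    "laptop-repair": {"laptops", "ubuntu", "screws"},
}
print(tags)
print(similar_trails({word for word, _ in tags}, existing))
```

The frequency counts double as the weights for a tag cloud, and the user never has to type a tag for any of this to happen.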

LimeWire: or the Return of Napster

Posted by pat Wed, 11 Jul 2007 02:15:00 GMT

The first version of Napster allowed anyone on the internet to share music with anyone else. The music industry killed Napster because, they said, it violated their copyrights on digital music. Digital Rights Management (DRM) was conceived to protect copyrights to digital media. iTunes, Napster V2, and others were created to fill the need for music downloads that protect the rights of the music publishers.

But the demand for open file sharing didn’t go away. In some ways it has only strengthened in the years since the music industry’s victory over Napster. Witness the growth of programs like LimeWire. While LimeWire accommodates DRM, one has only to look at the content available to see that most of it is in MP3 format and lacks license information.

To get a quick overview of LimeWire follow this trail I found on Trailfire.