Comparing Calais, Zemanta, and Yahoo! Term Extractor Results

I’ve been looking into semantic web services that extract key terms and concepts from user-generated content.  Calais and Zemanta both offer rich web services designed to help you find and integrate relevant, related content from around the web.  For my purposes, I’m only interested in the term/concept extraction, which is just a small part of what they provide.  Yahoo! has a much more basic service designed to do just that, appropriately named the Yahoo! Term Extraction Service.

I decided to do a quick evaluation/comparison, using the following text from one of my Delicious bookmarks:

Online Communities: Establishing a Community’s Culture – Online Community Report

We initiated the Online Community Culture study in October of 2008, as part of the ongoing research agenda of the Online Community Research Network. The intention of the study was to get a broad look at the factors that influence online community culture, and the steps community managers and strategists take in cultivating, and in some cases influencing, a community’s culture. We had over 75 participants in the research, representing many sectors, including software, tech, traditional media, social media and online community, and non-profits. Respondents seniority skewed towards Manager (44%), Directors & VP’s (12%).

The results from each were quite different.  Calais and Zemanta both seem to have more “semantic intelligence” and were able to focus in on the terms most relevant to the subject.  Calais offered a short but entirely relevant list of terms, all extracted directly from the text.  Zemanta offered a broader set of terms, including some related terms not explicitly in the text, such as “social network” and “community management”.  Unfortunately, it also included some unhelpful terms, such as “computers” and “on the web”.

Yahoo! provided the broadest list of terms, but also the least helpful.  With all the resulting terms extracted directly from the text, Yahoo!’s service seems to be mostly a text parser that does the least semantic analysis of the three.  However, Yahoo!’s simplicity can be valuable as well.  With other examples, I’ve seen Calais and Zemanta come up empty (no terms), while Yahoo! provided some relevant and some not-so-relevant terms.  As with people, too much intelligence can be problematic.  ;-)

Unfortunately, none of the services consistently provides ideal terms.  But combined, you might get decent results.  That’s something I’m continuing to explore.  For those interested, the resulting terms from each service are below.

Calais:

  • Online Community Research Network
  • social media
  • online community culture
  • online community
  • Online Community Culture study

Zemanta:

  • Virtual community
  • Social media
  • Online Communities
  • Computers
  • Non-profit organization
  • On the Web
  • Community Management
  • Social network

Yahoo!:

  • culture study
  • community culture
  • community managers
  • research agenda
  • ongoing research
  • strategists
  • seniority
  • respondents
  • vp
  • intention
  • sectors
  • non profits
  • participants
  • community research network
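
As a starting point for the "combined" idea, here is a minimal sketch in Python.  It does not call any of the three services; it just takes the term lists above as hard-coded input, normalizes each term, and ranks terms by how many services returned them.  The names normalize and combine are my own hypothetical helpers, and exact matching after normalization is only one assumption about how results could be merged.

```python
from collections import defaultdict

# Term lists copied from the results above (no service calls are made here).
results = {
    "Calais": [
        "Online Community Research Network", "social media",
        "online community culture", "online community",
        "Online Community Culture study",
    ],
    "Zemanta": [
        "Virtual community", "Social media", "Online Communities",
        "Computers", "Non-profit organization", "On the Web",
        "Community Management", "Social network",
    ],
    "Yahoo!": [
        "culture study", "community culture", "community managers",
        "research agenda", "ongoing research", "strategists", "seniority",
        "respondents", "vp", "intention", "sectors", "non profits",
        "participants", "community research network",
    ],
}


def normalize(term):
    """Lowercase, treat hyphens as spaces, and collapse extra whitespace."""
    return " ".join(term.lower().replace("-", " ").split())


def combine(results):
    """Return (term, services) pairs, terms found by the most services first."""
    votes = defaultdict(set)
    for service, terms in results.items():
        for term in terms:
            votes[normalize(term)].add(service)
    # Sort by number of services (descending), then alphabetically.
    return sorted(votes.items(), key=lambda item: (-len(item[1]), item[0]))


ranked = combine(results)
# Terms that more than one service agreed on.
shared = [term for term, services in ranked if len(services) > 1]
print(shared)
```

With these lists, the only exact overlap is "social media" (Calais and Zemanta).  Near-misses such as "online community" and "Online Communities" are not merged, so a more useful combination would probably need stemming or some other fuzzy matching.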