Clustering Algorithms for Massive Data Sets

Via Big Data Central. Potential applications:

  • Creating a keyword taxonomy to categorize the entire universe of cleaned (standardized), valuable English keywords. This amounts to about 10 million keywords of one, two, or three tokens each, roughly 300 times the number of entries in a good English dictionary. One purpose might be to categorize all bid keywords that eBay and Amazon could purchase on Google (for pay-per-click ad campaigns) in order to price them better. This is the application discussed in this article.
  • Clustering millions of documents (e.g. books on Amazon.com).
  • Clustering web pages, or even the entire Internet, which consists of about 100 million top websites and billions of web pages.