Machine learning in practice with Spark MLlib (2016)

In 2016, I had the pleasure of presenting at Strata + Hadoop World Singapore alongside J.P. Eiti Kimura. Our talk, titled “Machine learning in practice with Spark MLlib: An intelligent data analyzer”, focused on the practical aspects of building and deploying machine learning models at scale using Apache Spark.

The Problem: Revenue Leakage

We discussed a real-world application called Watcher-AI, which we developed at Movile. The goal was to detect and prevent revenue leakage, a common but complex problem in which system errors or fraud cause significant financial loss.

Technical Implementation

The project was built using a polyglot approach to leverage the best of the Spark ecosystem:

  • Scala: Used for the training phase (watcher-trainer), where we compared different models and algorithms.
  • Java: Used for the production environment (watcher-ai-samples), providing a robust and fast prediction layer.
  • Spark Notebook: Facilitated interactive data exploration and initial prototyping.
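
To make the model-comparison step concrete, here is a minimal, dependency-free sketch of choosing between candidate models by their error on held-out data. It is not the watcher-trainer code and does not use Spark: the class name ModelComparison, the helpers rmse and selectBest, the two toy candidates, and the choice of RMSE as the metric are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.DoubleUnaryOperator;

// Hypothetical illustration only: each "model" is a plain function from one
// feature to a prediction, standing in for a model trained elsewhere.
class ModelComparison {

    // Root-mean-square error of a model over held-out (x, y) pairs.
    static double rmse(DoubleUnaryOperator model, double[] xs, double[] ys) {
        double sumSquares = 0.0;
        for (int i = 0; i < xs.length; i++) {
            double error = model.applyAsDouble(xs[i]) - ys[i];
            sumSquares += error * error;
        }
        return Math.sqrt(sumSquares / xs.length);
    }

    // Returns the name of the candidate with the lowest hold-out RMSE.
    static String selectBest(Map<String, DoubleUnaryOperator> candidates,
                             double[] xs, double[] ys) {
        String bestName = null;
        double bestError = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, DoubleUnaryOperator> entry : candidates.entrySet()) {
            double error = rmse(entry.getValue(), xs, ys);
            if (error < bestError) {
                bestError = error;
                bestName = entry.getKey();
            }
        }
        return bestName;
    }

    public static void main(String[] args) {
        // Made-up hold-out data following y = 2x.
        double[] xs = {1, 2, 3, 4};
        double[] ys = {2, 4, 6, 8};

        Map<String, DoubleUnaryOperator> candidates = new LinkedHashMap<>();
        candidates.put("constant", x -> 5.0);
        candidates.put("linear", x -> 2.0 * x);

        System.out.println("Best model: " + selectBest(candidates, xs, ys));
    }
}
```

In a Spark-based trainer the candidates would be fitted models and the metric would come from the library's own evaluators, but the selection logic keeps the same shape: score every candidate on the same held-out data and keep the one with the lowest error.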

One of the key takeaways was the importance of moving from experimental notebooks to a production-ready codebase, so that models could handle data streams in real time.

Watch the Presentation

You can watch the full presentation below:

Resources