Machine learning in practice with Spark MLlib (2016)
2026 Apr 30
In 2016, I had the pleasure of presenting at Strata + Hadoop World Singapore alongside J.P. Eiti Kimura. Our talk, titled “Machine learning in practice with Spark MLlib: An intelligent data analyzer”, focused on the practical aspects of building and deploying machine learning models at scale using Apache Spark.
The Problem: Revenue Leakage
We discussed a real-world application called Watcher-AI, which we developed at Movile. The goal was to detect and prevent revenue leakage, a common but complex problem in which system errors or fraud lead to significant financial loss.
Technical Implementation
The project was built using a polyglot approach to leverage the best of the Spark ecosystem:
- Scala: Used for the training phase (watcher-trainer), where we compared different models and algorithms.
- Java: Used for the production environment (watcher-ai-samples), providing a robust and fast prediction layer.
- Spark Notebook: Facilitated interactive data exploration and initial prototyping.
One of the key takeaways was the importance of moving from experimental notebooks to a production-ready codebase, so that models could handle data streams in real time.
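To make the split between a training job and a separate prediction layer concrete, here is a minimal sketch in Scala. It is not the Watcher-AI code: the object name, column names, toy data, and the choice of logistic regression are all assumptions made for illustration. It also uses the DataFrame-based `spark.ml` pipeline API (Spark 2.x or later), which may differ from what the project used in 2016. The idea it shows is simply that the trainer fits a model and persists it, and the serving side loads that saved artifact and scores new rows.

```scala
import org.apache.spark.ml.{Pipeline, PipelineModel}
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical sketch: names, columns, and data are illustrative only,
// not taken from watcher-trainer or watcher-ai-samples.
object TrainAndServeSketch {

  // Training side: fit a small pipeline and persist it under `modelPath`.
  def train(spark: SparkSession, modelPath: String): Unit = {
    import spark.implicits._

    // Toy labelled rows: (label, requests, failures). Values are made up.
    val training = Seq(
      (0.0, 100.0, 1.0),
      (0.0, 120.0, 2.0),
      (0.0, 90.0, 0.0),
      (1.0, 110.0, 40.0),
      (1.0, 95.0, 55.0),
      (1.0, 105.0, 60.0)
    ).toDF("label", "requests", "failures")

    // Combine the raw numeric columns into the single vector column MLlib expects.
    val assembler = new VectorAssembler()
      .setInputCols(Array("requests", "failures"))
      .setOutputCol("features")

    // A little regularization keeps the fit stable on this tiny, separable sample.
    val lr = new LogisticRegression().setMaxIter(50).setRegParam(0.01)

    val model = new Pipeline().setStages(Array(assembler, lr)).fit(training)
    model.write.overwrite().save(modelPath)
  }

  // Serving side: load the persisted pipeline and score incoming rows.
  def score(modelPath: String, rows: DataFrame): DataFrame =
    PipelineModel.load(modelPath)
      .transform(rows)
      .select("requests", "failures", "prediction")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("train-and-serve-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Save into a fresh temporary directory so the sketch leaves no fixed path behind.
    val modelPath = java.nio.file.Files
      .createTempDirectory("model-sketch")
      .resolve("model")
      .toString

    try {
      train(spark, modelPath)
      val incoming = Seq((100.0, 50.0), (100.0, 1.0)).toDF("requests", "failures")
      score(modelPath, incoming).show()
    } finally {
      spark.stop()
    }
  }
}
```

Because the saved pipeline is just a directory on disk, the loading half does not have to live in the same codebase as the trainer. `PipelineModel.load` is also callable from Java, so a Java prediction service could in principle read the same artifact, which is one way to arrange the Scala/Java split described above.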
Watch the Presentation
You can watch the full presentation below:
Resources
- O’Reilly Video: Strata + Hadoop World 2016 - Machine learning in practice with Spark MLlib
- GitHub Repository: flavioclesio/spark-mllib-sample
- Watcher-AI Samples: fclesio/watcher-ai-samples