When I studied artificial intelligence as part of my computer science degree program, I did not have the opportunity to study machine learning (ML). That was a long time ago, and I never learned ML formally. I once started watching Professor Andrew Ng’s online course from Stanford University but gave up on it. It is about time for me to get back to it.
IBM is not the only company working on ML. Also at work are Amazon (Amazon Machine Learning), Google (Prediction API, TensorFlow), and Microsoft (Microsoft Azure Machine Learning). I will write about their offerings in the future. By the way, there are many more ML offerings from startups and small and medium-size companies as well. See Venture Scanner for a longer list of AI/ML vendors. Also, Dr. Adrian Bowles reported on promising companies in the AI area in his StormInsights newsletter (February/March 2015 issue).
IBM Watson Meetup (Machine Learning)
High Level Overview
I like the last line on his next slide, “The only constraint is the creativity of the ML Scientist.”
ML can employ many algorithms and can be described from many different viewpoints, such as which algorithms are used (conventional vs. your own). Vaithyanathan discussed it in conjunction with a distributed system (backend), such as Spark. ML algorithms work fine as long as the data volume stays manageable. But as more data needs to be processed, scalability becomes a big issue. An ML algorithm that worked before may stop working when the data grows too large.
Because data scientists are not trained to be experts in distributed systems, they need an easy (transparent) way to take care of higher scalability without having to know exactly how backends work.
Vaithyanathan showed this situation in the following slide. A data scientist designs and implements an effective ML algorithm for a specific need and writes a simple script in an R-like or Python-like language; the system then takes care of the complex backend processing. With this support, data scientists can concentrate on ML algorithm development.
Figure 1: ML and backends are interfaced with a simple interface language
In this slide, three backends are shown: Hadoop, Spark, and Open Message Passing Interface (MPI). (By the way, this article compares Hadoop and Spark.) With this interface in place, the necessary configurations are generated automatically and pushed to the backend systems so that they process the job correctly.
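To make the idea concrete, here is a minimal Python sketch of my own. It is not SystemML's actual language or API, and every name in it (`LocalBackend`, `ChunkedBackend`, `choose_backend`, `fit_line`, `run`, and the `threshold` parameter) is hypothetical. The "script" is a least-squares line fit written only in terms of one high-level operation, and a small planner decides which backend executes it. The "distributed" backend is just a stand-in that splits the rows into partitions and combines partial sums, roughly the way a Hadoop or Spark job would.

```python
class LocalBackend:
    """Hypothetical backend: aggregates in a single pass on one machine."""
    name = "local"

    def sum_map(self, fn, rows):
        return sum(fn(r) for r in rows)


class ChunkedBackend:
    """Hypothetical stand-in for a cluster: partial sums per partition, then a reduce."""
    name = "chunked"

    def __init__(self, partitions=4):
        self.partitions = partitions

    def sum_map(self, fn, rows):
        size = max(1, -(-len(rows) // self.partitions))  # ceiling division
        partials = [
            sum(fn(r) for r in rows[i:i + size])
            for i in range(0, len(rows), size)
        ]
        return sum(partials)


def choose_backend(n_rows, threshold=1000):
    """Assumed planner rule: small inputs run locally, large ones are partitioned."""
    return LocalBackend() if n_rows <= threshold else ChunkedBackend()


def fit_line(rows, backend):
    """The data scientist's 'script': least-squares fit of y = a*x + b.

    It only calls backend.sum_map, so it never needs to know
    where or how the sums are actually computed.
    """
    n = len(rows)
    sx = backend.sum_map(lambda r: r[0], rows)
    sy = backend.sum_map(lambda r: r[1], rows)
    sxx = backend.sum_map(lambda r: r[0] * r[0], rows)
    sxy = backend.sum_map(lambda r: r[0] * r[1], rows)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b


def run(rows, threshold=1000):
    """Pick a backend from the input size, then execute the same script on it."""
    backend = choose_backend(len(rows), threshold)
    return backend.name, fit_line(rows, backend)


if __name__ == "__main__":
    data = [(x, 2 * x + 1) for x in range(10)]
    print(run(data))               # small input: planner picks the local backend
    print(run(data, threshold=5))  # same script, now routed to the chunked backend
```

The point of the sketch is only that `fit_line` stays unchanged while the execution strategy changes underneath it. A real system would make that choice from much richer information than a row count.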
During the talk, Vaithyanathan touched a little bit more on the technical side, but I’ve omitted that. I do not know whether IBM published his presentation, but I found a similar presentation by him, given in August this year, that addresses the same thing here.
If you have more time and want to dig a little deeper, watch this talk by Fred Reiss, Research Staff Member. He now works at the IBM Spark Technology Center in San Francisco.
Open Sourced ML
Finally, Vaithyanathan mentioned that IBM made its machine learning package, SystemML, open source in June this year. On November 2, it was accepted as an open source project by the Apache Incubator. More information can be found here and here.