IBM Watson Meetup—Part 4: Machine Learning

When I studied artificial intelligence as part of my computer science degree program, I did not have the opportunity to study machine learning (ML). That was a long time ago, and I have never learned ML formally. I once started watching Professor Andrew Ng’s online course from Stanford University but gave up partway through. It is about time for me to get back to it.

IBM is not the only company working on ML. Amazon (Amazon Machine Learning), Google (Prediction API, TensorFlow), and Microsoft (Microsoft Azure Machine Learning) are also active in this area. I will write about their offerings in the future. Many startups and small and medium-size companies offer ML products as well; see Venture Scanner for a list of AI/ML vendors. Also, Dr. Adrian Bowles reported on promising companies in the AI area in his StormInsights newsletter (February/March 2015 issue).

IBM Watson Meetup (Machine Learning)

High Level Overview

The fourth speaker at the IBM Watson meetup was Dr. Shivakumar Vaithyanathan, Fellow, Watson Cognitive Services, IBM Watson. He discussed Watson declarative ML.

Dr. Shivakumar Vaithyanathan

I like the last line on his next slide, “The only constraint is the creativity of the ML Scientist.”

ML can employ many algorithms and can be described from many different viewpoints, such as which algorithms are used (conventional vs. your own). Vaithyanathan discussed it in conjunction with a distributed backend system, such as Spark. ML algorithms work fine as long as the data volume is manageable. But as more data needs to be processed, scalability becomes a big issue: an ML algorithm that worked before may stop working when the data volume grows too large.

Because data scientists are not trained to be experts in distributed systems, they need an easy (transparent) way to scale up without having to know exactly how the backends work.

Vaithyanathan showed this situation in the following slide. A data scientist designs and implements an effective ML algorithm for a specific need and writes a simple script in an R-like or Python-like language that takes care of the complex backend processing. With this support, data scientists can concentrate on ML algorithm development.

Figure 1: ML and backends are interfaced with a simple interface language

In this slide, three backends are shown: Hadoop, Spark, and Open MPI (an implementation of the Message Passing Interface). (By the way, this article compares Hadoop and Spark.) With this interface in place, the necessary configurations are automatically created and pushed to the backend systems so that they handle the processing properly.
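To make the idea a little more concrete, here is a minimal sketch in Python. It is not SystemML's actual API or language, and it does not talk to Hadoop, Spark, or Open MPI. All names in it (plan_backend, column_means, the "single-node" and "distributed" labels, and the memory budget) are hypothetical, and the "distributed" path is only simulated in one process by splitting the rows into partitions. The point is just that the caller asks for a result and a simple size estimate decides how the work is carried out.

```python
from dataclasses import dataclass

BYTES_PER_DOUBLE = 8  # assumption: dense matrix of 64-bit floats


@dataclass(frozen=True)
class Plan:
    backend: str          # hypothetical labels: "single-node" or "distributed"
    estimated_bytes: int  # rough size of the dense input matrix


def plan_backend(n_rows, n_cols, memory_budget_bytes):
    """Pick an execution strategy from a rough memory estimate."""
    estimated = n_rows * n_cols * BYTES_PER_DOUBLE
    backend = "single-node" if estimated <= memory_budget_bytes else "distributed"
    return Plan(backend, estimated)


def column_means_in_memory(rows):
    """Compute column means with the whole matrix held in memory."""
    n_cols = len(rows[0])
    return [sum(row[j] for row in rows) / len(rows) for j in range(n_cols)]


def column_means_partitioned(partitions):
    """Compute column means by merging per-partition partial sums and counts."""
    totals = None
    count = 0
    for partition in partitions:
        for row in partition:
            if totals is None:
                totals = [0.0] * len(row)
            for j, value in enumerate(row):
                totals[j] += value
            count += 1
    return [total / count for total in totals]


def column_means(rows, memory_budget_bytes, partition_size=2):
    """What the 'script' calls: the same request, whichever plan is chosen."""
    rows = list(rows)
    plan = plan_backend(len(rows), len(rows[0]), memory_budget_bytes)
    if plan.backend == "single-node":
        return plan, column_means_in_memory(rows)
    # Simulated distribution: split the rows into fixed-size partitions.
    partitions = [rows[i:i + partition_size]
                  for i in range(0, len(rows), partition_size)]
    return plan, column_means_partitioned(partitions)


if __name__ == "__main__":
    data = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    for budget in (1024, 16):
        plan, means = column_means(data, memory_budget_bytes=budget)
        print(plan.backend, plan.estimated_bytes, means)
```

The caller's code is identical in both cases; only the memory budget changes which path runs. A real system would generate backend configurations and jobs instead of looping over partitions locally, but the division of labor is the same in spirit.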

During the talk, Vaithyanathan went into more technical detail, which I have omitted. I do not know whether IBM published his presentation, but I found a similar presentation by him, given in August this year, that covers the same topic here.

More Details

If you have more time and want to dig in a little deeper, watch this talk by Fred Reiss, Research Staff Member. He now works at the IBM Spark Technology Center in San Francisco.

Open Sourced ML

Finally, Vaithyanathan mentioned that IBM made its machine learning package, SystemML, open source in June this year. On November 2, it was accepted as an open source project by the Apache Incubator. More information can be found here and here.

Zen Kishimoto

About Zen Kishimoto

Seasoned research and technology executive with various functional expertise, including roles in analyst, writer, CTO, VP Engineering, general management, sales, and marketing in diverse high-tech and cleantech industry segments, including software, mobile embedded systems, Web technologies, and networking. Current focus and expertise are in the area of the IT application to energy, such as smart grid, green IT, building/data center energy efficiency, and cloud computing.
