Streaming Analytics Applied to the Power Industry

I research the application of ICT to the power industry. ICT advances very rapidly, but it does not do so in isolation; its benefits should be carried over to other fields to help improve them. With the advent of new sensor technologies, more and more equipment and devices in the smart grid are being equipped with inexpensive yet very powerful sensors. Two things are happening. First, these sensors constantly generate large amounts of data at many different rates, some in real time and some not. Second, many different kinds of data are produced, some structured and some not. The generated data may be collected and stored for further analysis. Traditional relational databases cannot cope with the velocity, variety, and volume (the 3 Vs) of this Big Data, and that triggered the birth of NoSQL. Recently, I attended the NoSQL Now 2013 conference and wrote blog posts covering some of the sessions I attended. (See here, here, and here.)

I looked through the exhibitor list for a company that specializes in analytics and was fortunate enough to be able to talk to Acunu. Since talking to Jim Kaskade of Infochimps, I have been reading a lot about analytics basics. As I make a little progress in analytics, I realize that it encompasses huge areas and that there is no such thing as general analytics. For that reason, analytics cannot be described easily in a few words. Now I can appreciate why data science and analytics experts do not want to discuss it in detail with a layman like me.

Changing scene of analytics

Hadoop-style analytics is conducted in batch. That is fine; there is a place for batch analytics. But what about real-time, or streaming, analytics? This area is gaining a lot of attention these days, and the recent acquisition of Infochimps by CSC is a good indication of that. In the power industry, grid operators monitor a great deal of data coming from many places to keep our lights on. Depending on who you are and how wide your responsibilities are, you may look at a small or a large area. Either way, you need to deal asynchronously with many types of data that differ in generation frequency, speed, and format. In general, the more data you collect, the better your analysis becomes, although detailed analysis is needed beforehand to discern which data are important to collect and analyze.
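To make the batch-versus-streaming contrast concrete, here is a minimal Python sketch, not tied to any product, that computes a per-sensor average two ways: once in batch over a stored list of readings, and once incrementally as each reading arrives. The sensor names and values are hypothetical. Both approaches reach the same answer, but the streaming version has a fresh answer after every reading instead of only at the end of a batch job.

```python
from collections import defaultdict

# Hypothetical readings: (timestamp_seconds, sensor_id, value).
# The two sensors report at different rates, so the stream is interleaved.
READINGS = [
    (0, "pmu-1", 60.01), (0, "meter-7", 231.0),
    (1, "pmu-1", 59.99), (2, "pmu-1", 60.02),
    (3, "pmu-1", 60.00), (15, "meter-7", 229.5),
]

def batch_average(readings):
    """Batch style: wait until all data is stored, then scan it once."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _, sensor, value in readings:
        totals[sensor] += value
        counts[sensor] += 1
    return {s: totals[s] / counts[s] for s in totals}

class StreamingAverage:
    """Streaming style: update a running mean per sensor on every reading."""

    def __init__(self):
        self.count = defaultdict(int)
        self.mean = defaultdict(float)

    def ingest(self, sensor, value):
        # Incremental mean update: constant work per reading, no rescan of history.
        self.count[sensor] += 1
        self.mean[sensor] += (value - self.mean[sensor]) / self.count[sensor]

    def current(self):
        return dict(self.mean)

stream = StreamingAverage()
for _, sensor, value in READINGS:
    stream.ingest(sensor, value)  # a fresh answer is available after each call

print(batch_average(READINGS))
print(stream.current())
```

The running-mean update trades a full rescan for a tiny amount of arithmetic per reading, which is the basic property that makes streaming analytics practical on high-velocity sensor data.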

Acunu Analytics

I spent a few minutes reviewing their website before the interview and then about 45 minutes at the show with Tim Moreton (CTO) and Dai Clegg (VP of Marketing).

From left: Tim Moreton and Dai Clegg

After the interview, I realized that I would need far more time to write about the company and its technologies, so I spent some time going through Acunu’s and others’ materials to digest what we had discussed.

The rest of this post covers what I learned about their technologies from multiple sources, including the interview.

Differentiation

Every company claims that its solutions are unique and better than its competitors’. So let’s look at how Acunu differentiates itself.

I understand that no two analytics solutions are alike. Like “real time,” “streaming analytics” is an overloaded term, and several companies, such as Grok Solutions, provide their own version of it. How does Acunu differentiate itself from the others? Their blog is a great source of information, but I would like to read comprehensive white papers; Tim and Dai told me that such papers would be coming over time.

These are their points of differentiation:

  1. Real-time analytics
  2. Cube
  3. Cassandra integration
  4. Dashboard and architecture

Real-time analytics

Although they did not tell me that real-time analytics itself is a point of differentiation, I think it is useful to know how they define real time. Tim’s blog is a good source for this. It certainly does not mean “hard real time” as defined here. Tim quoted Doug Cutting (creator of Hadoop and chief architect of Cloudera):

It’s when you sit and wait for it to finish, as opposed to going for a cup of coffee or even letting it run overnight…. That’s “real time.”

Acunu’s real time is “API real time,” as explained by Tim:

Acunu’s “API real time” works for operational intelligence and monitoring. Not only are results returned interactively, within the latency of a web page refresh or typical API call, but the analytics is incremental and continuous — the numbers themselves are fresh. But more than this, the fresh data in queries is combined with the historic data so trends, exceptions, comparisons are all immediately detectable.
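Tim’s description combines two ideas: each new event updates the numbers immediately, and queries merge those fresh numbers with history so trends and exceptions show up at once. Here is a minimal sketch of that idea, assuming hourly time buckets, an in-memory dictionary, and a simple “twice the historical mean” spike rule; all three are hypothetical choices for illustration, not Acunu’s design.

```python
from collections import defaultdict

HOUR = 3600  # bucket width in seconds (assumed for illustration)

class HourlyCounter:
    """Keeps a running event count per hourly time bucket.

    Every ingested event updates its bucket at once, so a query sees the
    historic buckets and the still-filling current bucket together.
    """

    def __init__(self):
        self.buckets = defaultdict(int)

    def ingest(self, timestamp):
        # Incremental and continuous: one dictionary update per event.
        self.buckets[timestamp // HOUR] += 1

    def series(self, first_hour, last_hour):
        """Counts for hours first_hour..last_hour inclusive; empty hours read as 0."""
        return [self.buckets.get(h, 0) for h in range(first_hour, last_hour + 1)]

    def is_spike(self, hour, factor=2.0):
        """Flag an hour whose count exceeds factor x the mean of earlier non-empty hours."""
        history = [c for h, c in self.buckets.items() if h < hour]
        if not history:
            return False
        return self.buckets.get(hour, 0) > factor * (sum(history) / len(history))

counter = HourlyCounter()
for ts in [10, 20, 3700, 7300, 7301, 7302, 7303, 7304]:
    counter.ingest(ts)
print(counter.series(0, 2))  # → [2, 1, 5]
print(counter.is_spike(2))   # → True
```

Because the current hour lives in the same structure as the historic hours, the comparison in `is_spike` needs no separate batch job; the answer reflects the latest event as soon as it is ingested.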

Cube

Acunu’s Cube, explained in Tim’s blog, is akin to an online analytical processing (OLAP) cube. As Tim puts it:

Acunu’s cubes are very similar, except that they are computed continuously, as each new event is ingested, and incrementally, so that only a small amount of work has to be done for each new event.
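To see why incremental maintenance keeps the per-event work small, here is a minimal sketch of a cube that keeps a running sum and count for every combination of dimensions (every possible “group by” subset). The dimension names (region, substation, sensor type) are hypothetical, and this is a generic illustration of incrementally maintained OLAP-style rollups, not Acunu’s actual implementation. With d dimensions, each event touches 2^d cells, a fixed amount of work no matter how much data has already been ingested, and any grouped query becomes a lookup rather than a scan.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("region", "substation", "sensor_type")  # hypothetical dimensions

class IncrementalCube:
    """Maintains a running [sum, count] for every subset of the dimensions."""

    def __init__(self, dimensions=DIMENSIONS):
        self.dimensions = dimensions
        # Key: (tuple of dimension names, tuple of their values) -> [sum, count].
        self.cells = defaultdict(lambda: [0.0, 0])

    def ingest(self, event, value):
        """Update all 2**d rollup cells that this one event contributes to."""
        for r in range(len(self.dimensions) + 1):
            for dims in combinations(self.dimensions, r):
                cell = self.cells[(dims, tuple(event[d] for d in dims))]
                cell[0] += value
                cell[1] += 1

    def query(self, **filters):
        """Return (sum, count) for one group, e.g. query(region="north")."""
        unknown = set(filters) - set(self.dimensions)
        if unknown:
            raise KeyError(f"unknown dimensions: {sorted(unknown)}")
        # Build the key in the same dimension order used by ingest().
        dims = tuple(d for d in self.dimensions if d in filters)
        total, count = self.cells.get((dims, tuple(filters[d] for d in dims)), (0.0, 0))
        return total, count

cube = IncrementalCube()
cube.ingest({"region": "north", "substation": "S1", "sensor_type": "voltage"}, 231.0)
cube.ingest({"region": "north", "substation": "S2", "sensor_type": "voltage"}, 229.0)
cube.ingest({"region": "south", "substation": "S3", "sensor_type": "current"}, 12.5)
print(cube.query(region="north"))  # → (460.0, 2)
print(cube.query())                # → (472.5, 3)
```

The design trades storage (one cell per distinct group) for query speed; with many dimensions, real systems typically materialize only selected rollups rather than all 2^d of them.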

In the Lambda architecture, Nathan Marz places a precomputed view between an application and the data it accesses. With a precomputed view, an application can access all the data that have been accumulated and analyzed, plus data that have just arrived. A cube is a set of aggregated