I research the application of ICT to the power industry. ICT progresses very rapidly, but it does not do so in isolation. Its benefits should be applied to other areas to help them improve. With the advent of sensor technologies, more equipment and devices in the smart grid are being tagged with inexpensive yet very powerful sensors. Two things are happening. One is that these sensors constantly generate a large amount of data at many different paces, some in real time and others not. The other is that many different kinds of data are produced, some structured and others not. Generated data may be collected and stored for further analysis. Traditional relational databases cannot cope with the velocity, variety, and volume (the 3 Vs) of this Big Data. That triggered the birth of NoSQL. Recently, I attended the NoSQL Now 2013 conference and wrote blog posts covering some of the sessions I attended. (See here, here, and here.)
I looked through the exhibitors list for a company that specializes in analytics, and I was fortunate enough to be able to talk to Acunu. Since talking to Jim Kaskade of Infochimps, I have been reading a lot about analytics basics. As I make a little progress in the area of analytics, I realize that it encompasses huge areas and that there is no such thing as general analytics. For that reason, analytics cannot be described easily in a few words. Now I can appreciate why data science or analytics experts do not want to talk about it in detail to a layman like me.
Changing scene of analytics
The Hadoop type of analytics is conducted in batch. That is fine. There is a place for batch analytics. But what about real-time or streaming analytics? This area is gaining a lot of attention these days. In the power industry, operators keep our lights on by monitoring a lot of data coming from many places. Depending on who you are and how wide your responsibilities are, you may look at a small or a large area. Either way, you need to deal asynchronously with many types of data of varying generation frequencies, speeds, and formats. In general, the more data you collect, the better your analysis becomes. Of course, detailed analysis is required beforehand to discern which data are important to collect and analyze. The recent acquisition of Infochimps by CSC is a good indication of the attention this area is getting.
I spent a few minutes reviewing their website before the interview and spent about 45 minutes with Tim Moreton (CTO) and Dai Clegg (VP marketing) at the show.
From left: Tim Moreton and Dai Clegg
After the interview, I realized that I would need far more time to write about them and their technologies. So I spent some time going through their materials and those of others to digest what was discussed during the interview.
The rest of this post is what I found out about their technologies from multiple sources, including the interview.
Every company claims their solutions are unique and better than their competition’s. OK, let’s talk about what their differentiations are.
I understand that no two analytics solutions are alike. Like real-time, streaming analytics is an overloaded term. Several companies, like Grok Solutions, provide a version of streaming analytics. How does Acunu differentiate itself from others? Their blog site is a great source of information, but I would like to read comprehensive white papers. Tim and Dai told me that such papers would be coming over time.
These are their points of differentiation:
- Real-time analytics
- Cassandra integration
- Dashboard and architecture
Although they did not tell me that real-time analytics itself is part of the differentiation, I think it useful to know how they define real time. Tim’s blog is a good source for this. It certainly does not mean “hard real time” as defined here. Tim quoted Doug Cutting (creator of Hadoop and chief architect of Cloudera):
It’s when you sit and wait for it to finish, as opposed to going for a cup of coffee or even letting it run overnight…. That’s “real time.”
Acunu’s real time is “API real time,” as explained by Tim:
Acunu’s “API real time” works for operational intelligence and monitoring. Not only are results returned interactively, within the latency of a web page refresh or typical API call, but the analytics is incremental and continuous — the numbers themselves are fresh. But more than this, the fresh data in queries is combined with the historic data so trends, exceptions, comparisons are all immediately detectable.
Acunu’s cubes are very similar, except that they are computed continuously, as each new event is ingested, and incrementally, so that only a small amount of work has to be done for each new event.
In the Lambda architecture, Nathan Marz puts a precomputed view between an application and the data it accesses. With a precomputed view, an application can have access to all the data that have been accumulated and analyzed, plus data just arrived. A cube is a set of aggregated results derived by applying some formula to multidimensional data that have been collected to a certain point. It could be the simple sum of a certain type of data, such as the count of goods sold so far by region or by time period (week, month, or year). Because the result so far is already available, the newly arrived relevant data can easily be added to update the count information. We can have multiple cubes (views), depending on the kind of queries desired, as the following figure shows.
Note that this is one of the ways to implement the Lambda architecture and there are other ways to implement it. However, I think this helps us grasp the idea.
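To make the idea of an incrementally updated cube concrete, here is a minimal sketch in Python. It is only my illustration of the concept, not Acunu's implementation: the `CountCube` class, the `ingest_all` helper, and the event fields (`region`, `day`, `quantity`) are all hypothetical names that I made up. Each cube keeps running totals for one combination of dimensions, and each new event updates exactly one cell per cube.

```python
from collections import defaultdict
from datetime import date


class CountCube:
    """Running totals keyed by a fixed set of dimensions (hypothetical helper)."""

    def __init__(self, dimensions):
        self.dimensions = tuple(dimensions)
        self.cells = defaultdict(int)

    def ingest(self, event):
        # Incremental update: one cell changes per event, and nothing
        # that was already aggregated is recomputed.
        key = tuple(event[d] for d in self.dimensions)
        self.cells[key] += event["quantity"]

    def query(self, **filters):
        # Answer from the precomputed cells only; raw events are never rescanned.
        # Filters must name dimensions that this cube actually holds.
        total = 0
        for key, value in self.cells.items():
            cell = dict(zip(self.dimensions, key))
            if all(cell[d] == v for d, v in filters.items()):
                total += value
        return total


# Two cubes (views) over the same stream, one per kind of query.
by_region = CountCube(["region"])
by_region_month = CountCube(["region", "month"])


def ingest_all(event):
    # Derive the "month" dimension once, then update every cube.
    enriched = dict(event, month=event["day"].strftime("%Y-%m"))
    by_region.ingest(enriched)
    by_region_month.ingest(enriched)


events = [
    {"region": "west", "day": date(2013, 8, 20), "quantity": 3},
    {"region": "east", "day": date(2013, 8, 21), "quantity": 5},
    {"region": "west", "day": date(2013, 9, 2), "quantity": 2},
]
for event in events:
    ingest_all(event)

print(by_region.query(region="west"))                          # 5
print(by_region_month.query(region="west", month="2013-08"))   # 3
```

The trade-off in this sketch is that a query can be answered only if a cube with the right dimensions was defined before the data arrived, which is why the choice of cubes follows from the kind of queries desired.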
Tim told me, and also wrote in his blog, that they chose Cassandra because:
Cassandra excels in scalability, performance, multi-data-center support, and a multi-master architecture (no single point of failure).
However, he continued:
Without exception [most of our customers] were finding Cassandra data modeling to be the steepest part of the learning curve.
[I]ts API, even with the assistance of Cassandra Query Language (CQL), is spartan. It offers only the most basic building blocks to work with. Developers have to think carefully about how data will be read and carefully plan their schemas accordingly. If new features demand new ways of reading data, those changes can be very hard to implement.
Because I am not a Cassandra user or developer, I cannot comment on this as an expert. But in general, an open source solution is great in its openness and flexibility, and it is free of charge (as long as you comply with its licensing terms). Its problems are lack of usability and support. Yes, there is usually a helpful community that can give you a hand when you need one. But it is not always easy to fit an open source solution in among your enterprise solutions. That is why many open source solutions come with a business version. In the case of Cassandra, DataStax provides a business version with support.
My understanding is that they put a layer, including their SQL-like Acunu Query Language (AQL), on top of vanilla Cassandra to make it easier to work with. Because Cassandra is open source and there may be multiple versions of it, Tim said they deal with the major ones. Acunu works closely with the Apache community to provide updates and upgrades to the Cassandra project.
Dashboard and architecture
Visualization is important in understanding what’s going on with our Big Data and helps us deal with the Cassandra database. The entire Acunu architecture is shown in Tim’s presentation figure, as follows.
Acunu’s analytics architecture (Source: Tim Moreton’s presentation at NoSQL Now 2013)
The Acunu Analytics system can take multiple data streams from various sources simultaneously. The streams can be combined with Flume or Storm in addition to HTTP.
Future work for Acunu
Tim and Dai said that their current analytics functions are basic, such as detecting out-of-bound values and the sum of particular data segments. It would be nice to have more sophisticated analytics that could predict the future. Such predictions would help the power grid to be reliable and stable and keep our lights on.
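For readers who wonder what such basic functions look like on a stream, here is a small Python sketch of my own. It does not use Acunu's API. The `SegmentMonitor` class, the feeder names, and the bounds are all assumptions invented for illustration. It keeps a running sum per data segment and flags any reading that falls outside fixed bounds as it arrives.

```python
from collections import defaultdict


class SegmentMonitor:
    """Per-segment running sums plus out-of-bound detection (hypothetical helper)."""

    def __init__(self, low, high):
        self.low = low
        self.high = high
        self.totals = defaultdict(float)
        self.alerts = []

    def ingest(self, segment, value):
        # Update the running sum for this segment as the reading arrives.
        # In this sketch, out-of-bound readings still count toward the sum.
        self.totals[segment] += value
        out_of_bound = not (self.low <= value <= self.high)
        if out_of_bound:
            self.alerts.append((segment, value))
        return out_of_bound


# Made-up interval energy readings (kWh) from two made-up feeders.
monitor = SegmentMonitor(low=0.0, high=50.0)
readings = [
    ("feeder-a", 12.5),
    ("feeder-b", 61.0),   # above the upper bound
    ("feeder-a", -1.0),   # below the lower bound
    ("feeder-b", 20.0),
]
for segment, value in readings:
    monitor.ingest(segment, value)

print(dict(monitor.totals))  # {'feeder-a': 11.5, 'feeder-b': 81.0}
print(monitor.alerts)        # [('feeder-b', 61.0), ('feeder-a', -1.0)]
```

Fixed bounds like these only describe what has already happened. The predictive analytics I would like to see would have to go well beyond this.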
It is good to find that a company like Acunu is working on streaming/real-time analytics. Each analytics company provides its unique combination of many different pieces. I wonder how this market will evolve. Will it stay fragmented or become consolidated? Most companies with analytics solutions would like to apply them to areas like the power industry. It is safe to say that the analytics segment has been dominated by SNS and enterprise applications. It is time to apply it to the nation’s infrastructure, such as the power grid.