Recently, I watched the Soft Grid conference, broadcast by GreentechMedia via Ustream, and was pleasantly surprised that many smart grid and utility people talked about Big Data and cloud computing. Then I went to the 2012 NoSQL Now conference, where I interviewed five companies and sat in on several sessions. I will post a blog entry for each interview later. For now, let me describe my understanding of what NoSQL is and how it may be applied to the energy business.
I once consulted for MySQL and knew something about the relational database market, but after the company was bought by Sun, I stopped following it. I knew there was such a thing as NoSQL but initially thought it meant “No to SQL”; it is more like “Not only SQL.” NoSQL started to get attention circa 2009, and the NoSQL Now conference only started in 2011. So it is a relatively new area and, as in any new area, the market is very confused. Many terms and acronyms are floating around, along with many vendor claims. Quite frankly, it is very hard to walk through this market without getting totally confused. Prior to attending the conference, I studied the companies I planned to interview and read anything and everything I could lay my eyes on. The sad reality was that I ended up even more confused.
What is NoSQL, technology-wise, component-wise, and application-wise?
The NoSQL market can be described in a few ways. One way is to categorize it by the technologies used. The 451 Group’s Matt Aslett, in his blog post “NoSQL, NewSQL and Beyond: The answer to SPRAINed relational databases,” gave a pretty good picture of the market, with categories and the vendors that belong to each category.
Matt Aslett’s database categories
This figure alone is very valuable; it helped me understand where my interviewees’ companies fall.
This view is great, but I was still not comfortable enough to say, “Yes, I got it.” Bob Wiederhold, president and CEO of Couchbase, made it much simpler for me.
He thinks NoSQL is playing in a segment that is not the focus of the SQL players, which are good at transactions and suited to back-office applications. He further classified NoSQL into four categories:
- Key value
- Document
- Column family
The current Couchbase (1.8) belongs to the key-value camp but will move to the document camp with its 2.0 launch. He also told me that the key-value and document camps are merging and that the combined camp will be the biggest of the three resulting categories. I plan to write about his interview in a future blog.
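To make the difference between these camps a little more concrete, here is a minimal sketch using plain Python dictionaries. It is not any vendor's API. The meter reading, the key format, the `region` column, and the helper names `kv_get` and `find_docs` are all assumptions made up for illustration.

```python
import json

# Hypothetical smart-meter reading, used only for illustration.
reading = {"meter_id": "m-001", "ts": "2012-08-21T10:00:00Z", "kwh": 1.25}
key = "m-001:2012-08-21T10:00:00Z"

# Key value: the store treats the value as an opaque blob.
# The only way to get it back is to know the key.
kv_store = {key: json.dumps(reading).encode("utf-8")}


def kv_get(store, k):
    """Fetch by key; the caller, not the store, decodes the blob."""
    return json.loads(store[k].decode("utf-8"))


# Document: the store understands the fields inside the value,
# so it can answer questions about them.
doc_store = {key: reading}


def find_docs(store, field, value):
    """Return documents whose field equals value (a naive full scan)."""
    return [doc for doc in store.values() if doc.get(field) == value]


# Column family: a row key maps to named families,
# and each family holds a sparse set of columns.
cf_store = {
    "m-001": {
        "readings": {"2012-08-21T10:00:00Z": 1.25},
        "info": {"region": "west"},  # hypothetical column
    }
}
```

The same reading fits in all three shapes; what changes is how much of its structure the store can see, and therefore what kind of lookups it can serve.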
How they fit together in the enterprise
How do NoSQL technologies fit into the enterprise? William McKnight, of McKnight Consulting Group, presented a keynote speech titled, “Putting NoSQL in its Place—in the Enterprise.”
One of his slides shows really well how data are collected, aggregated, and analyzed in the enterprise, and which components serve each function. Data are collected for analysis; otherwise, there is no reason to collect them. There are two major groups of analysis: real time (streaming) and static (stored data). In his slide, Hadoop (which processes data in batch mode) is placed on the analytic side. But if you need to analyze a massive amount of data as it comes in, you need streaming analysis, and Hadoop is not meant for that. That is why we need databases that can handle real-time streaming data, which is a totally different area from Hadoop's.
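The contrast between the two groups can be shown with a toy example. This is only a sketch in plain Python, not Hadoop or any streaming product. The function `batch_average`, the class `SlidingAverage`, and the sample readings are hypothetical.

```python
from collections import deque


def batch_average(stored_readings):
    """Batch style: run once over data that has already been stored."""
    return sum(stored_readings) / len(stored_readings)


class SlidingAverage:
    """Streaming style: update the answer as each reading arrives,
    keeping only the last `size` readings in memory."""

    def __init__(self, size):
        self.window = deque(maxlen=size)
        self.total = 0.0

    def update(self, value):
        # If the window is full, the oldest value is about to be evicted.
        if len(self.window) == self.window.maxlen:
            self.total -= self.window[0]
        self.window.append(value)
        self.total += value
        return self.total / len(self.window)


readings = [1.0, 2.0, 3.0, 4.0]

# Batch: one answer, available only after all data is in.
overall = batch_average(readings)

# Streaming: a fresh answer after every single reading.
stream = SlidingAverage(size=2)
running = [stream.update(r) for r in readings]
```

The batch function needs the whole data set before it can say anything, while the streaming class has an up-to-date answer at every step. That is the gap between the two sides of the slide.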
In the picture, blurry brown lines indicate a set of clouds. The components surrounded by the brown lines may be hosted in a cloud.
This is great. Then, what about application areas? Where does each NoSQL technology apply? Scott Jarr, cofounder and chief strategy officer at VoltDB, gave me the following figure.
Actually, he drew this for me on a piece of paper, although he also had a published blog on it. I will cover it in more detail in a future blog. He looked at five application areas: interactive, real-time analytics, record lookup, historical analysis, and exploratory. He then placed each Big Data technology in one of the five areas. This is a pretty good explanation of NoSQL in terms of application areas. In the figure, VoltDB is colored differently from NewSQL, but he classified it in the NewSQL camp.
Applications to Energy (Smart Grid)
The application areas discussed most throughout the conference were publishing, finance, and social networking services (SNS). A couple of people said that SNS is the driving force for Big Data and NoSQL on the West Coast, while on the East Coast it is primarily the financial community. What about applications to smart grid? At the Soft Grid conference, the focus was on metered data, which will be collected, aggregated, and stored in real time but analyzed in non-real-time fashion. I heard during the Soft Grid conference that some utilities were using Hadoop to analyze their metered data.
The Northeast blackout of 2003 happened because timely actions were not taken to isolate the problem area from the rest of the power grid, and faults cascaded to the entire area. The causes of the blackout were studied intensely. But in 2011, it was repeated in the San Diego area. The initial cause may have been different from the one in 2003, but the impact cascaded in the same way. With better-connected ICT, modern monitoring systems like SCADA, and real-time analytics of power grid health, such cascades could be avoided. Because power moves very quickly, the decision to cut off faulty areas from the grid requires real-time action based on monitored data arriving in real time. This is an application area that is different from the trend analysis done with Hadoop.
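As a rough illustration of what acting on monitored data in real time means, here is a toy sketch in Python. It is not how SCADA or any grid protection system actually works. The function name `first_trip_index`, the tolerance, the consecutive-sample rule, and the sample values are all assumptions chosen for the example; only the 60 Hz nominal frequency reflects North American practice.

```python
def first_trip_index(samples, nominal=60.0, tolerance=0.5, consecutive=3):
    """Scan readings in arrival order and return the index at which an
    isolation signal would be raised, or None if it never is.

    The rule (hypothetical): trip once `consecutive` readings in a row
    deviate from `nominal` by more than `tolerance`.
    """
    run = 0
    for i, value in enumerate(samples):
        if abs(value - nominal) > tolerance:
            run += 1
            if run >= consecutive:
                return i  # decide immediately, without waiting for more data
        else:
            run = 0  # a healthy reading resets the count
    return None


# Hypothetical frequency readings in Hz.
healthy = [60.0, 59.9, 60.1, 60.0]
faulty = [60.0, 59.2, 60.0, 59.3, 59.1, 58.9, 58.5]

healthy_trip = first_trip_index(healthy)
faulty_trip = first_trip_index(faulty)
```

The point of the sketch is that the decision is made inside the loop, as each reading arrives, rather than after the data has been stored and analyzed in a later batch run.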
The companies I interviewed told me that applying NoSQL to smart grid may be an interesting idea, but it is still premature, as they do not see the market forming. Finally, I just want to mention that David Brown of EMC, the parent company of VMware, used GemFire to implement data collection and analytics for some unnamed utilities. His case was an exception, and I guess the market for utilities is still being formed.