This blog is for ICT people and not for electric power people. Therefore, please bear with a short description to set the stage before I continue.
The power industry is in the midst of a revolution brought on by insufficient power supply and new regulations, such as the Renewables Portfolio Standard (RPS). Coal-based power plants are being retired, mostly because they are inefficient and make the air unhealthful. Hydropower generation is limited by the lack of suitable locations, and the construction cost of nuclear power plants has skyrocketed. With this going on, new power generation from solar, wind, and other sources is being encouraged. Smart grid is a way to modernize the power grid: it uses ICT to control power coming from many sources and to keep demand and supply in balance.
In maintaining the balance of demand and supply, it is imperative to know how much demand is made at each moment. With the installation of smart meters, such information is becoming available. The meter data are in a specific format and can be considered as structured. But in addition to those data, all sorts of additional data will be collected and analyzed at each moment, such as equipment health on distribution and transmission lines, weather information, and power price changes. Some of the data may not be well-structured (unstructured data).
All in all, electric power utilities will collect more and more data to process for reliable operations. The amount of data to be collected will increase, and the speed at which it arrives is expected to increase further. This is one of the Big Data problems: very high data velocity, large data variety, and high data volume.
If we want to analyze each piece of data as it comes in very rapidly, we need real-time analytics and real-time NoSQL. In the past, I interviewed a variety of NoSQL database companies. VoltDB presented a classification of NoSQL in one of their blogs. At the recent TieCon 2013, I found a company called Aerospike, which develops and markets real-time NoSQL solutions. I could not catch them during the conference but got to visit them at their headquarters later. They were nice enough to greet me with three executives: Brian Bulkowski, founder and CTO; Srini V. Srinivasan, founder and VP engineering & ops; and Monica Pal, VP marketing. By the way, the company was named after the aerospike engine.
From left: Monica Pal, Srini Srinivasan, and Brian Bulkowski
Before the meeting, I read their technical white paper (Architecture Overview) to be able to ask intelligent questions. The white paper is well written but a bit too detailed and complex, so I will discuss it at a higher level, with some comments from them. If you think you are technical enough, get the original copy.
The following diagram is a simplified version of the architecture diagram in their white paper.
There are four major layers:
The client layer is a set of APIs to control cluster configurations and transactions. In other words, it is a linkable library and comes in C#, Java, Ruby, PHP, and Python. An application uses the APIs to transparently access (i.e., read/write) databases.
The remaining three components are all written in C for efficiency and linked together to form one executable, which is placed on each clustered node. The following is a short description of each component:
- Distribution layer is responsible for cluster management, transaction management, and data migration.
- Data layer is responsible for guaranteeing scalability and ACID properties.
- Storage layer interacts with physical data storage consisting of DRAM and flash/SSD.
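As a rough illustration of what a distribution layer of this kind does, the sketch below hashes each key to a fixed partition and assigns partitions to cluster nodes. The partition count, the hash function, the round-robin assignment, and every name in the code are assumptions made for illustration; none of them is taken from Aerospike's white paper.

```python
import hashlib
from typing import Dict, List

NUM_PARTITIONS = 4096  # assumed value, chosen only for illustration


def partition_for(key: str) -> int:
    """Map a key to a partition ID using a stable hash."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS


def build_partition_map(nodes: List[str]) -> Dict[int, str]:
    """Assign every partition to a node in round-robin order (hypothetical policy)."""
    return {p: nodes[p % len(nodes)] for p in range(NUM_PARTITIONS)}


def node_for(key: str, partition_map: Dict[int, str]) -> str:
    """Return the node that owns the partition of the given key."""
    return partition_map[partition_for(key)]


def migrated_partitions(old: Dict[int, str], new: Dict[int, str]) -> int:
    """Count partitions whose owner changes between two maps (data to migrate)."""
    return sum(1 for p in old if old[p] != new[p])


if __name__ == "__main__":
    three = build_partition_map(["node-a", "node-b", "node-c"])
    four = build_partition_map(["node-a", "node-b", "node-c", "node-d"])
    print(node_for("meter-000123", three))
    print(migrated_partitions(three, four), "partitions would move")
```

Because the client can compute the owning node from the key alone, the application never has to know where a record lives. The naive round-robin policy used here moves many partitions when a node is added; a production system would presumably use an assignment that keeps such migration small.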
Real-time processing and native flash
To process real-time Big Data, we need a fast, reliable, and scalable solution. In general, when a system made up of several components shows a particular characteristic, that characteristic is not the result of any single component. All the components of the system must be designed and integrated to attain it. In the case of the Aerospike database, the clustering system plays a big role in ensuring reliability and scalability, with a function that allows backups within the same data center and to other data centers (cross-data center replication, or XDR).
According to their website:
Aerospike’s hybrid memory (DRAM and native flash) architecture scales up and out, consistently processing over 500k transactions per second per node with sub-millisecond latency. With automatic fail-over, replication, and cross data center synchronization, the Aerospike database reliably stores billions of objects and terabytes of data—while providing 100% uptime and 17x better TCO than other NoSQL databases.
Although their clustering design contributes to this result, they have an innovative technology that makes the product economical and green (saves energy and resources) as well. In making the claim, they mentioned their patent-pending native flash technology.
According to Monica Pal:
“Native flash refers to our hybrid-memory storage system—our proprietary file system that turns flash into memory instead of disk.”
This is a good compromise between superfast DRAM and slower but more economical flash memory.
Brian explained some of the background of why they implemented the hybrid memory structure here. Oversimplified takeaways from his article include:
- DRAM is faster than flash/SSD, but on many occasions performance is network bound and having DRAM instead of flash/SSD might not improve performance very much, depending on the case.
- DRAM is more expensive than flash/SSD memory, and it would break your bank if you kept adding it.
- The rotational disk is far cheaper, but it is bulky and occupies space. This adds to energy consumption at your data centers, because the extra space alone requires more cooling. In addition, the price performance is not as good as for flash.
- By optimizing the combination of DRAM and flash to form storage, the best solution would be attained. This is the idea behind native flash.
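The general idea of combining a small, fast memory with a larger, cheaper one can be sketched as follows: an index held in memory (standing in for DRAM) records where each value lives in a file (standing in for flash), so a read costs one in-memory lookup plus one positioned read. This is a toy under stated assumptions, not Aerospike's proprietary file system; the class name, the file layout, and the append-only policy are all hypothetical.

```python
import os
import tempfile


class HybridStore:
    """Toy key-value store: index in memory (DRAM stand-in), values in a file (flash stand-in)."""

    def __init__(self, path: str) -> None:
        self._index = {}  # key -> (offset, length) of the latest value
        self._file = open(path, "a+b")

    def put(self, key: str, value: bytes) -> None:
        # Append the value and remember where it lives; older versions are simply orphaned.
        self._file.seek(0, os.SEEK_END)
        offset = self._file.tell()
        self._file.write(value)
        self._file.flush()
        self._index[key] = (offset, len(value))

    def get(self, key: str) -> bytes:
        # One in-memory lookup, then one read from the file at a known position.
        offset, length = self._index[key]
        self._file.seek(offset)
        return self._file.read(length)

    def close(self) -> None:
        self._file.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = HybridStore(os.path.join(tmp, "data.bin"))
        store.put("meter-1", b"12.5 kWh")
        store.put("meter-1", b"12.7 kWh")  # newer reading replaces the index entry
        print(store.get("meter-1"))
        store.close()
```

Only the index has to fit in the expensive memory, while the bulk of the data sits on the cheaper medium. A real implementation would also need to reclaim the space left by overwritten values and rebuild the index after a restart, which this sketch ignores.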
I usually do not take a vendor’s claim as is, but Aerospike participated in an independent NoSQL benchmark test by Thumbtack Technology. The result is published in Ultra-High Performance NoSQL Benchmarking: Analyzing Durability and Performance Tradeoffs.
- The Thumbtack benchmark used YCSB with modifications to simulate a high-throughput environment.
- Aerospike achieved nearly 200,000 transactions per second (TPS) in balanced read/write tests and more than 300,000 TPS in read-heavy tests. (That’s less than 1 millisecond per transaction, on average.)
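Throughput and latency are related but distinct measurements. A minimal sketch using Little's law (average requests in flight = throughput x average latency) shows how the two fit together. The throughput figures are the rounded ones quoted above; the 1 millisecond latency is used only as an illustrative upper bound, and the function names are hypothetical.

```python
def requests_in_flight(throughput_tps: float, latency_seconds: float) -> float:
    """Little's law: average concurrency = arrival rate x average time in system."""
    return throughput_tps * latency_seconds


def mean_interval_us(throughput_tps: float) -> float:
    """Average time between completed transactions, in microseconds."""
    return 1_000_000 / throughput_tps


if __name__ == "__main__":
    for label, tps in [("balanced", 200_000), ("read-heavy", 300_000)]:
        in_flight = requests_in_flight(tps, 0.001)
        print(label, round(in_flight), "in flight,", round(mean_interval_us(tps), 2), "us apart")
```

At 200,000 TPS and a latency of 1 millisecond, about 200 requests are in flight at any moment, while a transaction completes on average every 5 microseconds. A node sustaining such a rate is therefore handling many requests concurrently rather than one at a time.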
Their product comes in two versions: community and commercial. The community version is available from here and can be used commercially forever; the only restriction is a limit of 2 nodes and 200GB of data. The difference between the community and commercial versions is shown here.
If we can replace some storage with flash/SSD memory, which is small and consumes less energy, without sacrificing scalability and speed, it would be a great alternative. Besides, flash/SSD memory is considered less expensive than DRAM. Is it really? In theory, it is. We may be able to compute the cost from Thumbtack’s benchmark test because it gives the hardware and other pertinent information. Of course, a real energy consumption benchmark would be more convincing.
The smart grid effort is well under way. Power grid stability can be better assured as we collect more data in real time. A new technology called the synchrophasor, or phasor measurement unit (PMU), gives more precise knowledge of the health of transmission lines and collects more data points in real time. PMUs are replacing monitoring by SCADA systems. Other new technologies are being applied to the power grid, and we can easily expect the volume of data collected to skyrocket. At a recent conference, one speaker mentioned that a power utility company needs to process 100K events per second. Beyond smart grid, smart city, which considers other systems such as traffic and water, would increase the volume of data astronomically. Very fast NoSQL databases would be in high demand.
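To put the 100K events per second figure mentioned above in perspective, a quick calculation converts it into a daily volume. The 100-byte event size is an assumption chosen only for illustration; the speaker did not give one.

```python
EVENTS_PER_SECOND = 100_000    # figure quoted by the conference speaker
ASSUMED_EVENT_BYTES = 100      # hypothetical average event size
SECONDS_PER_DAY = 24 * 60 * 60


def events_per_day(rate: int) -> int:
    """Number of events arriving in one day at a constant rate."""
    return rate * SECONDS_PER_DAY


def bytes_per_day(rate: int, event_bytes: int) -> int:
    """Raw data volume per day, ignoring indexes, replication, and overhead."""
    return events_per_day(rate) * event_bytes


if __name__ == "__main__":
    print(events_per_day(EVENTS_PER_SECOND), "events per day")
    print(bytes_per_day(EVENTS_PER_SECOND, ASSUMED_EVENT_BYTES) / 1e9, "GB per day")
```

Under these assumptions a single utility would generate 8.64 billion events, or roughly 864 GB of raw data, every day, before any replication.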
It was not my plan to spend a lot of time writing this blog. But the Aerospike product contains a lot of interesting technologies. I do not know about you, but I cannot cover many interesting technologies in depth. As I go deeper into what they offer, I hit a wall. I mean that I need more time to dig deeper into the technologies relevant to their product. Unfortunately, I do not have time for that, although I really want to do it. Remember that in my previous blog I said I would dig more into data science? Time is short, but there are too many innovative technologies to investigate. Decisions, decisions!
Disclaimer: I have not examined Aerospike’s products in detail and am not qualified to comment on their claims. However, from my technical background and the information available to me, I find them very convincing. You must make your own judgment.