Cloudify is a tool to smoothly launch applications in a cloud environment with a recipe that describes everything necessary, including resources and their configurations. GigaSpaces’ new product (at least, I thought it was new) is the eXtreme Application Platform (XAP), an accelerator for NoSQL databases. GigaSpaces’ XAP is not a database, analytics tool, or visualization tool. In short, it is an in-memory utility that enables real-time data processing for other NoSQL databases, like Cassandra and MongoDB. SQL or no SQL, the volume and speed of Big Data have become more than a database alone can easily process. A simple solution is to put some kind of front end in place to absorb such high-volume and high-speed data. In-memory data processing is usually much faster than any data storage dealing with disk I/O. Both VoltDB and Couchbase, which I interviewed at the same conference, use their own implementations of an in-memory database for this. Other databases may partner with other companies to provide such a technology. Nati referred to Edd Dumbill’s blog, which says that one of the trends in Big Data is streaming data processing. For that, in-memory technology is invaluable.
I thought XAP was a new product that came after Cloudify. However, XAP was developed about 10 years ago, when the Internet bubble was in full bloom. At the time, GigaSpaces thought high-speed data processing would be necessary to accommodate business-to-business (B2B) interactions at scale. As we all know, the dot-com era did not last very long, and their prediction did not materialize. Instead, the financial community became an early adopter because it needed real-time data processing for such things as credit card transaction processing and stock trading. So GigaSpaces decided in 2004 to develop a product to serve those needs, and they have kept improving it over the years. The current version of XAP is the ninth edition. As I wrote in a previous blog, the NoSQL domain includes companies that develop databases, utilities, analytics engines, and visualization tools. This classification is shown in Matt Aslett’s blog with leading NoSQL companies. Matt places GigaSpaces in the data grid/cache category.
The following is a summary of my chat with Nati about his solutions and his view of the NoSQL market.
GigaSpaces is headquartered in New York and also has offices in San Jose, CA, London, and Israel. Nati said that Big Data is fueled by different things, depending on the geography. In the US, social networking services (SNS) drive Big Data on the West Coast, while the financial requirements mentioned above drive it on the East Coast. SNS and financial applications are very different, but they both generate a high volume of data at high speed. SNS, especially, generates unstructured data, such as tweets.
Regarding the relationship between XAP and Cloudify, they are currently tightly integrated. Data cluster management is necessary for the management of large data sets. Cloudify needs the same data cluster management for applications. Thus, the two share the same underlying data cluster management platform. After all, applications and data should go hand in hand for provisioning and management. Nati’s blog describes this integration in more detail. In short, XAP accelerates data acquisition and Cloudify manages the cluster.
I was still not sure how the two products work together at run time. Nati gave me a little more information about the processing, as follows:
When streaming processing is required, both XAP and Cloudify are deployed. If streaming is not required, Cloudify alone is appropriate. This is a logical diagram, but in reality those three boxes can run on a single physical server, or two on the same machine, because XAP should work closely with a database. Cloudify is written in Java, and XAP is written in both Java and C++. XAP not only accelerates data acquisition but also provides data processing and guarantees data consistency.
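To make the idea of an in-memory layer in front of a database concrete, here is a minimal sketch in Python. It is not GigaSpaces code and does not use the XAP API. The class name `InMemoryFrontEnd`, its methods, and the `batch_size` parameter are all hypothetical, and a plain dict stands in for the disk-based database. It only illustrates the general write-behind pattern: writes land in memory first and are pushed to the slower store in batches.

```python
class InMemoryFrontEnd:
    """Hypothetical in-memory layer that absorbs writes and flushes them in batches."""

    def __init__(self, backing_store, batch_size=3):
        # backing_store is a stand-in for a disk-based database (a dict here).
        self.backing_store = backing_store
        self.batch_size = batch_size
        self.memory = {}    # latest value per key, served from RAM
        self.dirty = set()  # keys written since the last flush

    def write(self, key, value):
        # The write completes as soon as memory is updated.
        self.memory[key] = value
        self.dirty.add(key)
        if len(self.dirty) >= self.batch_size:
            self.flush()

    def read(self, key):
        # Serve from memory when possible, fall back to the backing store.
        if key in self.memory:
            return self.memory[key]
        return self.backing_store.get(key)

    def flush(self):
        # Push every pending key to the backing store in one batch.
        for key in self.dirty:
            self.backing_store[key] = self.memory[key]
        self.dirty.clear()


database = {}
front_end = InMemoryFrontEnd(database, batch_size=2)
front_end.write("trade-1", 100)
front_end.write("trade-2", 250)  # second distinct key triggers a batch flush
```

A real product also has to handle replication, fail-over, and consistency across nodes, which this sketch leaves out entirely.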
Next I asked him to draw a picture showing where something like GigaSpaces’ XAP resides between NoSQL and NewSQL. Here’s the picture, showing the two domains in an oversimplified manner for ease of understanding. Note that each database, whether NoSQL or NewSQL, is different in its offering and performance. For example, Couchbase claims high processing power, although it is classified as a NoSQL database.
OK, then, there seems to exist another category between NoSQL and NewSQL. I asked Nati what this new category is. His answer was that it is a Big Data system or streaming/real-time processing system. Remember Edd Dumbill’s blog. Nati said that streaming processing is currently a niche area but definitely required for application areas that process a high volume of data at high speed or with little tolerance for latency, such as financial transactions like risk analysis.
I asked him about the application of streaming data processing. Some utility companies process Big Data with Hadoop to analyze meter-read data. Streaming processing may be a niche, but it is becoming necessary for such things as meter-read data that may come in from millions of power meters in real time or near-real time. It would be interesting to combine those data streams with weather data that might also change in real time. For a balancing authority like California ISO, which is tasked with balancing power demand and supply in real time, the real-time data sources vary and can be very large. A balancing authority needs to collect and process a large volume of data to get a good picture of the status of the power grid in real time and avoid blackouts. I have yet to see any examples of streaming data processing in the utilities business, but I think such an application area exists.
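As a rough illustration of what streaming processing of meter-read data could look like, here is a minimal Python sketch of a sliding-window aggregate. Everything in it is an assumption made for illustration: the class name `SlidingWindowLoad`, the reading format (timestamp in seconds, meter ID, load in kW), and the requirement that readings arrive in timestamp order. It is not taken from any utility's system or from GigaSpaces.

```python
from collections import deque


class SlidingWindowLoad:
    """Hypothetical running total of meter load over the last N seconds."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.readings = deque()  # (timestamp, kw), oldest first
        self.total_kw = 0.0

    def add(self, timestamp, meter_id, kw):
        # meter_id is accepted for realism but not used in this simple total.
        # Assumes readings arrive in non-decreasing timestamp order.
        self.readings.append((timestamp, kw))
        self.total_kw += kw
        self._evict(timestamp)
        return self.total_kw

    def _evict(self, now):
        # Drop readings that have fallen out of the time window.
        while self.readings and self.readings[0][0] <= now - self.window_seconds:
            _, old_kw = self.readings.popleft()
            self.total_kw -= old_kw


window = SlidingWindowLoad(window_seconds=60)
window.add(0, "meter-1", 1.5)
window.add(30, "meter-2", 2.0)
current = window.add(61, "meter-1", 1.0)  # the reading at t=0 has expired
```

Each reading updates the total as it arrives, so the answer is always current. That is the difference from a batch job that scans stored data after the fact. Joining this stream with weather data would follow the same pattern with a second input.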
Nati mentioned that real-time requirements are growing and that Google, whose MapReduce and Google File System papers inspired Hadoop, is moving to Percolator, which supports real-time Big Data. Maybe this domain will not remain a niche for long.
The whole GigaSpaces system looks pretty complex, and integration seems to require a lot of hand-holding. Nati said that it normally does, but he makes extra efforts to make it very easy. He continued as follows:
“BigData systems are complex by definition – look at Hadoop, NoSQL, etc. What we do is integrate them in a consistent way and make reduce large part of the operational complexity and development complexity.
If you would compare the amount of effort that is required to build a twitter like real-time analytics with GigaSpaces you’ll see that all you need to write few snippet of code to process your logic, scaling, fail-over, integration with BigData storage, management and monitoring is all curved out from the developers.”
They also provide training. One thing they thought of was an interface with popular NoSQL platforms like Cassandra and MongoDB. GigaSpaces has a semi-official partnership with those database companies. The idea is that because many people have already worked with those databases, GigaSpaces can build on their knowledge to shorten the learning curve.
Moving forward, I asked Nati to consult his crystal ball as to what will happen to the NoSQL/Big Data market. Will any standard emerge from a standards body or two? He told me that, as in many emerging markets, many of the players will be consolidated or disappear, except for some like Hadoop, Cassandra, and MongoDB. As for the storage mechanism, each form is good for some things but not for others. If a standard way of accessing data emerges, access to key-value, tabular, and document-based data will be consolidated, but the forms themselves will survive because one size does not fit all. He also said that SQL by itself is not wrong but its implementation is. It is interesting to compare his remark with Scott Jarr’s, who said the same thing. Nati predicted that some sort of standard would emerge through consolidation but not from standards bodies.
After conducting five interviews, I have some idea of what NoSQL is all about. One thing I am certain of is that the utilities business is increasingly dependent on ICT. Without it, the smart grid will not be realized.