One of the interviews I conducted at the recent RSA conference was with Brian Christian, CTO and cofounder of Zettaset. Their PR firm emphasizes the security it provides to Hadoop environments, which is understandable, considering that the RSA conference was all about security.
Before getting to the interview, let’s see how Hadoop is doing. The Hadoop site is a good source of information about the project. We all know that Hadoop is going strong and getting stronger every day as the savior of Big Data solutions. Alex Handy of SD Times wrote a timely article on the Hadoop platform, “Hadoop 1.0 is all about Hadoop 2.0”. You can read the article, but these are its two major points:
- The version known as 0.22 was relabeled as 1.0 on January 4, 2012.
- Version 0.23 is speculated to be 2.0.
Version 1.0 includes the following changes:
- Major upgrades for HBase
- Performance enhancements
- Bug fixes
HBase is the column-oriented NoSQL database that runs on top of Hadoop; users can import data into it for analysis by Hadoop.
As for version 0.23, which is speculated to become 2.0: one of the major tasks left after version 1.0 was to make HDFS, the file system underlying Hadoop (and HBase), highly available (HA). Hadoop 2.0 also rewrites the MapReduce core; the rewritten engine, known as MR2, will become a general scheduler and resource manager. Another goal of Hadoop 2.0 is higher scalability. The article is a good summary of what’s happening with Hadoop now and in the near future.
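For readers unfamiliar with the programming model that MR2 reimplements, the map and reduce phases can be sketched in plain Python, no cluster required. This is an illustrative word count, not Hadoop's actual API; the function names are mine.

```python
from itertools import groupby
from operator import itemgetter

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word, as a Hadoop mapper would.
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    # Shuffle/sort: group identical keys, then reduce each group by summing.
    counts = {}
    for key, group in groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0)):
        counts[key] = sum(value for _, value in group)
    return counts

lines = ["big data needs hadoop", "hadoop handles big data"]
print(reduce_phase(map_phase(lines)))
# → {'big': 2, 'data': 2, 'hadoop': 2, 'handles': 1, 'needs': 1}
```

Hadoop's contribution is running exactly this pattern across thousands of machines; MR2's job is to schedule that work more generally.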
Let’s shift gears. Some people wonder whether the market can sustain several Hadoop companies now that big players like IBM, Oracle, and Microsoft have entered it, along with startups like Cloudera, Hortonworks, and MapR. If Zettaset is yet another Hadoop company, can it survive? It is a very interesting question, so I arranged an interview with Brian. Initially, I was not clear about what Zettaset does and how its solutions relate to Hadoop and to commercial Hadoop companies like Cloudera, Hortonworks, and MapR. The video interview (given at the end of this blog) runs a little less than three minutes, and in it you can find his answers to those questions.
These are two of the takeaways from the video:
- Zettaset enhances Hadoop environments by wrapping raw Hadoop. Through this wrapper, the multiple pieces of a Hadoop deployment (ZooKeeper, MapReduce, and HDFS) are well orchestrated.
- In conjunction with the first point, Zettaset also provides security.
The first point is well depicted in the figure below.
Zettaset does this by tapping into the APIs provided by Hadoop without altering the Hadoop source code. As Brian emphasized, they do not want to fork the code. By using the original code without modifications, their product can wrap any Hadoop system, commercially enhanced or not. That means they can wrap the original Apache Hadoop as well as the versions from providers such as Cloudera, Hortonworks, and MapR. This is a very wise way of providing solutions without betting on any particular business solution provider.
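This wrap-without-forking approach resembles the classic decorator pattern: the wrapper adds behavior (here, a security check and an audit log) by calling the wrapped object's public API, never by editing its source. The sketch below is my own illustration with hypothetical class names; it is not Zettaset's actual code.

```python
class HadoopFS:
    """Stand-in for some distribution's file-system client (hypothetical)."""
    def read(self, path):
        return f"contents of {path}"

class SecureFS:
    """Wraps any object exposing read(path), adding access control and
    auditing without modifying the wrapped class's source code."""
    def __init__(self, inner, allowed_users):
        self.inner = inner
        self.allowed = set(allowed_users)
        self.audit_log = []

    def read(self, path, user):
        if user not in self.allowed:
            self.audit_log.append((user, path, "DENIED"))
            raise PermissionError(f"{user} may not read {path}")
        self.audit_log.append((user, path, "OK"))
        # Delegate to the wrapped client's unmodified API.
        return self.inner.read(path)

fs = SecureFS(HadoopFS(), allowed_users=["alice"])
print(fs.read("/data/logs", user="alice"))  # → contents of /data/logs
```

Because `SecureFS` only depends on the public `read` interface, the same wrapper would work over any implementation of that interface, which is the point Brian was making about wrapping stock or vendor-enhanced Hadoop alike.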
I did not ask Brian whether the following case works. Let’s say that one of the commercially enhanced versions uses its own API (let’s call it the enhanced API, for the sake of reference) built on top of the original API. If Zettaset does not have a hook into this enhanced API, it cannot provide its enhanced functions there. This needs further investigation.
Here’s the video interview.
Finally, I have to say something about energy efficiency to make this blog relevant to my theme, the intersection between ICT and energy. A vast amount of Big Data is unmanageable without a solution like Hadoop, and Zettaset makes it more manageable by wrapping any Hadoop system, commercially enhanced or not, with good management and security. I have a wild idea. If Hadoop and other Big Data solutions can identify useful data and derive good information from it, why not also identify what is not useful in that huge amount of data and flag it for later deletion, or at least combine, condense, and reduce it into a smaller form? If we know certain data is not worth retaining, there is no reason to keep it, and we can reclaim its storage for other uses. Classifying data as not useful is not trivial, because the same data set may or may not be useful depending on the point of view, though some cases may be obvious. By removing unnecessary data from storage, we could increase energy efficiency.
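To make the wild idea concrete, here is a minimal sketch of a flag-for-deletion pass. It uses last-access age as the (admittedly crude) usefulness criterion; the function name and the age threshold are my own assumptions, and a real policy would need the richer, viewpoint-dependent classification discussed above.

```python
import time

def flag_stale(files, max_age_days, now=None):
    """Return paths not accessed within max_age_days, as candidates for
    deletion or condensation. Age is just one illustrative criterion;
    deciding what is truly 'not useful' is the hard, non-trivial part."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    return [path for path, last_access in files.items() if last_access < cutoff]

# Toy inventory: path -> last-access time (epoch seconds).
now = 1_000_000_000
files = {
    "/data/hot.csv":  now - 1 * 86400,    # touched yesterday
    "/data/cold.csv": now - 400 * 86400,  # untouched for 400 days
}
print(flag_stale(files, max_age_days=365, now=now))  # → ['/data/cold.csv']
```

Flagged files would then be reviewed, condensed, or deleted, reclaiming storage and, with it, the energy that storage consumes.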