<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Tek-Tips Whitepaper Library &#187; Database</title>
	<atom:link href="http://tek-tips.nethawk.net/category/editorial/database/feed/" rel="self" type="application/rss+xml" />
	<link>http://tek-tips.nethawk.net</link>
	<description></description>
	<lastBuildDate>Wed, 22 May 2013 16:20:07 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>How to Apply ICT to the Power Grid: OSIsoft’s Way – Part 2</title>
		<link>http://tek-tips.nethawk.net/how-to-apply-ict-to-the-power-grid-osisoft%e2%80%99s-way-%e2%80%93-part-2/</link>
		<comments>http://tek-tips.nethawk.net/how-to-apply-ict-to-the-power-grid-osisoft%e2%80%99s-way-%e2%80%93-part-2/#comments</comments>
		<pubDate>Wed, 15 May 2013 13:41:27 +0000</pubDate>
		<dc:creator>Zen Kishimoto</dc:creator>
				<category><![CDATA[Analytics]]></category>
		<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[Community Manager]]></category>
		<category><![CDATA[Data Center]]></category>
		<category><![CDATA[Database]]></category>
		<category><![CDATA[Information Technology]]></category>
		<category><![CDATA[Mobile and Wireless]]></category>
		<category><![CDATA[Network Securities]]></category>
		<category><![CDATA[Networking]]></category>
		<category><![CDATA[Security]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[Storage]]></category>
		<category><![CDATA[Telecommunication]]></category>
		<category><![CDATA[analytics]]></category>
		<category><![CDATA[Dave Roberts]]></category>
		<category><![CDATA[OSIsoft]]></category>
		<category><![CDATA[power]]></category>
		<category><![CDATA[Prediction]]></category>
		<category><![CDATA[smart city]]></category>
		<category><![CDATA[Smart Grid]]></category>

		<guid isPermaLink="false">http://tek-tips.nethawk.net/?p=8536</guid>
		<description><![CDATA[This is a continuation from Part 1. Interfaces required to multiple domains I think their decision to keep themselves a software infrastructure company is smart. In this way, they can apply their systems to many market segments where operations are involved. When operations are performed, some kinds of data are generated and often times those [...]]]></description>
			<content:encoded><![CDATA[<p>This is a continuation from <a href="http://altaterra.site-ym.com/blogpost/288668/164209/How-to-Apply-ICT-to-the-Power-Grid-OSIsoft-s-Way--Part-1">Part 1</a>.</p>
<p><strong>Interfaces required to multiple domains</strong></p>
<p>I think their decision to keep themselves a software infrastructure company is smart. In this way, they can apply their systems to <a href="http://www.osisoft.com/industry/overview.aspx">many market segments</a> where operations are involved. When operations are performed, data of some kind are generated, and often those data should be collected, stored, and analyzed to tune and improve operations and business processes. To enter new domains, they need to keep adding new interfaces as well as adding and revising interfaces in areas they already cover. Dave Roberts told me that they now have close to 500 interfaces.</p>
<p>Coming from the IT segment, I see people tending to converge on a handful of well-defined standards and, therefore, interfaces. When I first set foot in the data center market, I was very surprised to find that there were many interfaces on the facilities side. Although <a href="http://www.bacnet.org/">BACnet</a> is becoming the protocol of choice for data center facilities, several other protocols, such as <a href="http://en.wikipedia.org/wiki/Modbus">Modbus</a> and <a href="http://en.wikipedia.org/wiki/LonWorks">LonWorks</a>, are still in use. An IT guy like me tends to think we can force facilities to consolidate all the protocols into a single standard, which is IP. I now know it does not work that way. I got involved in <a href="http://www.nist.gov/smartgrid/priority-actions.cfm">NIST&#8217;s</a> <a href="http://sgip.org/">Smart Grid Interoperability Panel</a>, which was organized to come up with a set of standards that allow the smart grid to function without conflicting technologies and protocols. The power industry has been around longer than IT, and there are many standards by <a href="http://www.ieee.org/index.html">IEEE</a>, <a href="http://www.iec.ch/">IEC</a>, and others. The power industry has been keeping the lights on for more than 100 years, and it will certainly not listen to IT about consolidating everything onto IT technologies and protocols.</p>
<p><strong>How to translate domain specific requirements for software developers</strong></p>
<p>OSIsoft maintains that their core PI system is generic and does not change when they apply PI to different vertical markets. When they pick a new domain, they add new interfaces specifically required for that domain. So every time they step into a new domain, they need to worry about yet more interfaces to maintain. This seems daunting, but it is the only practical way to have a generic system to apply to many areas, such as the power industry, oil and gas, and building management segments.</p>
<p>For each vertical domain there is a dedicated industry management team that includes experts in that field who can communicate natively with customers. The experts get agreement on requirements, then translate those requirements into a specification for software development teams and partners/ecos to work on.</p>
<p><strong>How to enter a conservative industry like the power industry</strong></p>
<p>IT&#8217;s pace of change is very fast. New technologies come and go quickly, sometimes within months, if not days. In contrast, utility companies are very conservative and do not replace their technologies and equipment for many years, until new technologies or equipment are proven to work solidly. I was curious to find out how a software company like OSIsoft could penetrate the conservative power industry. In the 1990s, OSIsoft partnered with <a href="http://www.westinghouse.com/">Westinghouse</a> and also with <a href="http://www.abb.com/">ABB</a>. Through their introductions, OSIsoft started to work with utility players and expanded its presence in the utilities market. Although there are a lot of similarities, each utility has specific needs, which triggers customization. But OSIsoft does not provide customization services. Customization is done by the utilities themselves or by system integrators. Nearly all (97%) of OSIsoft&#8217;s revenue comes from software maintenance; the remaining 3% comes from basic services such as installation. So it is important for their product to be highly configurable.</p>
<p><strong>Sharing data among multiple entities</strong></p>
<p>In general, if two entities work together, it is most beneficial for the two to share data. For example, let me refer to the power grid in California. <a href="http://www.caiso.com/Pages/default.aspx">California ISO</a> (CAISO), which reliably balances power supply and demand on the transmission grid, does not maintain the transmission lines. The lines are maintained by <a href="http://www.pge.com/">PG&amp;E</a>, a local utility in my region that is also responsible for the distribution grid. Power imbalances can be caused by operational or equipment problems. Therefore, it is very useful if CAISO shares data with PG&amp;E so that they can work together to solve the problem. For this, OSIsoft has released a new feature called <a href="http://www.osisoft.com/Templates/item-abstract.aspx?id=8174">PI Cloud Connect</a>, which allows highly granular data to be shared with specific access control in a cloud setting. In this way, any number of organizations can share time-series data, each with a specific access privilege. Yes, this is a good application of ICT.</p>
<p><strong>Analytics</strong></p>
<p>Once data are captured and stored, they are analyzed to derive useful information to improve operations and business processes. Analytics can be done at many levels, from something as simple as out-of-bounds value analysis all the way up to prediction. Here OSIsoft does not build its own analytics packages but makes sure that others&#8217; packages plug seamlessly into the PI system. I am currently looking into analytics in more detail. Because analytics is a very broad term that covers so many angles, most presentations or white papers on products do not discuss it in detail. That is frustrating, to say the least.</p>
<p>What is an example of analytics in the utilities business?</p>
<p><em>Analytics example 1</em>: equipment preventive maintenance</p>
<p>Do you see boxes of different colors and shapes on utility poles around you? One of those boxes is called a transformer and is used to step down high voltage to lower voltage before power gets to your home. Most transformers rely on <a href="https://en.wikipedia.org/wiki/Electromagnetism">electromagnetism</a> and degrade physically as time goes by. If a transformer malfunctions or fails, power to your home will be interrupted. It would be nice to know when to repair or replace it before it fails. An analytics package can monitor its health, compare it against the historical trend, and provide an early warning.</p>
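<p>As a rough illustration of comparing current readings against a historical trend, here is a minimal Python sketch. It is not how any OSIsoft partner package works; the function name <code>early_warning</code>, the temperature readings, and the three-sigma threshold are all assumptions made up for this example.</p>

```python
from statistics import mean, stdev


def early_warning(history, recent, sigma=3.0):
    """Return True when recent readings drift far from the historical trend.

    history: past readings for one transformer (hypothetical temperatures, deg C)
    recent:  the latest readings for the same transformer
    sigma:   how many standard deviations count as abnormal (assumed setting)
    """
    baseline = mean(history)      # the historical level
    spread = stdev(history)       # normal variation around that level
    drift = abs(mean(recent) - baseline)
    return drift > sigma * spread


# Hypothetical readings, invented for illustration only.
history = [60.0, 61.0, 59.0, 60.0, 62.0, 58.0, 60.0, 61.0]
print(early_warning(history, [60.5, 61.0, 59.5]))  # within the normal range
print(early_warning(history, [70.0, 71.0, 72.0]))  # far above the baseline
```

<p>A real package would likely use a richer model of degradation, but the idea is the same: establish what normal looks like from history, then flag departures early.</p>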
<p><em>Analytics example 2</em>: wind power generation</p>
<p>Another example is in wind power generation. Wind is hard to predict. It is blowing one moment but not the next. It is vital to balance the demand and supply of power every second. If we cannot predict power generated by wind, it makes it more difficult to balance power. So it is very important to predict when wind blows and when it stops. Predictive analytics is used widely in weather forecasting, and wind prediction is part of it. First, a prediction model is developed from the historical data, and the model is fine-tuned and modified as more data are collected.</p>
<p><em>Analytics example 3</em>: smart charging for EVs</p>
<p>Currently, in California, power demand increases as the day goes on and hits a peak in the early afternoon. It goes down to its lowest point during the night. An electric vehicle (EV) like the Nissan Leaf or Chevy Volt is known to draw about the same amount of power as a typical household. If they are charged when power demand is at peak, we run out of power to satisfy demand. But during the night, we usually have plenty of power available, and it is suitable to charge EVs at night at home. This is what a typical EV owner does now. As more public charging stations pop up, and faster yet power-hungry new charging technologies proliferate, charging may be done during peak time. That would disturb the power balance and lead to outages. For this reason, smart charging needs to be developed and deployed. The result of this type of analytics would dynamically allow charging to start when supply satisfies demand.</p>
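<p>The rule that charging starts only when supply satisfies demand can be sketched in a few lines of Python. This is a simplified sketch under stated assumptions: the function names, the megawatt figures, and the reserve margin are hypothetical, and a real smart-charging system would work from forecasts rather than a fixed table.</p>

```python
def may_start_charging(supply_mw, demand_mw, charger_mw, reserve_mw=0.0):
    """Allow a charging session only if supply still covers demand afterwards.

    reserve_mw is an assumed safety margin kept back from the available supply.
    """
    return supply_mw - reserve_mw >= demand_mw + charger_mw


def charging_hours(hours, charger_mw, reserve_mw=0.0):
    """Return the hours of the day in which charging may start.

    hours: list of (hour, supply_mw, demand_mw) tuples
    """
    return [
        hour
        for hour, supply, demand in hours
        if may_start_charging(supply, demand, charger_mw, reserve_mw)
    ]


# Hypothetical figures: low demand at night, a peak in the afternoon.
day = [(2, 30.0, 18.0), (14, 30.0, 29.5), (23, 30.0, 21.0)]
print(charging_hours(day, charger_mw=1.0, reserve_mw=2.0))  # [2, 23]
```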
<p>Different utilities could use an analytics package developed by one utility, but OSIsoft does not share a particular user&#8217;s analytics algorithms with others. OSIsoft has its <a href="http://community.osisoft.com/">user communities</a>, and those who belong to them might share such an algorithm via the community. The <a href="http://www.osisoft.com/tdusersgroupmeeting/">T&amp;D User Group</a> community has existed for 20 years, and its members tend to share information when there is no competition among them.</p>
<p><em>Analytics example 4</em>: more renewable energy sources for power generation in California</p>
<p>California has adopted a <a href="http://www.cpuc.ca.gov/PUC/energy/Renewables/index.htm">Renewables Portfolio Standard</a>, known as RPS. This specifies the minimum percentage of renewable energy sources, like solar and wind, in power generation. California plans to obtain 33% of all its power from renewable energy sources by 2020. Although not all renewable energy sources are as volatile as wind power, a lot of unknowns will be thrown into the power grid. Constant power-supply predictions based on ever-changing weather (the wind may or may not blow at any given minute, and solar power goes down when clouds set in) will be vital to keep the power grid stable all the time.</p>
<p><strong>Applying PI to more demanding domains</strong></p>
<p>The smart grid aims to make the power grid smarter. Our physical infrastructure consists of more than just the power grid; we need, for example, gas, water, waste, transportation, government, street lights, and traffic systems. Dave is working on the next topic beyond the power grid, which is the <a href="http://en.wikipedia.org/wiki/Smart_city">smart city</a>. According to Dave, a smart city is defined differently by different people. But currently, US cities like Austin, Seattle, New York, and Chicago have their smart city projects. OSIsoft is involved in some of them, and a public announcement is coming shortly.</p>
<p>Collecting, aggregating, storing, and linking all sorts of data from a city&#8217;s different sources would provide tremendous intelligence to that city. A utility at the conference reported that it collects 100,000 data points per second. If we implement a system for a smart city, the number of data points would explode by two to three orders of magnitude. That means tens to hundreds of millions of data points per second would bombard the PI system. Even though the PI system is built to cope with a large amount of data of many kinds, at some point OSIsoft may have to alter its architecture and technologies to process such a massive amount of data. That makes me interested in talking to their technology visionary. Stay tuned for that in a coming blog.</p>
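<p>The scaling estimate above can be checked with simple arithmetic. The short Python sketch below starts from the 100,000 data points per second reported by the utility and applies two and three orders of magnitude; the function name <code>scaled_rate</code> and the per-day figure are additions for illustration only.</p>

```python
SECONDS_PER_DAY = 86_400


def scaled_rate(base_per_second, orders_of_magnitude):
    """Scale a data rate up by the given number of orders of magnitude."""
    return base_per_second * 10 ** orders_of_magnitude


utility_rate = 100_000  # data points per second, as reported at the conference

for orders in (2, 3):
    city_rate = scaled_rate(utility_rate, orders)
    per_day = city_rate * SECONDS_PER_DAY
    print(f"{orders} orders: {city_rate:,} points per second, {per_day:,} per day")
```

<p>Two orders of magnitude gives 10 million points per second and three gives 100 million, which is why the architecture question becomes pressing.</p>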
]]></content:encoded>
			<wfw:commentRss>http://tek-tips.nethawk.net/how-to-apply-ict-to-the-power-grid-osisoft%e2%80%99s-way-%e2%80%93-part-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Game Changer? Beyond Realizing Hybrid Clouds—Part 2</title>
		<link>http://tek-tips.nethawk.net/game-changer-beyond-realizing-hybrid-clouds%e2%80%94part-2/</link>
		<comments>http://tek-tips.nethawk.net/game-changer-beyond-realizing-hybrid-clouds%e2%80%94part-2/#comments</comments>
		<pubDate>Fri, 08 Mar 2013 20:53:12 +0000</pubDate>
		<dc:creator>Zen Kishimoto</dc:creator>
				<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[Community Manager]]></category>
		<category><![CDATA[Data Center]]></category>
		<category><![CDATA[Database]]></category>
		<category><![CDATA[Editorial]]></category>
		<category><![CDATA[Information Technology]]></category>
		<category><![CDATA[Networking]]></category>
		<category><![CDATA[Security]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[Virtualization]]></category>
		<category><![CDATA[Cloudvelocity]]></category>
		<category><![CDATA[featured]]></category>
		<category><![CDATA[Hybrid Cloud]]></category>
		<category><![CDATA[Private Cloud]]></category>
		<category><![CDATA[Public Cloud]]></category>

		<guid isPermaLink="false">http://tek-tips.nethawk.net/?p=7968</guid>
		<description><![CDATA[This continues the discussion of CloudVelocity’s hybrid cloud technology. In this blog, I would like to talk about what’s under the hood. Some technical details As a former technologist, I wanted to open the hood and find out more about the underlying technologies. For this, Anand Iyengar, CloudVelocity’s founder and CTO, gave me a chalk [...]]]></description>
			<content:encoded><![CDATA[<p>This continues the discussion of <a href="http://www.cloudvelocity.com/">CloudVelocity</a>’s hybrid cloud technology. In this blog, I would like to talk about what’s under the hood.</p>
<h4>Some technical details</h4>
<p>As a former technologist, I wanted to open the hood and find out more about the underlying technologies. For this, Anand Iyengar, CloudVelocity’s founder and CTO, gave me a chalk talk.</p>
<p><img src="http://altaterra.site-ym.com/resource/resmgr/cloudveloicty-anand.jpg" alt="" /></p>
<p>Anand Iyengar</p>
<p>Because this is not a white paper detailing the technology, I only describe it at my layman’s level. However, it is such an intriguing technology that I’m accepting Anand’s offer for further discussion and will write more about it in the future.</p>
<p>Anand elaborated on the details, but I made a simpler diagram to fit the space. It is not that much different from the picture above.</p>
<p><img src="http://altaterra.site-ym.com/resource/resmgr/cloudvelocity-fig.gif" alt="" /></p>
<p>Virtual machines (VMs) move between a typical enterprise private cloud (mostly VMware-based) and a public cloud (typically Amazon AWS). (Source: CloudVelocity)</p>
<p><strong>Setup:</strong></p>
<p>Let’s take a quick look at the architecture:</p>
<p>Private cloud</p>
<ul>
<li>We first look at your own data center or colocation facility (private cloud). In the modern software application system, an application does not run on a single server. Instead, the running of an application spans multiple physical and virtual machines. So we call it a multisystem application. The configuration may differ according to usage and design. Typically, it consists of load balancers, web servers, application servers, and sometimes a cluster of other servers.</li>
<li>This is illustrated in the figure above. To save space, I drew only two machines, S1 and S2. The multisystem application typically uses a database (DB1), file systems mounted from a closed-box NFS server system (NFS1), and services from an LDAP server (LDAP). Everything in the public cloud is a copy of what is in the private cloud, including NFS1. Note that NFS provides files locally but not over the cloud boundary. Moreover, the private cloud may contain a server, such as an LDAP server, that one does not want copied to the public cloud but kept in the private cloud for security reasons.</li>
<li>There are virtual appliances (CloudVelocity Nexus Site Manager for the private cloud and CloudVelocity Cloud Manager for the public cloud) that together keep the cloud site images synchronized with the most recent changes to systems in the private cloud. CloudVelocity uses the term appliance to emphasize its dedicated function. CloudVelocity Nexus may run on a physical server, while CloudVelocity Cloud Manager runs as a virtual machine.</li>
<li>Let&#8217;s further assume that S1 (in the VMDK file format) is virtualized, but neither S2 nor DB1 is virtualized.</li>
</ul>
<p>Public cloud</p>
<ul>
<li>Everything in the public cloud is a mirror image of what is in the private cloud. The public cloud is populated by copying what is in the private cloud. An initial copy is made for each system, and updates are sent afterwards.</li>
</ul>
<p style="padding-left: 60px">A. System S1, which is virtualized, needs to be copied to the public cloud. S1 is copied via the link to the public cloud unless a copy is left over from a previous need, in which case only the differences are copied. It is converted to AMI automatically. S2, which is not virtualized, must also be copied via the link to the public cloud. As with S1, a full copy is made only if none is left over from a previous need, and S2 is then virtualized to run in the AMI file format.</p>
<p style="padding-left: 60px">B. Systems DB1 and NFS1, which are physical servers, go through the same process. They also are automatically virtualized to run on AWS/AMI.</p>
<ul>
<li>The two clouds are linked by the Internet or a dedicated connection via SSL.</li>
<li>When any of the systems are no longer necessary, they can be disabled and deleted, or retained for future use. The copy may be retained to minimize copying time in the future.</li>
</ul>
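<p>The idea of an initial full copy followed by difference-only updates can be sketched in Python. This is only an illustration of file-level differential copying in general; CloudVelocity&#8217;s actual mechanism is proprietary and is not described in that detail here. The function names, the sample paths, and the use of SHA-256 fingerprints are all assumptions.</p>

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return a content hash used to tell whether a file has changed."""
    return hashlib.sha256(data).hexdigest()


def plan_sync(private_files, public_index):
    """Decide what to send across the link and what to remove remotely.

    private_files: dict of path -> current file content in the private cloud
    public_index:  dict of path -> fingerprint of the copy left in the public cloud
    Returns (paths_to_send, paths_to_delete).
    """
    to_send = sorted(
        path
        for path, data in private_files.items()
        if public_index.get(path) != fingerprint(data)  # new or changed file
    )
    to_delete = sorted(path for path in public_index if path not in private_files)
    return to_send, to_delete


# Hypothetical state: one file changed, one unchanged, one stale remote copy.
private = {"/etc/app.conf": b"v2", "/var/data.db": b"rows"}
public = {
    "/etc/app.conf": fingerprint(b"v1"),
    "/var/data.db": fingerprint(b"rows"),
    "/tmp/old": fingerprint(b"x"),
}
print(plan_sync(private, public))
```

<p>With an empty <code>public_index</code>, every file is sent, which corresponds to the initial copy; a retained copy shrinks the transfer to the changed files only.</p>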
<p>A high-level description of how those components work together follows. The actual workings are much more complex, but I have simplified them for this presentation.</p>
<p><strong>Operation:</strong></p>
<ul>
<li>CloudVelocity Nexus inventories all the pertinent information regarding computing power in the private cloud, including applications and supporting servers, such as file systems and databases. The configuration information is stored in a proprietary file format.</li>
<li>Inventory information is passed to the CloudVelocity Cloud Manager in the public cloud. This appliance is virtualized to run on AWS (in the AMI file format) all the time. Storage and computing time for this appliance are charged per AWS pricing. The size of the appliance is negligible at several hundred kilobytes, and it does not cost much. Once Cloud Manager receives the configuration information, storage volumes are allocated for each system and populated, without running the system. This reduces activation time for the public cloud counterparts. EC2 charges are heavier for computing than for storage, so the design is a good compromise between reducing copying time and saving on EC2 computing charges.</li>
<li>Starting the systems in the public cloud typically takes three to five minutes, which is the time required to boot up a VM in the AWS cloud. They are started in parallel.</li>
<li>The systems may be disabled when not needed in the public cloud. The user may expect another need for the systems sometime soon and keep a copy around, or delete it to save the storage charge by the AWS system.</li>
</ul>
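<p>Because the systems are started in parallel, the total bring-up time is roughly that of the slowest system rather than the sum of all of them. The Python sketch below demonstrates the effect with simulated boots; <code>boot</code> and <code>start_all</code> are hypothetical stand-ins and make no AWS calls, and the sub-second delays merely represent the three to five minutes mentioned above.</p>

```python
import time
from concurrent.futures import ThreadPoolExecutor


def boot(name, seconds):
    """Stand-in for booting one system; sleeps instead of calling a cloud API."""
    time.sleep(seconds)
    return name


def start_all(systems):
    """Boot every system at once and report how long the whole batch took.

    systems: dict of system name -> simulated boot time in seconds
    Returns (names_started_in_input_order, elapsed_seconds).
    """
    begin = time.monotonic()
    with ThreadPoolExecutor(max_workers=len(systems)) as pool:
        started = list(pool.map(lambda item: boot(*item), systems.items()))
    return started, time.monotonic() - begin


# Simulated boot times for the four systems named in the setup above.
names, elapsed = start_all({"S1": 0.2, "S2": 0.2, "DB1": 0.2, "NFS1": 0.2})
print(names, round(elapsed, 1))
```

<p>Started one after another, these four simulated boots would take about 0.8 seconds; in parallel the batch finishes in roughly the time of a single boot.</p>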
<p><strong>Application areas:</strong></p>
<ol>
<li>Cloud fail-over: If the private cloud goes down for any reason but the operation cannot be halted, a full, earlier copy of the application systems may be started in the public cloud to take over the operation. This is called cloud fail-over and can be used for disaster recovery and for implementing features like follow-the-sun and follow-the-moon.</li>
<li>Development and testing sandboxes: More than one full copy of the application can be started simultaneously in the public cloud, while the application is still running in the private cloud. These copies are fully sandboxed and can be used for development or testing.</li>
<li>Complete move: For datacenter space constraints and other reasons, the systems in the private cloud may be cloned to the public cloud and those in the private cloud, disabled.</li>
<li>Cloudbursting: This allows extending computing power in the private cloud by enabling and cloning computing power in the public cloud, if a load surge takes place. This can be accomplished without losing data integrity in the private cloud, because the two appliances can tunnel update requests back to the local site. Any changes made on the public cloud are constantly sent back to the private cloud for data consolidation, so when the load surge subsides and the copies in the public cloud are taken down, data integrity is maintained.</li>
</ol>
<p>Patent-pending technology</p>
<p>Anand said that two technologies in <a href="http://www.cloudvelocity.com/how-it-works/">One Hybrid Cloud Platform (OHCP)</a> are unique, and CloudVelocity is applying for a patent for each.</p>
<p>The first has to do with synchronizing two data stores via two appliances that contain the inventory of computing equipment in both clouds. I will not go into detail, but according to Anand, replicating and maintaining synchronization between the two requires some work. Compare this with vMotion: while it switches a VM over from the primary to the secondary copy, pages dirtied on the primary copy are constantly sent to the secondary copy for synchronization. This requires fast (about 5 to 10 ms) communication between the primary and the secondary, but it allows a game running on one server to continue running on another server after the move. The OHCP instead sends all the changes at once in the form of a file, which makes it possible to use a slower connection like the Internet with encryption (SSL). OHCP does not support moving a running game in this way.</p>
<p>The second is concerned with letting the duplicated copies of VMs in the public cloud have access over the connection to databases like LDAP in the private cloud. As noted before, because of security concerns, some servers and databases may not be duplicated in the public cloud. So VMs in the public cloud need to have access to them in the private cloud.</p>
<p><a href="http://www.cloudvelocity.com/how-it-works/">OHCP</a> vs. <a href="http://www.vmware.com/files/pdf/VMware-VMotion-DS-EN.pdf">vMotion</a></p>
<p>After discussion with Anand, I came to understand that vMotion and OHCP address different problems, but may overlap in some functionality. Both technologies move systems in execution from one cloud to another. But there is more to it. I summarized the differences in the following table.</p>
<table width="577" border="1" cellspacing="0" cellpadding="4">
<tbody>
<tr valign="TOP">
<td width="183"></td>
<td width="184">
<p align="CENTER">OHCP</p>
</td>
<td width="184">
<p align="CENTER">vMotion</p>
</td>
</tr>
<tr valign="TOP">
<td width="183">
<p align="LEFT">Cloud requirements</p>
</td>
<td width="184">
<p align="LEFT">Works between heterogeneous physical or virtual systems and clouds</p>
</td>
<td width="184">
<p align="LEFT">Both clouds need to run with VMware</p>
</td>
</tr>
<tr valign="TOP">
<td width="183">
<p align="LEFT">Unit of synchronization</p>
</td>
<td width="184">
<p align="LEFT">File</p>
</td>
<td width="184">
<p align="LEFT">Main memory page and block storage</p>
</td>
</tr>
<tr valign="TOP">
<td width="183">
<p align="LEFT">Bring-up time</p>
</td>
<td width="184">
<p align="LEFT">3–5 minutes (VM booting time on AWS)</p>
</td>
<td width="184">
<p align="LEFT">A few seconds</p>
</td>
</tr>
<tr valign="TOP">
<td width="183">
<p align="LEFT">Connection requirements</p>
</td>
<td width="184">
<p align="LEFT">Not particularly (can be Internet) with SSL</p>
</td>
<td width="184">
<p align="LEFT">Latency &lt; 5 ms, or distance &lt; 200 km; fast, dedicated connection preferred</p>
</td>
</tr>
<tr valign="TOP">
<td width="183">
<p align="LEFT">Application areas</p>
</td>
<td width="184">
<p align="LEFT">Cloud fail-over, development/testing, migration, cloudbursting</p>
</td>
<td width="184">
<p align="LEFT">Applications keen on quick switchover; within the same data center or relatively short distance</p>
</td>
</tr>
</tbody>
</table>
<p>Looking at the table above, it appears that the two technologies are not competing but can be complementary to each other. I will dig into them more in my future blogs.</p>
<p>By the way, I can <a href="http://www.cloudvelocity.com/products/">try out their system</a> free of charge. But wait! I am not ready. I do not have a reasonable-size private cloud myself, much less use AWS. I probably need to consult with some of my friends who are involved in <a href="http://www.svcloudcenter.com/">Silicon Valley Cloud Center</a>.</p>
<p>(Continued to part 3, which will discuss energy efficiency by cloud computing and what it means to have a hybrid cloud.)</p>
]]></content:encoded>
			<wfw:commentRss>http://tek-tips.nethawk.net/game-changer-beyond-realizing-hybrid-clouds%e2%80%94part-2/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Game Changer? Beyond Realizing Hybrid Clouds—Part 1</title>
		<link>http://tek-tips.nethawk.net/game-changer-beyond-realizing-hybrid-clouds%e2%80%94part-1/</link>
		<comments>http://tek-tips.nethawk.net/game-changer-beyond-realizing-hybrid-clouds%e2%80%94part-1/#comments</comments>
		<pubDate>Sun, 24 Feb 2013 23:39:01 +0000</pubDate>
		<dc:creator>Zen Kishimoto</dc:creator>
				<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[Data Center]]></category>
		<category><![CDATA[Database]]></category>
		<category><![CDATA[Editorial]]></category>
		<category><![CDATA[Hardware]]></category>
		<category><![CDATA[Information Technology]]></category>
		<category><![CDATA[Networking]]></category>
		<category><![CDATA[Security]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[Virtualization]]></category>
		<category><![CDATA[Cloudvelocity]]></category>
		<category><![CDATA[featured]]></category>
		<category><![CDATA[Hybrid Cloud]]></category>
		<category><![CDATA[Private Cloud]]></category>
		<category><![CDATA[Public Cloud]]></category>

		<guid isPermaLink="false">http://tek-tips.nethawk.net/?p=7781</guid>
		<description><![CDATA[When cloud computing was first introduced, I did not expect that it would develop to such a degree that the IT world would be greatly changed. First public cloud and then private cloud were introduced. Then hybrid cloud became the center of discussion. Some people project 2013 will be the year of the cloud, and [...]]]></description>
			<content:encoded><![CDATA[<p>When cloud computing was first introduced, I did not expect that it would develop to such a degree that the IT world would be greatly changed. First <a href="http://en.wikipedia.org/wiki/Public_cloud">public cloud</a> and then <a href="http://en.wikipedia.org/wiki/Private_cloud#Private_cloud">private cloud</a> were introduced. Then <a href="http://en.wikipedia.org/wiki/Hybrid_cloud#Hybrid_cloud">hybrid cloud</a> became the center of discussion.</p>
<p><a href="http://tek-tips.nethawk.net/wp-content/uploads/2013/02/cloudvelocity.gif"><img src="http://tek-tips.nethawk.net/wp-content/uploads/2013/02/cloudvelocity.gif" alt="" title="cloudvelocity" width="356" height="241" class="alignnone size-full wp-image-7791" /></a></p>
<p>Some people project 2013 will be the year of the cloud, and hybrid clouds are talked of as one of the trends for the year to come. See <a href="http://www.getcloudservices.com/blog/2013-cloud-computing-trends">here</a>, <a href="http://www.networkworld.com/supp/2012/enterprise6/120312-ecs-hybrid-cloud-264443.html">here</a>, <a href="http://gregness.wordpress.com/2012/12/20/top-five-archimedius-cloud-predictions-for-2013/">here</a>, and many other places.</p>
<p>As I said before, much of hybrid cloud is just talk and not reality, and there have been several showstoppers before now.</p>
<p>Many of the factors that make hybrid clouds hard to implement are technical:</p>
<p>Technical problems</p>
<p>Virtual machine (VM) file format</p>
<ol>
<li>Public cloud: <a href="http://aws.amazon.com/">Amazon Web Services</a> was the first to implement a public cloud, and AWS is now the de facto standard for public cloud. It uses its own proprietary file format (<a href="https://aws.amazon.com/amis">Amazon Machine Image, a.k.a. AMI</a>) for running virtual machines on the <a href="http://www.xen.org/">Xen</a> hypervisor. Its file format is not the same as the original Xen VM format. So even if you are running the Xen hypervisor for your cloud, you cannot enjoy interoperability with AWS without converting your VM&#8217;s file format. For example, the Citrix virtualization environment is based on Xen, but its file format is <a href="http://en.wikipedia.org/wiki/VHD_%28file_format%29">virtual hard disk (VHD)</a>, which is also the file format for Microsoft&#8217;s virtual machine.</li>
<li>Private cloud: In the enterprise market (private cloud), <a href="http://en.wikipedia.org/wiki/VMDK">VMware</a>&#8217;s VM file format (VMDK) is the de facto standard.</li>
<li>Hybrid cloud is an attempt to use both private and public clouds to process IT demands by optimizing suitable in-house and outsourced IT infrastructures as needed. So when we want to move VMs back and forth between public and private clouds, we need a translation each time we move them across the cloud boundary. It may not be very hard to do so, because translation tools are readily available from vendors like <a href="http://aws.amazon.com/ec2/vmimport/">Amazon</a> and VMware (<a href="http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&amp;cmd=displayKC&amp;externalId=1028042">vmkfstools</a>). It may be straightforward to move VMs that are not in execution, but VMs in execution are generally hard to move with their execution state intact. See the next item.</li>
</ol>
<p>Physical movement of VMs</p>
<ul>
<li>If we want to exploit public and private clouds for an application in execution, that execution instance may be transported between two or more clouds to find the most suitable execution environment. One big issue is the distance between clouds. VMware&#8217;s <a href="http://www.vmware.com/files/pdf/VMware-VMotion-DS-EN.pdf">vMotion</a> allows you to transport your VM up to something like 100 km (about 62 miles) but no farther. With this physical restriction, what you can do with hybrid cloud may be limited by the distance between clouds.</li>
</ul>
<p>Various support environments</p>
<ul>
<li>Cloud is not just virtualization but needs a comprehensive environment, such as management and support, including tools and security considerations. Each cloud tends to come with its own environment and idiosyncrasies, so what you can do easily in one cloud may not be as easy in another cloud. This would make managing a hybrid cloud cumbersome.</li>
</ul>
<p>To date, most discussions on hybrid cloud have been at a very abstract level and not at all concrete. People have talked about what we could do with hybrid cloud without referring to its concrete implementation. Recently, I came across yet another brand-new cloud company that claims to have solved the aforementioned problems. Greg Ness recently sent me an email with a <a href="http://www.businesswire.com/news/home/20121212005432/en/CloudVelocity-Emerges-Stealth-Mode-Announces-5-Million">press release</a> and wanted to show what CloudVelocity, his new company, is doing in the area of hybrid cloud.</p>
<p>I am by no means an expert in hybrid cloud computing or any kind of cloud computing, for that matter, but let me try to review how hybrid computing is implemented with their technologies. To support hybrid cloud, VMs need to move back and forth between private and public clouds. How can we implement such a move? Because an execution space is not shared between a public and a private cloud, we cannot literally move a VM across the clouds. What we do is to make a copy of a VM executing at one cloud and transport its execution status to a cloned VM at another cloud. Then we can disable the original VM and enable the cloned one. If a VM is not in execution, it is not that hard. But if it is in execution, it is much harder.</p>
<p>If both private and public clouds are implemented with the same technologies and the distance is less than, say, 100 km, the same VM could be transported with a utility like vMotion. But in most cases, two cloud environments are not the same (see the technical problems described above), and the distance could be greater. Also, you can move only virtualized applications but not traditionally maintained applications, because you cannot assume all the applications have been virtualized into a VM format.</p>
<p>We need to have carbon copies of VMs and non-VM versions of applications (that need to be virtualized) on the other side. That means you need to have carbon copies of your applications running on a public cloud. This sounds like a <a href="http://en.wikipedia.org/wiki/Backup_site">disaster recovery (DR) system</a>.</p>
<p>Disaster recovery/fail-over system</p>
<p>In such a system, you duplicate the applications that are running at the primary location and operate them with options at the secondary location. These options include active-active and active-passive configurations. Active-active means that the machines (and thus applications) are live at both the primary and the secondary locations at the same time, with data being copied from the primary to the secondary sites. In this scenario, when the primary location cannot operate any longer for any reason, the secondary location can take over seamlessly. The active-passive configuration may not guarantee complete synchronization, because the passive one in the secondary location does not run until the primary location can no longer support applications.</p>
<p>In any event, if we duplicate the whole thing for the secondary site, as in the case of DR in an active-active fashion, the duplicated copies are always in the secondary site with dedicated servers. This situation is the farthest from cloud computing in spirit, especially for public clouds.</p>
<p>What we need is a solution like this:</p>
<ul>
<li>Copies on the other side made only when needed (on-demand).</li>
<li>Noninteroperability problems overcome:
<ul>
<li>Resolve VM file format and other incompatibilities among major cloud systems, such as AWS, Rackspace, Microsoft, and OpenShift.</li>
<li>Handle physical vs. virtual applications in an IaaS cloud environment.</li>
</ul>
</li>
</ul>
<p>Now back to CloudVelocity. I visited Greg Ness and Rajeev Chawla, CEO, at their headquarters in Santa Clara. They claim to have implemented a solution to solve the problems discussed above.</p>
<p><img src="http://altaterra.site-ym.com/resource/resmgr/cloudvelocity-1-2.jpg" alt="" /></p>
<p>From left: Rajeev Chawla (CEO) and Greg Ness (VP Marketing). See <a href="http://www.cloudvelocity.com/about/">here</a> for their bios.</p>
<p>They have developed a comprehensive system for implementing hybrid cloud that they call <a href="http://www.cloudvelocity.com/how-it-works/">One Hybrid Cloud Platform (OHCP)</a>, which is depicted in the following picture. Applications move across the cloud boundary in five steps:</p>
<ol>
<li>Host discovery—Inventory your private cloud (data center), which consists of all the pertinent IT hardware and software.</li>
<li>Blueprinting—Create a database of how the discovered components are put together.</li>
<li>Cloud provisioning—Duplicate and create VMs on the target cloud (translating VMs and virtualizing physical applications if necessary).</li>
<li>Synchronization—Synchronize VMs between the two clouds.</li>
<li>Service initiation—Let the duplicated VMs take over and disable the original VMs.</li>
</ol>
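<p>To make the five steps easier to picture, here is a minimal sketch of the flow in Python. It is only an illustration under my own assumptions: the <code>Host</code> class and the function names are hypothetical and are not CloudVelocity code or APIs, and a real system would have to deal with image conversion, networking, and live state, which this toy ignores.</p>

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical record for one machine, physical or virtual."""
    name: str
    virtualized: bool
    running: bool = True
    state: dict = field(default_factory=dict)


def discover(data_center):
    """Step 1: inventory the hosts in the private cloud."""
    return list(data_center)


def blueprint(hosts):
    """Step 2: record how the discovered components are put together."""
    return {h.name: {"virtualized": h.virtualized} for h in hosts}


def provision(plan):
    """Step 3: create stopped clones on the target cloud.

    Every clone is marked virtualized, standing in for the
    physical-to-virtual conversion of non-VM applications.
    """
    return {name: Host(name, virtualized=True, running=False) for name in plan}


def synchronize(hosts, clones):
    """Step 4: copy the current state of each source host to its clone."""
    for h in hosts:
        clones[h.name].state = dict(h.state)


def initiate(hosts, clones):
    """Step 5: start the clones and disable the originals."""
    for h in hosts:
        clones[h.name].running = True
        h.running = False


private_cloud = [
    Host("web-01", virtualized=True, state={"sessions": 12}),
    Host("db-01", virtualized=False, state={"rows": 9000}),
]
hosts = discover(private_cloud)
clones = provision(blueprint(hosts))
synchronize(hosts, clones)
initiate(hosts, clones)
```

<p>The point of the ordering is that the originals keep running until the clones have been provisioned and synchronized; only the last step switches service over.</p>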
<p>&nbsp;</p>
<p><img src="http://altaterra.site-ym.com/resource/resmgr/cloudvelocity-1-3.jpg" alt="" /></p>
<p>CloudVelocity&#8217;s comprehensive One Hybrid Cloud Platform.</p>
<p>This sounds easy. How do they do this? That will be covered in Part 2.</p>
]]></content:encoded>
			<wfw:commentRss>http://tek-tips.nethawk.net/game-changer-beyond-realizing-hybrid-clouds%e2%80%94part-1/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Hardware, Software, what about Valueware?</title>
		<link>http://tek-tips.nethawk.net/hardware-software-what-about-valueware/</link>
		<comments>http://tek-tips.nethawk.net/hardware-software-what-about-valueware/#comments</comments>
		<pubDate>Wed, 02 Jan 2013 19:41:22 +0000</pubDate>
		<dc:creator>Greg Schulz</dc:creator>
				<category><![CDATA[Database]]></category>
		<category><![CDATA[Editorial]]></category>
		<category><![CDATA[Hardware]]></category>
		<category><![CDATA[Storage]]></category>
		<category><![CDATA[featured]]></category>

		<guid isPermaLink="false">http://tek-tips.nethawk.net/?p=7357</guid>
		<description><![CDATA[I am surprised nobody has figured out how to use the term valueware to describe their hardware, software or services solutions, particularly around cloud, big data, little data, converged solution stacks or bundles, virtualization and related themes. Note that I’m referring to IT hardware and not what you would usually find at a TrueValue hardware store (disclosure, [...]]]></description>
			<content:encoded><![CDATA[<p>I am surprised nobody has figured out how to use the term <a href="http://valueware.us/" target="_blank">valueware</a> to describe their hardware, software or services solutions, particularly around <a href="http://storageioblog.com/?p=3476" target="_blank">cloud</a>, <a href="http://storageioblog.com/?p=3756" target="_blank">big data, little data</a>, <a href="http://storageioblog.com/?p=2156" target="_blank">converged solution</a> stacks or bundles, virtualization and related themes.</p>
<div id="attachment_7358" class="wp-caption alignnone" style="width: 310px"><a href="http://tek-tips.nethawk.net/wp-content/uploads/2013/01/SIO_BuildingBlocks.gif"><img class="size-medium wp-image-7358" title="SIO_BuildingBlocks" src="http://tek-tips.nethawk.net/wp-content/uploads/2013/01/SIO_BuildingBlocks-300x172.gif" alt="" width="300" height="172" /></a><p class="wp-caption-text">Cloud and virtualization building blocks transformed into Valueware</p></div>
<p>Note that I’m referring to IT hardware and not what you would usually find at a TrueValue hardware store (disclosure, I like to shop there for things to innovate with and address the non-IT to-do project list).</p>
<p>Instead of value add software or what might otherwise be called an operating system (OS), or middleware, glue, hypervisor, shims or agents, I wonder who will be first to use valueware? Or who will be the first to say they were the first to articulate the value of their industry unique and revolutionary solution using valueware?</p>
<p><a href="http://storageio.com/book3.html" target="_blank"><img src="http://storageio.com/images/SIO_StackBasic1.gif" alt="Cloud and convergence stack image from Cloud and Virtual Data Storage Networking Book" width="465" height="240" border="0" /></a></p>
<p>For those not familiar, converged solution stack bundles combine server, storage and networking hardware along with management software and other tools in a prepackaged solution from the same or multiple vendors. Examples include <a href="http://www.dell.com/content/topics/topic.aspx/global/products/landing/en/virtual-integrated-system?c=us&amp;l=en" target="_blank">Dell VIS</a> (not to be confused with their reference architectures or <a href="http://mymemory.translated.net/t/Dutch/English/vis" target="_blank">fish in Dutch</a>), <a href="http://www.vce.com/" target="_blank">VCE or EMC vBlocks</a>, <a href="http://storageioblog.com/?p=2896" target="_blank">IBM Puresystems</a>, <a href="http://www.netapp.com/us/solutions/cloud/flexpod/" target="_blank">NetApp FlexPods</a> and <a href="http://storageioblog.com/?p=3860" target="_blank">Oracle Exaboxes</a> among others.</p>
<p><a href="http://storageio.com/book3.html" target="_blank"><img src="http://storageio.com/images/SIO_StackBasic2.gif" alt="Converged solution or cloud bundle image from Cloud and Virtual Data Storage Networking Book" width="465" height="240" border="0" /></a></p>
<p>Why is it that the IT or ICT (for my European friends) industries are not using <a href="http://valueware.us/" target="_blank">valueware</a>?</p>
<p>Is Valueware not being used because it has not been brought to their attention yet or part of anybody’s <a href="http://storageioblog.com/?p=1850" target="_blank">buzzword bingo</a> list or read about in an industry trade rag (publication) or blog (other <a href="http://storageioblog.com/" target="_blank">than here</a>) or on <a href="http://twitter.com/storageio" target="_blank">twitter</a>?</p>
<p><a href="http://storageioblog.com/?p=1850" target="_blank"><img src="http://storageio.com/images/SIO_Buzzword_Bingo.gif" alt="Buzzword bingo image" width="465" height="240" border="0" /></a></p>
<p>Is it because, in the opinion of some marketers or their research focus groups, the term value is associated with being cheap or low-cost? If that is the case, I wonder how many of those marketing focus groups actually include active IT or ICT professionals. If those research marketing focus groups contact practicing IT or ICT pros, then there would be a <a href="http://storageioblog.com/?p=3603" target="_blank">lower degree of separation to the information</a>, vs. professional focus group or survey participants who may have a <a href="http://storageioblog.com/?p=3603" target="_blank">larger degree of separation</a> from practitioners.</p>
<p><a href="http://storageioblog.com/?p=3603" target="_blank"><img src="http://storageio.com/images/DegreesSeperate.jpg" alt="Degrees of separation image" border="0" /></a></p>
<p>Depending on who uses valueware first and how it is used, if it becomes popular or trendy, rest assured there would be a bandwagon racing to the train station to jump on board the marketing innovation train.</p>
<p><a href="http://storageio.com/images/EMC_NetApp_Tracks.mpg" target="_blank"><img src="http://storageio.com/images/TrainTracks.jpg" alt="Image and video with audio of train going down the tracks" border="0" /></a></p>
<p>On the other hand, using valueware could be an innovative way to help articulate <a href="http://storageioblog.com/?p=1149" target="_blank">soft product</a> value (read more about <a href="http://storageioblog.com/?p=1149" target="_blank">hard and soft product here</a>). For those not familiar, <a href="http://storageioblog.com/?p=1149" target="_blank">hard product</a> does not simply mean hardware; it includes many technologies (including hardware, software, networks, services) that combine with best practices and other things to create a <a href="http://storageioblog.com/?p=1149" target="_blank">soft product</a> (solution experience).</p>
<p>Whatever the reason, I am assuming that valueware is not going to be used by creative marketers so let us have some fun with it instead.</p>
<p>Let me rephrase that: let us leave valueware alone and instead look at the esteemed company it keeps (some are for fun, some are for real).</p>
<ul>
<li>APIware (having some fun with those who see the world via APIs)</li>
<li>Cloudware (not to be confused with cloud washing)</li>
<li>Firmware (software tied to hardware, is it hardware or software? <img src="http://storageioblog.com/wp-includes/images/smilies/icon_wink.gif" alt=";)" /> )</li>
<li>Hardware (something software, virtualization and clouds run on)</li>
<li>Innovationware (not to be confused with a data protection company called <a href="http://www.fdr.com/" target="_blank">Innovation</a>)</li>
<li>Larryware (anything Uncle Larry wants it to be)</li>
</ul>
<p><a href="http://storageioblog.com/?p=3860" target="_blank"><img src="http://storageio.com/images/Oracle_Challenge.gif" alt="Image of Uncle Larry aka Larry Ellison taking on whomever or whatever" width="380" height="480" border="0" /></a></p>
<ul>
<li>Marketware (related to marketecture)</li>
<li>Middleware (software to add value or glue other software together)</li>
<li>Netware (RIP <a href="http://www.nnp.org/nni/Publications/Dutch-American/noorda.html" target="_blank">Ray Noorda</a>)</li>
<li>Peopleware (those who use or support IT and cloud services)</li>
<li><a href="http://storageioblog.com/?p=888" target="_blank">Santaware</a> (come on, <a href="http://storageioblog.com/?p=888" target="_blank">tis the season right</a>)</li>
<li>Sleepware (<a href="http://storageioblog.com/?p=872" target="_blank">disks and servers spin down</a> to sleep using <a href="http://storageioblog.com/?p=872" target="_blank">IPM techniques</a>)</li>
<li>Slideware (software defined marketing presentations)</li>
<li>Software (something that runs on hardware)</li>
<li>Solutionware (could be a variation of implementation of soft product)</li>
<li>Stackware (something that can also be done with Tupperware)</li>
<li>Tupperware (something that can be used for food storage)</li>
<li>Valueware (<a href="http://valueware.us/" target="_blank">valueware.us</a> points to this page, unless somebody wants to buy or rent it <img src="http://storageioblog.com/wp-includes/images/smilies/icon_wink.gif" alt=";)" /> )</li>
<li>Vaporware (does vaporware actually exist?)</li>
</ul>
<p>More variations can be added to the above list, for example substituting ware for wear. However, I will leave that up to your own creativity and innovation skills.</p>
<p>Let’s see if anybody starts to use <a href="http://valueware.us" target="_blank">Valueware</a> as part of their marketware or value proposition slideware pitches, and if you do use it, let me know, be happy to give you a shout out.</p>
<p>Ok, nuff said.</p>
<p>Cheers gs</p>
]]></content:encoded>
			<wfw:commentRss>http://tek-tips.nethawk.net/hardware-software-what-about-valueware/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="http://storageio.com/images/EMC_NetApp_Tracks.mpg" length="639132" type="video/mpeg" />
		</item>
		<item>
		<title>FluidOps Provides Better Data from Multiple Sources with Semantic Modeling</title>
		<link>http://tek-tips.nethawk.net/fluidops-provides-better-data-from-multiple-sources-with-semantic-modeling-2/</link>
		<comments>http://tek-tips.nethawk.net/fluidops-provides-better-data-from-multiple-sources-with-semantic-modeling-2/#comments</comments>
		<pubDate>Sat, 15 Sep 2012 17:32:23 +0000</pubDate>
		<dc:creator>Zen Kishimoto</dc:creator>
				<category><![CDATA[Analytics]]></category>
		<category><![CDATA[Business Intelligence]]></category>
		<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[Community Manager]]></category>
		<category><![CDATA[Database]]></category>
		<category><![CDATA[Information Technology]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[Storage]]></category>
		<category><![CDATA[analytics]]></category>
		<category><![CDATA[featured]]></category>
		<category><![CDATA[FluidOps]]></category>
		<category><![CDATA[Information Bench]]></category>
		<category><![CDATA[NoSQL]]></category>

		<guid isPermaLink="false">http://tek-tips.nethawk.net/?p=6789</guid>
		<description><![CDATA[The market that NoSQL addresses is quite wide and populous. It includes not only databases but also utilities to accelerate data collection, analytics, and visualization. The whole idea of Big Data is to derive useful intelligence and information from the vast amount of data that were ignored and discarded before. So in a way, it [...]]]></description>
			<content:encoded><![CDATA[<p>The market that NoSQL addresses is quite wide and populous. It includes not only databases but also utilities to accelerate data collection, analytics, and visualization. The whole idea of Big Data is to derive useful intelligence and information from the vast amount of data that were ignored and discarded before. So in a way, it is data mining and business intelligence. But Big Data is different in the magnitude of its volume, velocity, and variety. In the enterprise market, most data in question are in known formats (structured), and their variety is limited. Also, it is rare that a vast amount of data comes in real time. But this is changing now because of SNS and the mobile computing invasion.</p>
<p>Fluid Operations (FluidOps) aggregates data from different sources and converts them with some intelligence for better analysis. I sat with Peter Haase, senior architect, and chatted about their <a href="http://www.fluidops.com/information-workbench/">Information Workbench</a>, a comprehensive tool for collecting and analyzing data and visualizing useful information.</p>
<p>&nbsp;</p>
<p><img src="http://www.altaterra.net/resource/resmgr/fluid-1.jpg" alt="" /></p>
<p>&nbsp;</p>
<p><img src="http://www.altaterra.net/resource/resmgr/fluid-2.jpg" alt="" /></p>
<p>Peter Haase</p>
<p>Fluid Operations is located in <a href="http://en.wikipedia.org/wiki/Walldorf">Walldorf, Germany</a>. SAP&#8217;s headquarters is there as well. They currently have no US office, but their website provides information in both German and English. Peter and other people from the company are fluent in English.</p>
<p>As in other areas, in the power business, utilities companies collect and aggregate various kinds of data in addition to meter-read data. They may monitor equipment on the distribution grid, such as transformers, switches, relays, and capacitor banks. The data from the equipment and the meter-read data may be generated at dramatically different speeds. In addition to dynamic and real-time data, some static data types like asset information, including equipment location, brand, model, specification, and service records, may be required to provide preventive maintenance and report malfunctions and failures. The FluidOps solution is to collect and aggregate data from multiple sources and then to translate each datum semantically to a common form so that it has more meaningful information associated with it. Since all the translated data are in the same form with more meaningful relationships among them, analytics becomes more effective and can lead to more appropriate action.</p>
<p>&#8220;Semantically&#8221; means that they convert collected data into their normal form, which is represented using the <a href="http://en.wikipedia.org/wiki/Resource_description_framework">Resource Description Framework</a> (RDF). I will not get into details here. It is not the same as the <a href="http://en.wikipedia.org/wiki/Entity%E2%80%93relationship_model">entity-relationship model</a>, but it is similar in a way. An example diagram is shown here.</p>
<p><img src="http://www.altaterra.net/resource/resmgr/rdf_graph_for_eric_miller.png" alt="" /></p>
<p><a href="http://en.wikipedia.org/wiki/File:Rdf_graph_for_Eric_Miller.png">An example RDF graph (Source: Eric Miller)</a></p>
<p>All the data collected are converted into this format. The query language for RDF is SPARQL Protocol and RDF Query Language (SPARQL).</p>
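<p>To give a feel for what RDF data and a SPARQL-style query look like, here is a toy sketch in plain Python. It is an illustration under my own assumptions, not how Information Workbench works: the identifiers such as <code>grid:fedBy</code> are made up, real RDF uses full IRIs, and real SPARQL engines do far more than match a single pattern.</p>

```python
# Toy triple store: each fact is a (subject, predicate, object) tuple, as in RDF.
# All identifiers below are hypothetical and exist only for this example.
TRIPLES = [
    ("transformer:T1", "rdf:type", "grid:Transformer"),
    ("transformer:T1", "grid:locatedAt", "substation:S7"),
    ("transformer:T1", "grid:model", "ACME-500"),
    ("meter:M42", "rdf:type", "grid:Meter"),
    ("meter:M42", "grid:fedBy", "transformer:T1"),
]


def match(pattern, triples):
    """Return the variable bindings for one triple pattern.

    Terms that start with '?' are variables, mirroring SPARQL syntax;
    every other term must equal the stored value exactly.
    """
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice must bind to the same value.
                if binding.get(term, value) != value:
                    break
                binding[term] = value
            elif term != value:
                break
        else:
            results.append(binding)
    return results


# Roughly: SELECT ?m WHERE { ?m grid:fedBy transformer:T1 }
meters = match(("?m", "grid:fedBy", "transformer:T1"), TRIPLES)
print(meters)
```

<p>Because asset data (the transformer) and metering data (the meter) share the same subject-predicate-object form, one query can walk from one to the other, which is the benefit of translating everything into a common form.</p>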
<p>FluidOps <a href="http://www.fluidops.com/information-workbench/">Information Workbench</a> consists of data integration and storage, data management, and presentation/interaction/UI customization layers. At the 30,000-foot view, it collects and associates data using semantic models from diverse industry segments. For example, the <a href="http://www.w3.org/wiki/SweoIG/TaskForces/CommunityProjects/LinkingOpenData">Linking Open Data Community project</a> is an attempt to make data from different industry segments freely available, and for that, data are represented in RDF. The segments include media, geographic, publications, user-generated, governments, and life science. Their relationships are shown in the following diagram, which is maintained by <a href="http://richard.cyganiak.de/">Richard Cyganiak</a> and <a href="http://www.wiwiss.fu-berlin.de/en/institute/pwo/bizer/team/JentzschAnja.html">Anja Jentzsch</a>.</p>
<p><a href="http://tek-tips.nethawk.net/wp-content/uploads/2012/09/fluid-3.jpg"><img class="alignnone size-full wp-image-6815" title="fluid-3" src="http://tek-tips.nethawk.net/wp-content/uploads/2012/09/fluid-3.jpg" alt="" width="500" height="325" /></a></p>
<p>&nbsp;</p>
<p>Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch: available <a href="http://richard.cyganiak.de/2007/10/lod/">here</a>. Data published in Linked Data format based on the <a href="http://www.w3.org/wiki/SweoIG/TaskForces/CommunityProjects/LinkingOpenData">Linking Open Data Community project</a>.</p>
<p>Click each circle on the figure <a href="http://richard.cyganiak.de/2007/10/lod/">here</a> (not the figure above) to drill down through each dataset.</p>
<p>The following figure illustrates how Information Workbench collects and associates data with other data to increase their value semantically.</p>
<p><img src="http://www.altaterra.net/resource/resmgr/fluid-4.jpg" alt="" /></p>
<p>The disparate sources include tweets, Facebook, YouTube, data.gov, office documents, and various video files.</p>
<p>The architecture of Information Workbench is shown below. It consists of a data integration and storage layer (green), data management (brown), and presentation, interaction, and UI customization (blue).</p>
<p>&nbsp;</p>
<p><img src="http://www.altaterra.net/resource/resmgr/fluid-infobench.jpg" alt="" /></p>
<p>&nbsp;</p>
<p>Fluid Operations looked at the availability of RDF datasets to exploit for effective analytics. Their current application areas include media, health care, and life sciences. I asked Peter about its application to the power industry. He said they were not looking into that yet but may consider it if they get a research grant. I do not know whether a dataset is already available for the power industry, but I think it might help the industry to exploit something like this.</p>
<p>I talked about each utility&#8217;s operation, but if we look at each region, such as <a href="http://www.ferc.gov/industries/electric/indus-act/rto.asp">ISO/RTO</a>, the regional power balance information and data are very useful. I would like to follow this as it grows.</p>
]]></content:encoded>
			<wfw:commentRss>http://tek-tips.nethawk.net/fluidops-provides-better-data-from-multiple-sources-with-semantic-modeling-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>GigaSpaces Accelerates NoSQL Databases</title>
		<link>http://tek-tips.nethawk.net/gigaspaces-accelerates-nosql-databases/</link>
		<comments>http://tek-tips.nethawk.net/gigaspaces-accelerates-nosql-databases/#comments</comments>
		<pubDate>Tue, 11 Sep 2012 23:11:55 +0000</pubDate>
		<dc:creator>Zen Kishimoto</dc:creator>
				<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[Community Manager]]></category>
		<category><![CDATA[Data Management]]></category>
		<category><![CDATA[Database]]></category>
		<category><![CDATA[Enterprise Applications]]></category>
		<category><![CDATA[Information Technology]]></category>
		<category><![CDATA[Mobile and Wireless]]></category>
		<category><![CDATA[Social Media]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[Storage]]></category>
		<category><![CDATA[Cloudify]]></category>
		<category><![CDATA[Gigaspaces]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[XAP]]></category>

		<guid isPermaLink="false">http://tek-tips.nethawk.net/?p=6743</guid>
		<description><![CDATA[I video-interviewed GigaSpaces&#8217; Nati Shalom, its founder and chief technology officer, in March regarding their Cloudify product. Nati Shalom Cloudify is a tool to smoothly launch applications in a cloud environment with a recipe that describes everything necessary, including resources and their configurations. GigaSpaces&#8217; new product (at least, I thought it was new) is eXtreme [...]]]></description>
			<content:encoded><![CDATA[<p>I <a href="http://www.nethawk.tv/cloudify-provides-application-recipe-for-cloud" target="_blank">video-interviewed</a> GigaSpaces&#8217; Nati Shalom, its founder and chief technology officer, in March regarding their <a href="http://www.gigaspaces.com/cloudify-open-paas-stack" target="_blank">Cloudify</a> product.</p>
<p><img src="http://altaterra.site-ym.com/resource/resmgr/gigas-2.png" alt="" /></p>
<p><img src="http://tek-tips.nethawk.net/wp-content/uploads/2012/09/gigas-1.jpg" alt="" /><br />
Nati Shalom</p>
<p>Cloudify is a tool to smoothly launch applications in a cloud environment with a recipe that describes everything necessary, including resources and their configurations. GigaSpaces&#8217; new product (at least, I thought it was new) is eXtreme Application Platform (<a href="http://www.gigaspaces.com/datagrid" target="_blank">XAP</a>), an accelerator for NoSQL databases. GigaSpaces&#8217; XAP is not a database, analytics tool, or visualization tool. In short, it is an <a href="http://www.gigaspaces.com/datagrid" target="_blank">in-memory utility</a> to enable real-time data processing for other NoSQL databases, like Cassandra and MongoDB. SQL or no SQL, the rate and the speed of Big Data have become a problem for a database to process. A simple solution is to put some kind of front end in place to process such high-volume and high-speed data. In-memory data processing is usually much faster than any data storage dealing with disk I/O. Both VoltDB and Couchbase, which I interviewed at the same conference, use their implementation of an in-memory database for this. Other databases may partner with other companies to provide such a technology. Nati referred to <a href="http://radar.oreilly.com/2011/12/5-big-data-predictions-2012.html" target="_blank">Edd Dumbill&#8217;s blog</a>, which says that one of the trends in Big Data is streaming data processing. For that, in-memory technology is invaluable.</p>
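<p>The idea of an in-memory front end can be sketched in a few lines of Python. This is only my own illustration of the general write-behind pattern, not XAP&#8217;s design or API: <code>SlowStore</code> and <code>InMemoryFrontEnd</code> are hypothetical names, and the sketch ignores durability, replication, and concurrency.</p>

```python
from collections import deque


class SlowStore:
    """Stand-in for a disk-backed database (hypothetical, for illustration)."""

    def __init__(self):
        self.rows = {}
        self.write_calls = 0   # counts the expensive round trips to "disk"

    def write_batch(self, items):
        self.write_calls += 1
        self.rows.update(items)


class InMemoryFrontEnd:
    """Absorb writes in memory and flush them to the store in batches."""

    def __init__(self, store, batch_size=3):
        self.store = store
        self.batch_size = batch_size
        self.cache = {}          # latest value per key, served from memory
        self.pending = deque()   # keys written since the last flush

    def put(self, key, value):
        self.cache[key] = value
        self.pending.append(key)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def get(self, key):
        # Reads hit memory first and fall back to the store.
        if key in self.cache:
            return self.cache[key]
        return self.store.rows.get(key)

    def flush(self):
        batch = {key: self.cache[key] for key in self.pending}
        self.pending.clear()
        if batch:
            self.store.write_batch(batch)


store = SlowStore()
front = InMemoryFrontEnd(store, batch_size=3)
for i in range(7):
    front.put(f"tweet:{i}", f"payload {i}")
front.flush()   # push the last partial batch
print(store.write_calls, len(store.rows))   # prints "3 7"
```

<p>Seven incoming writes reach the slow store in three batched calls, while every read and write is answered from memory. The trade-off is that data not yet flushed would be lost in a crash, which is one reason real products add replication and consistency guarantees.</p>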
<p>I thought XAP was a new product that came after Cloudify. However, XAP was developed about 10 years ago, when the Internet bubble was in full bloom. They thought high-speed data processing was necessary to accommodate business-to-business (B2B) interactions with scalability. As we all know, the dot-com era did not last very long, and their prediction did not materialize. Actually, the financial community was an early adopter because of real-time data processing in such things as credit card transaction processing and stock trading. So GigaSpaces decided to develop a product to serve those needs in 2004, and they have kept improving it over the years. The current version of XAP is the ninth edition. As I wrote in a previous blog, the NoSQL domain includes companies that develop databases, utilities, analytics engines, and visualization tools. This classification is shown in <a href="http://blogs.the451group.com/information_management/2011/04/15/nosql-newsql-and-beyond/" target="_blank">Matt Aslett&#8217;s blog</a> with leading NoSQL companies. Matt places GigaSpaces in the data/grid cache category.</p>
<p>The following is a summary of my chat with Nati about his solutions and his view of the NoSQL market.</p>
<p>GigaSpaces is headquartered in New York and also has offices in San Jose, CA, London, and Israel. Nati said that Big Data is fueled by different things, depending on the geography. In the US, SNS drives Big Data on the West Coast, while the financial requirements mentioned above drive it on the East Coast. SNS and financial applications are very different, but they both generate a high volume of data at high speed. SNS, especially, generates data in an unformatted way, such as tweets.</p>
<p>Regarding the relationship between XAP and Cloudify, they are currently tightly integrated. Data cluster management is necessary for the management of large data sets. Cloudify needs the same data cluster management for applications. Thus, the two share the same underlying data cluster management platform. After all, applications and data should go hand in hand for provisioning and management. <a href="http://natishalom.typepad.com/nati_shaloms_blog/2012/03/big-data-in-the-cloud.html" target="_blank">Nati&#8217;s blog</a> describes this integration in more detail. In short, XAP accelerates data acquisition and Cloudify manages the cluster.</p>
<p>I was not sure about the relationship between the two products. Nati gave me a little more processing information, as follows:</p>
<p><img src="http://altaterra.site-ym.com/resource/resmgr/gigas-3.gif" alt="" /></p>
<p>When streaming processing is required, both XAP and Cloudify are deployed. If streaming is not required, Cloudify alone is appropriate. This is a logical diagram, but in reality those three boxes can run on a single physical server, or two on the same machine, because XAP should work closely with a database. Cloudify is written in Java, and XAP is written in both Java and C++. XAP not only accelerates data acquisition but also provides data processing and guarantees data consistency.</p>
<p>Next I asked him to draw a picture showing where something like GigaSpaces&#8217; XAP resides between NoSQL and NewSQL. Here&#8217;s the picture, showing the two domains in an oversimplified manner for ease of understanding. Note that each database, whether NoSQL or NewSQL, is different in its offering and performance. For example, Couchbase claims high processing power, although it is classified as a NoSQL database.</p>
<p><img src="http://altaterra.site-ym.com/resource/resmgr/gigas-4.gif" alt="" /></p>
<p>OK, then, there seems to exist another category between NoSQL and NewSQL. I asked Nati what this new category is. His answer was that it is a Big Data system or streaming/real-time processing system. Remember <a href="http://radar.oreilly.com/2011/12/5-big-data-predictions-2012.html" target="_blank">Edd Dumbill&#8217;s blog</a>. Nati said that streaming processing is currently a niche area but definitely required for application areas that process a high volume of data at high speed or with little tolerance for latency, such as financial transactions like risk analysis.</p>
<p>I asked him about the application of streaming data processing. Some utilities companies process Big Data with Hadoop to analyze meter-read data. Streaming processing may be a niche, but it is becoming necessary to process such things as meter-read data that may come in from millions of power meters in semi- or real time. It would be interesting to combine those data streams with weather data that might also change in real time. For a balancing authority like <a href="http://www.caiso.com/Pages/default.aspx" target="_blank">California ISO</a>, which is tasked with balancing power demand and supply in real time, the real-time data sources vary and can be very large. It is necessary to source a large volume of data to process to get a good picture of the status of the power grid in real time to avoid blackouts. I have yet to see any examples of streaming data processing in the utilities business, but I think such an application area exists.</p>
<p>Nati mentioned that real-time requirements are growing and that Google, whose MapReduce work inspired Hadoop, is moving to Percolator, which supports real-time Big Data. Maybe this domain will not remain a niche for long.</p>
<p>The whole GigaSpaces system looks pretty complex, and integration seems to require a lot of hand-holding. Nati said that it normally does, but he makes extra efforts to make it very easy. He continued as follows:</p>
<blockquote><p>&#8220;BigData systems are complex by definition &#8211; look at Hadoop, NoSQL, etc. What we do is integrate them in a consistent way and make reduce large part of the operational complexity and development complexity.</p>
<p>If you would compare the amount of effort that is required to build a twitter like real-time analytics with GigaSpaces you&#8217;ll see that all you need to write few snippet of code to process your logic, scaling, fail-over, integration with BigData storage, management and monitoring is all curved out from the developers.”</p></blockquote>
<p>They also provide training. One thing they thought of was an interface with popular NoSQL platforms like Cassandra and MongoDB. GigaSpaces has a semi-official partnership with those database companies. This is intended to exploit the fact that more people have worked with those databases; GigaSpaces can ride on their knowledge to shorten the learning curve.</p>
<p>Moving forward, I asked Nati to consult his crystal ball as to what will happen to the NoSQL/Big Data market. Will any standard emerge from a standards body or two? He told me that, as in many emerging markets, many of the companies will be consolidated and disappear, except for some like Hadoop, Cassandra, and MongoDB. As for the storage mechanism, one form is good for one thing but not for others. If there is a standard way of accessing data, key-value, tabular, or document-based data will be consolidated, but the forms themselves will survive because one size does not fit all. He also said that SQL by itself is not wrong but its implementation is. It is interesting to compare his remark with <a href="http://tek-tips.nethawk.net/voltdb-newsql-database-company/">Scott Jarr&#8217;s</a>, who said the same thing. Nati predicted some sort of standards would emerge by consolidation but not from standards bodies.</p>
<p>After conducting five interviews, I have some idea of what NoSQL is all about. One thing I am certain of is that the utilities business is increasingly dependent on ICT. Without it, the smart grid will not be realized.</p>
]]></content:encoded>
			<wfw:commentRss>http://tek-tips.nethawk.net/gigaspaces-accelerates-nosql-databases/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Metric Insights Enhances Collected Data with Context for Business Intelligence</title>
		<link>http://tek-tips.nethawk.net/metric-insights-enhances-collected-data-with-context-for-business-intelligence/</link>
		<comments>http://tek-tips.nethawk.net/metric-insights-enhances-collected-data-with-context-for-business-intelligence/#comments</comments>
		<pubDate>Tue, 11 Sep 2012 05:20:00 +0000</pubDate>
		<dc:creator>Zen Kishimoto</dc:creator>
				<category><![CDATA[Analytics]]></category>
		<category><![CDATA[Business Intelligence]]></category>
		<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[Community Manager]]></category>
		<category><![CDATA[Data Management]]></category>
		<category><![CDATA[Database]]></category>
		<category><![CDATA[Enterprise Applications]]></category>
		<category><![CDATA[Information Technology]]></category>
		<category><![CDATA[Mobile and Wireless]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[Metric Insights]]></category>
		<category><![CDATA[NoSQL]]></category>

		<guid isPermaLink="false">http://tek-tips.nethawk.net/?p=6740</guid>
		<description><![CDATA[Of the five companies I interviewed at the recent NoSQL Now 2012 conference in San Jose, two were database companies, two were analytics companies, and one was a technology company in the business of accelerating the speed of others’ NoSQL databases. One of the analytics companies was Metric Insights, which is based in San Francisco. [...]]]></description>
			<content:encoded><![CDATA[<p>Of the five companies I interviewed at the recent <a href="http://nosql2012.dataversity.net/" target="_blank">NoSQL Now 2012</a> conference in San Jose, two were database companies, two were analytics companies, and one was a technology company in the business of accelerating the speed of others’ NoSQL databases. One of the analytics companies was <a href="http://www.metricinsights.com/index.html" target="_blank">Metric Insights</a>, which is based in San Francisco. In the enterprise and elsewhere, we receive a vast amount of data in many intervals ranging from real time to hourly or even longer periods, with complexity of kind and format—or no format at all. The idea of Big Data is to derive useful information out of unmanageable data and use it for action to improve business. Business intelligence (BI) is similar to Big Data analytics but usually deals with data of more manageable volume, velocity, and complexity. This is changing as SNS and mobile computing also enter the enterprise market. Metric Insights says we need Big Data BI.</p>
<p><img src="http://tek-tips.nethawk.net/wp-content/uploads/2012/09/metric_insights_ve.png" alt="" /></p>
<p>I sat down with Metric Insights&#8217; Marius Moscovici, CEO, and Steve Mock, COO, to find out what they were up to.</p>
<p><img src="http://altaterra.site-ym.com/resource/resmgr/mi-2.jpg" alt="" /></p>
<p>From left: Marius Moscovici, CEO, and Steve Mock, COO</p>
<p>When I interview companies at conferences, I usually want to know what they do in relation to the conference theme, and what their differentiation, future directions, and competition are. What they do seems to be fairly easy to understand, but differentiation and competition may not always be that easy to figure out. The NoSQL-related domain is still being defined as it moves in many directions at Big Data creation speed. Certainly, we need solid database technologies, including databases themselves and any utilities that enhance them, analytics engines, and good visualization tools.</p>
<p>For further expansion of this market, it is vital to get buy-in from the enterprise. <a href="http://www.mcknightcg.com/" target="_blank">William McKnight</a> in his keynote speech advocated for putting NoSQL in the enterprise market and emphasized that only then would the NoSQL market become legitimate. Riding on what the enterprise already embraces would lower entry barriers for NoSQL-related technologies and services.</p>
<p>Business intelligence has been a big push in the enterprise market. Even before the age of Big Data, in a typical enterprise domain there was a large set of data not shared among different individuals and departments such as call centers, engineering, marketing, and HR. If a marketing campaign reflected call-center customer feedback, a company might be able to sell more of its products and services. Metric Insights&#8217; goal is to enhance BI capability by increasing each datum&#8217;s value by associating it with more meaningful information. Because BI is already accepted in the enterprise segment, their goal is reasonable. They want to expand traditional BI into Big Data BI by sourcing data from Big Data as well as from traditional sources.</p>
<p>As shown below, Metric Insights collects data from multiple heterogeneous sources and adds context (relevant attributes, as shown in the figure) to make that data more effective and valuable. Metric Insights says this creates useful insights. The collected and context-enhanced data are stored in intermediate form (JSON) in a database. (By the way, <a href="http://blog.appfog.com/why-json-will-continue-to-push-xml-out-of-the-picture/" target="_blank">some</a> say that JSON will push XML out completely, and <a href="http://www.nczonline.net/blog/2008/01/09/is-json-better-than-xml/" target="_blank">others</a> say, not so fast, because the world is not built on the Web alone. But that is not the focus of this blog.) When data have more attributes or context, you can provide more effective analytics because you have more relevant information on each datum.</p>
<p><img src="http://altaterra.site-ym.com/resource/resmgr/mi-3.jpg" alt="" /></p>
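<p>To make the idea of adding context concrete, here is a minimal Python sketch of a datum being enriched and serialized to JSON. It is my own illustration, not Metric Insights&#8217; code; the <code>add_context</code> helper and every field name and value are hypothetical.</p>

```python
import json


def add_context(datum, context):
    """Return a copy of a raw datum with context attributes attached."""
    enriched = dict(datum)                 # leave the original record untouched
    enriched["context"] = dict(context)    # nest the extra attributes under one key
    return enriched


# Hypothetical raw measurement from one source system.
raw = {"metric": "daily_sales_demos", "date": "2012-09-01", "value": 42}

# Hypothetical context gathered from other sources.
context = {"product_release": "2.1", "region": "US-West", "campaign": "fall-launch"}

document = add_context(raw, context)

# The enriched record is an ordinary JSON document.
print(json.dumps(document, indent=2, sort_keys=True))
```

<p>Because the enriched record is just a JSON document, another attribute can be attached later without changing a schema.</p>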
<p>Metric Insights’ system consists of data collection, augmenting data with other attributes (context), analysis, and visualization. By the way, I mentioned to those gentlemen that I was more interested in how the backend works than in their user interface (UI). Yes, the backend is important, but the front-end, the UI, is crucial in the BI segment. So gentlemen, I take back what I said. My comment came from my techie point of view. When people use a BI system, the first thing they pay attention to is the UI. Because not all BI users are data scientists, BI specialists, techies, or geeks, and not all are interested in how it works, a BI tool should be easy and intuitive to use without lengthy training. When you present your BI tool, if it does not communicate its ease of use and simplicity, no one will pay any attention to it.</p>
<p>Metric Insights prepares typical dashboards for ease of use for a given application. The example below is for a sales database. Sales reps just select a pane to get to what they want instead of creating complex queries (like &#8220;what I would like to do”) to obtain the result they are looking for.</p>
<p><img src="http://altaterra.site-ym.com/resource/resmgr/mi-4.jpg" alt="" /></p>
<p>A typical screenshot of the UI for the sales database is shown below. This example shows product releases and the number of daily sales demos made. When a new release ships, the number of daily demo requests is likely to increase. But if there is any sudden increase or decrease, you can take a look at that particular point and drill down because more relevant context is available and can be added as an annotation.</p>
<p><img src="http://altaterra.site-ym.com/resource/resmgr/mi-5.jpg" alt="" /></p>
<p>Architecturally, it uses a persistent cache to accommodate real-time and semi-real-time data rates and stores data in a local MySQL database as well as a document-based store (JSON format). Since it is document based, it is easy to add more information for each datum. Their system works with some well-known Big Data storage/databases and technologies, such as Cloudera, MongoDB, and Google BigQuery. Additionally, a secondary memory-based caching layer is used to optimize end-user access speed of analytics.</p>
<p>Their application areas include sales, production, inventory, and finance, and they are expanding their scope to include recruiting talent. This is an interesting area. It used to be difficult to gain information on each individual because publicly available personal information was very limited. A résumé is written to cast the best light on the job applicant, and references usually provide only positive comments. Now in the era of SNS, we can gather a vast amount of information on individuals when they are at their ease and off their guard, so to speak.</p>
<p>In a way, Metric Insights and Fluid Operations provide a similar product. They both collect data from multiple sources, convert them to a standard form with additional information, apply analytics, and visualize results. On the surface they are similar, but their focus and implementation differ significantly. Metric Insights uses context to enhance each datum and obtain insight, then stores it in JSON-based storage, which is more common for NoSQL players and more relaxed (and easier to manage) than the semantic model Fluid Operations uses.</p>
<p>I think both approaches are valid, and each has its good application areas. The market is still evolving and is big enough for both of them. I asked Metric Insights if they have considered the power industry as an application area. They have not considered it yet, but I think their product can be used for that. The power industry will face multiple Big Data problems, as it will have more real-time monitoring data, such as meter reads and equipment status, along with data feeds from other systems like weather, static information like assets and service logs, and SNS. A utilities back office is filled with disjointed applications without much data sharing, a situation that something like this technology could improve considerably. I do not know how, but that is up to folks like Metric Insights and Fluid Operations.</p>
]]></content:encoded>
			<wfw:commentRss>http://tek-tips.nethawk.net/metric-insights-enhances-collected-data-with-context-for-business-intelligence/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Chatting with Bob Wiederhold of Couchbase, NoSQL Database Company</title>
		<link>http://tek-tips.nethawk.net/chatting-with-bob-wiederhold-of-couchbase-nosql-database-company/</link>
		<comments>http://tek-tips.nethawk.net/chatting-with-bob-wiederhold-of-couchbase-nosql-database-company/#comments</comments>
		<pubDate>Wed, 05 Sep 2012 17:29:49 +0000</pubDate>
		<dc:creator>Zen Kishimoto</dc:creator>
				<category><![CDATA[Analytics]]></category>
		<category><![CDATA[Business Intelligence]]></category>
		<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[Community Manager]]></category>
		<category><![CDATA[Data Management]]></category>
		<category><![CDATA[Database]]></category>
		<category><![CDATA[Enterprise Applications]]></category>
		<category><![CDATA[Information Technology]]></category>
		<category><![CDATA[Mobile and Wireless]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[Storage]]></category>
		<category><![CDATA[Cassandra]]></category>
		<category><![CDATA[Couchbase]]></category>
		<category><![CDATA[CouchDB]]></category>
		<category><![CDATA[MongoDB]]></category>
		<category><![CDATA[NoSQL]]></category>

		<guid isPermaLink="false">http://tek-tips.nethawk.net/?p=6714</guid>
		<description><![CDATA[When you talk to many people in the same domain, you either get totally confused or begin to see some commonality in their views and thus some light. Every vendor has its claims about its technologies and products. Some try to emphasize their merits and downplay their demerits. That is understandable. I sat down with [...]]]></description>
			<content:encoded><![CDATA[<p>When you talk to many people in the same domain, you either get totally confused or begin to see some commonality in their views and thus some light. Every vendor has its claims about its technologies and products. Some try to emphasize their merits and downplay their demerits. That is understandable. I sat down with Bob Wiederhold of <a href="http://www.couchbase.com/" target="_blank">Couchbase</a> at a recent <a href="http://nosql2012.dataversity.net/" target="_blank">NoSQL conference</a> and asked about the company and its products. Bob was very frank about their products and the status of their progress.</p>
<div style="font-family: Arial; font-size: small; color: #828282;"><img src="http://altaterra.site-ym.com/resource/resmgr/couch-1.jpg" alt="" /><br />
Bob Wiederhold, President and Chief Executive Officer, Couchbase</div>
<p>Couchbase was formed about a year and a half ago (February 2011) by merging Membase (based in Mountain View, CA) and Couchone (based in Oakland, CA). Bob came from the Membase side, and the new Couchbase is located in Mountain View. Couchone was behind Apache <a href="http://couchdb.apache.org/" target="_blank">CouchDB</a>, which is open-source software written in <a href="http://en.wikipedia.org/wiki/Erlang_%28programming_language%29" target="_blank">Erlang</a> (also open source) and released under the Apache License 2.0. Most of the original key developers and committers (including Damien Katz) for Apache CouchDB moved from Couchone to Couchbase. The original developers and committers still contribute to Apache CouchDB, but most efforts are now focused on Couchbase 2.0, which is a separate open source project also licensed under the Apache License 2.0 and is being implemented mostly in C. This is because Erlang is a functional programming language and C is better suited to raw speed. I’ve dealt with many programming languages in the past but never touched Erlang before. Bob emphasized that while Couchbase is heavily influenced by Apache CouchDB, it is a completely separate open source project. Bob told me that the merger went very smoothly and they are now about 100 people strong.</p>
<p><img src="http://altaterra.site-ym.com/resource/resmgr/couch-2.gif" alt="" /></p>
<p>+</p>
<p><img src="http://altaterra.site-ym.com/resource/resmgr/couch-3.gif" alt="" /></p>
<p>||</p>
<p><img src="http://tek-tips.nethawk.net/wp-content/uploads/2012/09/couchbase.gif" alt="" /></p>
<p>I told Bob that I am confused by the NoSQL market, and he shared his view of it. It is interesting to hear different persons&#8217; views on the market. Of course, there is not 100% agreement on the current market, but different views sometimes give me a pretty good perspective. He first distinguished the operational engine from the analytics engine, as below. The analytics engine is Hadoop and its derivatives, such as Cloudera, Hortonworks, and MapR. Note that Couchbase is a partner of Cloudera.</p>
<p><img src="http://altaterra.site-ym.com/resource/resmgr/couch-5.gif" alt="" /></p>
<p>Then he expanded the NoSQL area according to technology and placed NoSQL players in each category. I will not discuss each category in detail. Those who want more detail can look <a href="http://en.wikipedia.org/wiki/NoSQL" target="_blank">here</a>. Wikipedia classifies the NoSQL categories in a much finer way. For example, there are several subcategories for the key-value camp, and it distinguishes the graph-based from the object-based ones. By the way, at the conference I took a four-hour crash course given by Dan McCreary of <a href="http://www.danmccreary.com/" target="_blank">Kelley-McCreary &amp; Associates</a>. It was a good tutorial, and if you have the chance, it is worth spending a half day in his class. I also found a Couchbase whitepaper, Navigating the Transition From Relational to NoSQL Database Technology, useful. It describes document-based technology in comparison with the relational database.</p>
<p>The current version of Couchbase (1.8) is in the key-value camp. But come the 2.0 release, it will become a document-based database completely. Each camp has its merits and shortcomings. Will one category dominate others and all its technologies be consolidated into one? As for what will happen to this market, Bob thinks the following.</p>
<p><img src="http://altaterra.site-ym.com/resource/resmgr/couch-6.gif" alt="" /></p>
<p>He thinks the key-value and the document-based databases will be merged, and the merged area will be the biggest of the three new areas. The other two areas will not go away but will remain somewhat niche markets. The document-based solution is powerful, as it can contain a document like an entire website as a blob (in a JSON format) and retrieve it. For this, JSON is becoming the de facto standard over XML; Couchbase also uses JSON. There are proponents for both <a href="http://blog.360.yahoo.com/blog-TBPekxc1dLNy5DOloPfzVvFIVOWMB0li?p=736" target="_blank">JSON</a> and <a href="http://www.nczonline.net/blog/2008/01/09/is-json-better-than-xml/" target="_blank">XML</a>. In the Web environment, JSON is far more suitable, but XML has its own areas of application. There are a few tools for converting JSON to XML and vice versa.</p>
<p>As for the competition, Bob was very frank in analyzing Couchbase against other players in the document camp, as in the following table. Checkmark size indicates how strong and complete an attribute is. Well, the size is somewhat arbitrary and just indicates relative competency. Bob said that Couchbase has put a lot of emphasis on performance, scalability, and always-on features (thus, big checkmarks) with less focus on ease of development (thus, a smaller checkmark). He also added that with the 2.0 release, ease of development will improve significantly since this is the point at which they become a document database. He said that his competition has put a lot of emphasis on ease of development but needs to work on other features.</p>
<p>Couchbase moves to focus on ease of development</p>
<div style="font-family: Arial; font-size: small; color: #828282;"><img src="http://altaterra.site-ym.com/resource/resmgr/couch-7.jpg" alt="" /><br />
Competition moves to other features</div>
<p>He said that although ease of development requires a lot of expertise, other things, like performance, are very hard to improve. He told me Couchbase has a big advantage in that it can consistently provide sub-millisecond latencies for reads and writes, often 1/3 to 1/10 those of other solutions. In addition, Couchbase can provide throughput per server that is often 2-4x higher than competing solutions (see <a href="http://bit.ly/NKJkVH" target="_blank">http://bit.ly/NKJkVH</a> and <a href="http://bit.ly/Qulb4R" target="_blank">http://bit.ly/Qulb4R</a>). The consistent low latency assures very responsive applications, and the higher throughput per server means you need to buy less hardware and software than with competing solutions.</p>
<p>The current application areas that use Couchbase include social gaming, ad and offer targeting, social networking, online business services, e-commerce, cloud data services, and mobile-to-cloud data synchronization. I am interested in applying NoSQL technologies at power utilities, where smart meters and monitoring systems (such as sensors with SCADA access) produce data of many types and speeds, from static asset data to real-time meter-read data, so I wondered how products like Couchbase can be applied. Bob’s view was that as the amount of sensor data and the frequency at which it is gathered increase, having a central database that can keep up with the inflow of data will become a challenge. NoSQL databases that can linearly scale up write throughput are an easy solution for capturing the incoming data stream. Techniques like Couchbase Server&#8217;s incremental map reduce are ideal to provide real-time aggregation/analytics over the data.</p>
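<p>The general idea behind incremental aggregation can be sketched in a few lines of plain Python: each incoming reading updates a running total instead of triggering a recomputation over all stored data. This is only my illustration of the concept and does not use Couchbase or its API; the <code>IncrementalUsage</code> class, the meter IDs, and the kWh values are all hypothetical.</p>

```python
from collections import defaultdict


class IncrementalUsage:
    """Keeps per-meter running totals that are updated as each reading arrives."""

    def __init__(self):
        self.total_kwh = defaultdict(float)
        self.count = defaultdict(int)

    def add(self, meter_id, kwh):
        # Constant work per reading: no rescan of earlier readings is needed.
        self.total_kwh[meter_id] += kwh
        self.count[meter_id] += 1

    def average(self, meter_id):
        return self.total_kwh[meter_id] / self.count[meter_id]


usage = IncrementalUsage()

# Hypothetical stream of (meter, kWh) readings arriving one at a time.
for meter_id, kwh in [("m-1", 1.5), ("m-2", 0.5), ("m-1", 2.5)]:
    usage.add(meter_id, kwh)

print(usage.average("m-1"))
```

<p>The aggregate is always current after the latest reading, which is what makes this style attractive for meter data that keeps flowing in.</p>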
<p>I asked him about an ecosystem for each player. He thinks developing an ecosystem is vital for the success of Couchbase. The way things are, the market seems still very confused, but it is expanding rapidly.</p>
]]></content:encoded>
			<wfw:commentRss>http://tek-tips.nethawk.net/chatting-with-bob-wiederhold-of-couchbase-nosql-database-company/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>VoltDB, NewSQL Database Company</title>
		<link>http://tek-tips.nethawk.net/voltdb-newsql-database-company/</link>
		<comments>http://tek-tips.nethawk.net/voltdb-newsql-database-company/#comments</comments>
		<pubDate>Tue, 04 Sep 2012 23:58:55 +0000</pubDate>
		<dc:creator>Zen Kishimoto</dc:creator>
				<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[Community Manager]]></category>
		<category><![CDATA[Company News]]></category>
		<category><![CDATA[Data Center]]></category>
		<category><![CDATA[Data Management]]></category>
		<category><![CDATA[Database]]></category>
		<category><![CDATA[Editorial]]></category>
		<category><![CDATA[Enterprise Applications]]></category>
		<category><![CDATA[Information Technology]]></category>
		<category><![CDATA[Mobile and Wireless]]></category>
		<category><![CDATA[Networking]]></category>
		<category><![CDATA[On-Demand]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[Storage]]></category>
		<category><![CDATA[Telecommunication]]></category>
		<category><![CDATA[Virtualization]]></category>
		<category><![CDATA[NewSQL]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Scott Jarr]]></category>
		<category><![CDATA[VoltDB]]></category>

		<guid isPermaLink="false">http://tek-tips.nethawk.net/?p=6709</guid>
		<description><![CDATA[At the recent NoSQL conference in San Jose, California, I had a chance to chat with Scott Jarr, cofounder and chief strategy officer of VoltDB. I wrote an overview blog where I touched on VoltDB, and this is a detailed version of my conversation with Scott. Scott Jarr When I was researching the NoSQL segment, [...]]]></description>
			<content:encoded><![CDATA[<p>At the recent <a href="http://nosql2012.dataversity.net/" target="_blank">NoSQL conference in San Jose, California</a>, I had a chance to chat with Scott Jarr, cofounder and chief strategy officer of VoltDB. I wrote an <a href="http://altaterra.site-ym.com/blogpost/288668/148406/How-NoSQL-Relates-to-the-Energy-Business" target="_blank">overview blog</a> where I touched on VoltDB, and this is a detailed version of my conversation with Scott.</p>
<p><img src="http://tek-tips.nethawk.net/wp-content/uploads/2012/09/voltdb.png" alt="" /></p>
<div style="width: 233px; font-family: Arial; font-size: small; color: #828282;"><img src="http://voltdb.com/sites/default/files/Scott%20Jarr.jpg" alt="" width="233" height="253" align="BOTTOM" border="0" /><br />
Scott Jarr</div>
<p>When I was researching the NoSQL segment, I found it confusing enough, but there is also a NewSQL movement, which confused me further. The NoSQL movement began in an effort to accommodate the Big Data phenomenon. In the traditional database segment, ACID (atomic, consistent, isolated, and durable) is of utmost importance. The relational database was developed to guarantee ACID for transaction-oriented applications. The traditional relational or SQL database is fine, as long as the data comes in at a reasonable speed and volume and is of limited variety. But at some point, these parameters exceeded what the traditional SQL database could handle, and new ways to cope with them were increasingly required. That is where NoSQL comes in. NoSQL, in general, relaxes some of the rigid SQL rules (abandoning SQL, and thus ACID, partially or altogether) to accommodate these new requirements: scale-out, high availability (HA), replication, and performance. Therefore, NoSQL in general does not have SQL, relational schema, joins, or ACID (this is obvious since these are traits of the relational/SQL database). Scott put the comparison of Old SQL, NoSQL, and NewSQL on a piece of paper as we spoke. I reproduced it here. Old SQL (yet another term) refers to the traditional relational/SQL database that dominates the enterprise world.</p>
<div style="width: 300px; font-family: Arial; font-size: small; color: #828282;"><img src="http://altaterra.site-ym.com/resource/resmgr/voltdb-2.gif" alt="" /><br />
Comparison of Old SQL and NoSQL</div>
<p>So in other words, in order to gain scale-out, HA, replication, and performance, NoSQL abandoned SQL/relational schema partially or altogether. What NewSQL is saying is that it can accomplish every feature in the table above while keeping the relational/SQL schema (and therefore ACID and join).</p>
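<p>For readers who have not worked with transactions, the &#8220;atomic&#8221; part of ACID can be shown with a minimal Python sketch. I use the standard-library <code>sqlite3</code> module purely for illustration; it has nothing to do with VoltDB or any NoSQL product, and the <code>accounts</code> table, the <code>transfer</code> function, and the simulated failure are hypothetical.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()


def transfer(conn, src, dst, amount, fail_midway=False):
    """Move funds inside one transaction; any error rolls back both updates."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
        )
        if fail_midway:
            raise RuntimeError("simulated crash between the two updates")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
        )


try:
    transfer(conn, "alice", "bob", 30, fail_midway=True)
except RuntimeError:
    pass  # the half-finished transfer has been rolled back

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)
```

<p>After the simulated failure, neither account has changed: the debit was undone along with the credit that never happened. That all-or-nothing behavior is what a database gives up when it drops ACID.</p>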
<p>If that table is expanded with NewSQL, we have the following.</p>
<div style="width: 300px; font-family: Arial; font-size: small; color: #828282;"><img src="http://altaterra.site-ym.com/resource/resmgr/voltdb-3.gif" alt="" /><br />
Comparison of Old SQL, NoSQL, and NewSQL</div>
<p>How can that be possible with NewSQL? The performance gain is a result of new architectures that remove the old baggage of Old SQL; many also leverage memory for additional improvements. Actually, GigaSpaces, which I also interviewed and will write a blog about later, has an in-memory cache technology working with other NoSQL companies. However, Michael Stonebraker, CTO of VoltDB, said in one of his talks (available in a 30-minute video) that running in memory alone does not guarantee the performance gain needed to accommodate the speed at which Big Data comes in.</p>
<p>Mike explained in his talk that there is nothing wrong with the concept of SQL itself. It is the implementation of SQL that causes the problems shown in the table. Because of the less than perfect implementation, 96% of the time is spent on overhead and only 4% on useful work, as indicated in the following graph extracted from his presentation.</p>
<div style="width: 550px; font-family: Arial; font-size: small; color: #828282;"><img src="http://altaterra.site-ym.com/resource/resmgr/voltdb-4.jpg" alt="" /><br />
CPU cycle use in a typical SQL implementation. Most of it—96%—is used for overhead.</div>
<p>Unless these overheads are removed, even if all data is placed into memory, extreme performance improvement cannot be expected, because that addresses only the 4% and not the remaining 96%. Typical NoSQL databases abandoned SQL, or supported it only partially, to bypass this problem. VoltDB confronted the inefficient implementation of SQL head-on and developed their version of the SQL database from the ground up to eliminate these overheads. I am not covering each overhead in detail, but you can watch his easy-to-follow video.</p>
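<p>The arithmetic behind this argument is Amdahl&#8217;s law. Taking the 96%/4% split from the talk at face value, the short Python sketch below compares speeding up only the useful-work share with speeding up only the overhead share. The 100x factor is a hypothetical number I picked for illustration, not a figure from the talk.</p>

```python
def overall_speedup(fraction_improved, local_speedup):
    """Amdahl's law: overall speedup when only part of the run time gets faster."""
    remaining = 1.0 - fraction_improved
    return 1.0 / (remaining + fraction_improved / local_speedup)


USEFUL_WORK = 0.04  # share of CPU cycles doing useful work, per the talk
OVERHEAD = 0.96     # share of CPU cycles spent on overhead, per the talk

# Hypothetical: make only the useful-work share 100 times faster.
speedup_useful_only = overall_speedup(USEFUL_WORK, 100.0)

# Hypothetical: make only the overhead share 100 times faster.
speedup_overhead_only = overall_speedup(OVERHEAD, 100.0)

print(f"{speedup_useful_only:.2f}x")    # about 1.04x
print(f"{speedup_overhead_only:.2f}x")  # about 20.16x
```

<p>Even a hundredfold improvement to the 4% barely moves the total, while attacking the 96% approaches the theoretical ceiling of 1/0.04, or 25x.</p>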
<p>OK, I get it. Then, what does this mean to the whole area of NoSQL? Does this mean the whole area of NoSQL gets consolidated into a single technology like NewSQL? Scott drew me a good figure to explain this, which he had already published in his own blog posts (the figure below came from <a href="http://voltdb.com/company/blog/big-data-value-continuum" target="_blank">part-1</a> and <a href="http://voltdb.com/company/blog/big-data-value-continuum-part-2" target="_blank">part-2</a>).</p>
<div style="width: 550px; font-family: Arial; font-size: small; color: #828282;"><img src="http://altaterra.site-ym.com/resource/resmgr/voltdb-5.gif" alt="" /><br />
From Scott Jarr&#8217;s blog. On the Y axis, data speed, size, and complexity grow upwards.</div>
<p>There are five areas to address in the enterprise in terms of data collection and analysis (analytics): interactive, real-time analytics, record lookup, historical analytics, and exploratory analytics.</p>
<p>The five areas are further explained in the following figure with applications and time scale.</p>
<p><img src="http://altaterra.site-ym.com/resource/resmgr/voltdb-6.jpg" alt="" /></p>
<p>Note that VoltDB is colored differently from the rest of NewSQL in the graph, but that is meant to emphasize its position in the NewSQL group. VoltDB falls into the NewSQL camp. Scott emphasized its performance superiority over others. The performance benchmark they share is 3 million transactions per second (TPS). According to Scott, the traditional RDBMS is trying to cope with the Big Data problem (velocity, volume, and variety) by scaling up (throwing in more CPU and storage power rather than using parallel computing).</p>
<p>In <a href="http://www.slideshare.net/Dataversity/newsql-vs-nosql-for-new-oltp-michael-stonebraker-voltdb" target="_blank">Stonebraker&#8217;s video</a>, he said that VoltDB was five times faster than Cassandra and also faster (he did not say how much) than an unnamed incumbent&#8217;s database. When I consulted for MySQL, its database was very fast (before their 5.0, which incorporated enterprise-ready features) and faster than this incumbent&#8217;s, but they could not publish the benchmark for fear of a lawsuit. I can understand that. When I consulted for JBoss, a Japanese open-source consortium compared its performance with other products like IBM&#8217;s WebSphere, without any tuning. The number was not very good, mainly because those who ran the benchmarks did not know how to tune JBoss as well as they knew how to tune IBM&#8217;s product. After a JBoss engineer flew over there and tuned it, it improved drastically. So when we conduct a performance comparison, we need to set up ground rules that apply to every participant.</p>
<p>Of course, there are some overlaps among those technologies and their areas of application, but this figure is a good picture of how each technology is suited to its application area. Hadoop is based on batch processing and is not suitable for real-time analysis. Many people think Big Data and Hadoop are synonymous, but this clearly shows they are related but not the same. In the utilities business, a large amount of meter-read data gets collected, aggregated, and stored. By analyzing power usage for a particular area daily or monthly, a utility company can probe into usage patterns and trends. Actually, some utilities are using Hadoop now, according to what I heard at the Soft Grid conference.</p>
<p>Scott thinks NewSQL, NoSQL, data warehouses, and Hadoop will remain separate technologies because each is suited to a specific area of data collection and analytics. But he advocated that these areas and their tools be tightly integrated to provide analytics and thus effective real-time actions, as in the following figure. By feeding the results of long-time-span analytics into short-time-span analytics, more effective actions could be taken.</p>
<p><img src="http://altaterra.site-ym.com/resource/resmgr/voltdb-7.gif" alt="" /></p>
<p>Finally, he showed the current applications of VoltDB, as follows.</p>
<p><img src="http://altaterra.site-ym.com/resource/resmgr/voltdb-8.jpg" alt="" /></p>
]]></content:encoded>
			<wfw:commentRss>http://tek-tips.nethawk.net/voltdb-newsql-database-company/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>How NoSQL Relates to the Energy Business</title>
		<link>http://tek-tips.nethawk.net/how-nosql-relates-to-the-energy-business/</link>
		<comments>http://tek-tips.nethawk.net/how-nosql-relates-to-the-energy-business/#comments</comments>
		<pubDate>Mon, 27 Aug 2012 06:22:07 +0000</pubDate>
		<dc:creator>Zen Kishimoto</dc:creator>
				<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[Community Manager]]></category>
		<category><![CDATA[Data Center]]></category>
		<category><![CDATA[Data Management]]></category>
		<category><![CDATA[Database]]></category>
		<category><![CDATA[Enterprise Applications]]></category>
		<category><![CDATA[Information Technology]]></category>
		<category><![CDATA[Social Media]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[Couchbase]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Real time]]></category>
		<category><![CDATA[Real-time analytics]]></category>
		<category><![CDATA[VoltDB]]></category>

		<guid isPermaLink="false">http://tek-tips.nethawk.net/?p=6665</guid>
		<description><![CDATA[Recently, I watched the Soft Grid conference, put out by GreentechMedia via Ustream, and was pleasantly surprised that many smart grid and utilities people talked about Big Data and cloud computing. Then I went to the 2012 NoSQL Now conference, where I interviewed five companies and sat in on several of the sessions there. I [...]]]></description>
			<content:encoded><![CDATA[<p>Recently, I watched the <a href="http://www.greentechmedia.com/events/live/the-soft-grid-2012/" target="_blank">Soft Grid conference</a>, put out by GreentechMedia via Ustream, and was pleasantly surprised that many smart grid and utilities people talked about Big Data and cloud computing. Then I went to the <a href="http://nosql2012.dataversity.net/" target="_blank">2012 NoSQL Now conference</a>, where I interviewed five companies and sat in on several of the sessions there. I will post a blog for each interview later. For now, let me describe my understanding of what NoSQL is and how it may be applied to the energy business.</p>
<p><img src="http://tek-tips.nethawk.net/wp-content/uploads/2012/08/nosqlnow.jpg" alt="" /></p>
<p>I consulted for MySQL before and knew something about the relational database market. But after it was bought by Sun, I stopped following it. I knew there was such a thing as NoSQL but initially thought it meant &#8220;No to SQL&#8221;; it is more like &#8220;Not only SQL.&#8221; NoSQL started to get attention circa 2009, and the NoSQL Now conference only started in 2011. So it is a relatively new area and, as in any new area, the market is very confused. Many terms and acronyms are floating around, along with many claims by vendors. Quite frankly, it is very, very hard to walk through this market without getting totally confused. Prior to attending the conference, I studied the companies I planned to interview and read anything and everything I could lay my eyes on. The sad reality was that I ended up even more confused.</p>
<p>What is NoSQL, technology-wise, component-wise, and application-wise?</p>
<p><em>Technology-wise</em></p>
<p>The NoSQL market can be described in a few ways. One way is to categorize it by the technologies used. The 451 Group&#8217;s Matt Aslett, in his blog <a href="http://blogs.the451group.com/information_management/2011/04/15/nosql-newsql-and-beyond/" target="_blank">NoSQL, NewSQL and Beyond: The answer to SPRAINed relational databases</a>, gave a pretty good picture of the market, with categories and the vendors that belong to each category.</p>
<div style="width: 665px; font-family: Arial; font-size: small; color: #828282;"><img src="http://blogs.the451group.com/information_management/files/2011/04/Figures-Aslett_web.jpg" alt="" width="665" height="380" align="BOTTOM" border="0" /><br />
Matt Aslett&#8217;s database categories</div>
<p>
This figure alone is very valuable. It helped me understand where my interviewees&#8217; companies fit.</p>
<p>This view is great, but I was still not comfortable enough to say, &#8220;Yes, I got it.” Bob Wiederhold, president and CEO of <a href="http://www.couchbase.com/" target="_blank">Couchbase</a>, made it much simpler for me.</p>
<div style="font-family: Arial; font-size: small; color: #828282;"><img src="http://altaterra.site-ym.com/resource/resmgr/nosql-o-2.jpg" alt="" /><br />
Bob Wiederhold</div>
<p>
He thinks NoSQL is playing in a segment that is not the focus of the SQL players, which are good at transactions and suited to back-office applications. He further classified NoSQL into four categories:</p>
<ul>
<li>Key value</li>
<li>Document</li>
<li>Column family</li>
<li>Graph</li>
</ul>
<p>The current Couchbase (1.8) belongs to the key-value camp but will move to the document camp at its 2.0 version launch. He also told me that the key-value and document camps are merging and that the combined camp will be the biggest of the three resulting categories. I plan to write about his interview in a future blog.</p>
<p><em>How they fit together in the enterprise</em></p>
<p>How do NoSQL technologies fit into the enterprise? William McKnight, of McKnight Consulting Group, presented a keynote speech titled, &#8220;Putting NoSQL in its Place—in the Enterprise.”</p>
<div style="font-family: Arial; font-size: small; color: #828282;"><img src="http://altaterra.site-ym.com/resource/resmgr/nosql-o-3.jpg" alt="" /><br />
William McKnight</div>
<p>
One of his slides shows really well how data is collected, aggregated, and analyzed in the enterprise, and which components are there for each function. Data are collected for analysis; otherwise, there is no reason to collect them. There are two major groups for analysis: real time (streaming) and static (stored data). In his slide, Hadoop (which processes data in batch mode) is placed on the analytic side. But if you need to analyze a massive amount of data as it comes in real time, you need streaming analysis. Hadoop is not meant for that. That is why we need databases that can handle real-time streaming data, which is in a totally different area from that of Hadoop.</p>
<div style="width: 500px; font-family: Arial; font-size: small; color: #828282;"><img src="http://altaterra.site-ym.com/resource/resmgr/nosql-o-4.jpg" alt="" /><br />
In the picture, blurry brown lines indicate a set of clouds. The components surrounded by the brown lines may be hosted in a cloud.</div>
<p>
<em>Application Areas</em></p>
<p>This is great. Then, what about application areas? Where does each NoSQL technology apply? Scott Jarr, cofounder and chief strategy officer at <a href="http://voltdb.com/">VoltDB</a>, gave me the following figure.</p>
<div style="font-family: Arial; font-size: small; color: #828282;"><img src="http://voltdb.com/sites/default/files/Scott%20Jarr.jpg" alt="" width="332" height="362" align="BOTTOM" border="0" /><br />
Scott Jarr</div>
<p>
Actually, he drew this on a piece of paper, but he has also published it in a <a href="http://voltdb.com/company/blog/big-data-value-continuum-part-2" target="_blank">blog</a> post. I will cover it in more detail in a future blog. He looked at five application areas: interactive, real-time analytics, record lookup, historical analysis, and exploratory. He then placed each Big Data technology in one of the five areas. This is a pretty good explanation of NoSQL in terms of application areas. In the figure, VoltDB is colored differently from NewSQL, but he classified it in the NewSQL camp.</p>
<p><img src="http://voltdb.com/sites/default/files/Big%20Data%20Value%20Continuum%20Image%204b.png" alt="" width="499" height="335" align="BOTTOM" border="0" /><br />
<br />
<em>Applications to Energy (Smart Grid)</em></p>
<p>The application areas discussed most throughout the conference were publishing, finance, and social networking services (SNS). A couple of people said that SNS is the driving force for Big Data and NoSQL on the West Coast, but on the East Coast it is primarily the financial community. What about applications to the smart grid? At the Soft Grid conference, the focus was on metered data, which is collected, aggregated, and stored in real time but not analyzed in real time. I heard during the Soft Grid conference that some utilities were using Hadoop to analyze their metered data.</p>
<p>The Northeast blackout of 2003 spread because timely actions were not taken to isolate the problem area from the rest of the power grid, and faults cascaded to the entire area. The causes of the blackout were studied intensely. But in 2011, it was repeated in the San Diego area. The initial cause may have been different from the one in 2003, but the impact cascaded in the same way. With better-connected ICT, modern monitoring systems like SCADA, and real-time analytics of power grid health, this could be avoided. The decision to cut off faulty areas from the grid requires real-time action based on monitored data arriving in real time, because power moves very quickly. This is an application area that is different from the trend analysis done with Hadoop.</p>
<p>The companies I interviewed told me that applying their technologies to the smart grid may be an interesting idea, but it is still premature, as they do not see the market forming. Finally, I just want to mention that David Brown of EMC, the parent company of VMware, used <a href="http://info.vmware.com/content/12834_gemfire?src=PaidSearch" target="_blank">GemFire</a> to implement data collection and analytics for some unnamed utilities. His case was an exception, and I guess the market for utilities is still forming.</p>
<div style="font-family: Arial; font-size: small; color: #828282;"><img src="http://altaterra.site-ym.com/resource/resmgr/nosql-o-5.jpg" alt="" /><br />
David Brown</div>
]]></content:encoded>
			<wfw:commentRss>http://tek-tips.nethawk.net/how-nosql-relates-to-the-energy-business/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
