The title is somewhat deceiving. Yes, I would like to touch on some trends in the data center infrastructure management (DCIM) market, but more than that I want to talk about a conversation I had with Alex Fielding. He recently landed at a DCIM company called Vigilent, and his background is diverse and interesting. Was that a good enough reason for me to meet him for lunch? In the following, I will talk about Vigilent and our chat.
Alex Fielding, Vice President Federal & Energy, Vigilent
Gartner defines DCIM as:
Data center infrastructure management (DCIM) tools monitor, measure, manage and/or control data center utilization and energy consumption of all IT-related equipment (such as servers, storage and network switches) and facility infrastructure components (such as power distribution units [PDUs] and computer room air conditioners [CRACs]).
There are other references to DCIM, like this one. I have written on the subject several times in the past. Some definitions go into more detail and others are very simple like Gartner’s, although I am sure Gartner has a more detailed one as well.
What does it do?
Vigilent is a DCIM company that manages the cooling side of a data center. Its premise is very straightforward and its benefit is easy to understand, according to Alex and NTT, one of its customers. As is well known, most data center operations are performed by facilities folks, and IT folks are likely to be deemed guests – albeit demanding guests. In the past, several attempts were made to integrate the IT and facilities camps to achieve more efficient data center operations. Although there have been some successes, it is a hard sell to either camp.
Alex told me that Vigilent’s success is due to two reasons. One is that it deals with something that has a major impact on data center operations – effective cooling, which accounts for 40% to 50% of power consumption. It allocates necessary cooling only where needed, so its benefit is highly visible. The other reason is that it deals with the facilities side alone; although it benefits the IT side by providing more consistent and reliable server cooling, it doesn’t touch the servers or need to involve IT folks. He also said that if something is easy to understand and shows benefits quickly, it is straightforward to justify, and the facilities folks love it. He added that it takes only a few days from installation to achieving energy savings.
How does it do it?
The Vigilent technology is easy to describe. It monitors and measures cooling needs in a data center and dynamically allocates just enough cooling where it is required. Servers and other IT equipment may produce more or less heat, depending on their loads. Loads dynamically fluctuate by the hour or by the day. A data center is often designed and provisioned with an overcapacity of power and cooling. It is not necessary to overcool IT equipment, and if cooling requirements lessen, cooling should be adjusted accordingly.
That is exactly what Vigilent does. In general, temperatures at a data center are adjusted by sensing warm air returning to CRAC units. What’s really important is the temperature at each server intake point – not at each CRAC unit. There was no easy way to measure the intake air temperature and flow before. In the past few years, temperature and other sensors have begun to be deployed to monitor environmental conditions at a server or at the rack level, which allows more efficient temperature control to be applied, avoiding overcooling.
Vigilent uses wireless sensors for this purpose and connects them via a mesh network based on technology from Dust Networks. On the basis of continually collected temperature information, cooling capacity is automatically adjusted by turning on and off CRAC units and also controlling their fan speeds. The following figure depicts this.
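The control loop described above can be sketched in a few lines. This is a minimal illustration, not Vigilent's actual algorithm: it assumes a hypothetical `crac_command` function that takes rack-intake temperature readings from the sensor mesh and returns a command for one CRAC unit, with made-up setpoints and a simple proportional fan response.

```python
# Illustrative sketch of sensor-driven cooling control (not Vigilent's
# real algorithm): scale a CRAC unit's fan to the hottest rack-intake
# reading, and put the unit on standby when everything is cool enough.
# All names and thresholds here are assumptions for illustration.

TARGET_INTAKE_C = 24.0   # desired server intake temperature
STANDBY_MARGIN_C = 2.0   # how far below target before the unit can idle

def crac_command(intake_temps_c):
    """Return a (mode, fan_fraction) command from a list of sensor readings."""
    hottest = max(intake_temps_c)
    if hottest < TARGET_INTAKE_C - STANDBY_MARGIN_C:
        return ("standby", 0.0)          # cool enough everywhere: idle the unit
    # Simple proportional response: ramp the fan with the overshoot.
    overshoot = hottest - TARGET_INTAKE_C
    fan = min(1.0, max(0.2, 0.5 + 0.1 * overshoot))
    return ("on", round(fan, 2))

print(crac_command([21.0, 20.5, 21.3]))   # ('standby', 0.0)
print(crac_command([23.0, 26.5, 24.1]))   # ('on', 0.75)
```

The key design point is that the input is the server intake temperature, not the warm return air at the CRAC unit, which is what makes targeted, per-location cooling decisions possible.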
Data Center Energy Efficiency Metrics
Both Alex and I are interested in data center energy efficiency. Alex said that power usage effectiveness (PUE) is not always considered an effective energy-efficiency measurement for a data center. I think it took more than five years before this concept sank in for the people who run data centers daily. It is simple and intuitive, and it gave data center operators a tool for measuring energy efficiency.
Although I do not belittle the significance of PUE, it has a few problems. I will not repeat them because they have been discussed in many other places as well as here. You can improve (or cook) a PUE number simply by artificially inflating IT power consumption. Alex told me that he has heard of such a case.
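The arithmetic behind that trick is easy to show. PUE is total facility power divided by IT power, so adding wasteful IT load (say, idle servers left running) improves the ratio even though total energy use goes up. The numbers below are illustrative only.

```python
# PUE = total facility power / IT power.
# Illustrative numbers showing how PUE can be "cooked" by adding
# useless IT load: the ratio improves while total consumption rises.

def pue(it_kw, overhead_kw):
    return (it_kw + overhead_kw) / it_kw

print(pue(500, 400))   # 1.8  with 500 kW of real IT load
print(pue(700, 400))   # ~1.57 after adding 200 kW of useless IT load
```

Total consumption rose from 900 kW to 1,100 kW, yet the PUE "improved" from 1.8 to about 1.57, which is exactly why PUE alone can mislead.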
To avoid such a deficiency of PUE, it is necessary to take IT energy efficiency and utilization into consideration. There are a few metrics proposed to improve PUE, such as CADE and DCeP. Unfortunately, to date, none of them are as widely used as PUE. Alex likes CADE.
CADE is defined as:
CADE = IT Efficiency * Facility Efficiency, where
IT Efficiency = IT Asset Utilization * IT Energy Efficiency
Facility Efficiency = Site Asset Utilization * Site Energy Efficiency
Measurement on the IT side is much more difficult than on the facilities side, and that seems to be the reason for CADE's low adoption rate.
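A worked example makes the definition concrete. The numbers below are made up, but they illustrate both how CADE is computed from its four factors and why a realistic score comes out surprisingly low.

```python
# CADE = (IT asset utilization * IT energy efficiency)
#      * (site asset utilization * site energy efficiency)
# Example values are illustrative, not measurements.

def cade(it_util, it_eff, site_util, site_eff):
    it_efficiency = it_util * it_eff            # IT side
    facility_efficiency = site_util * site_eff  # facilities side
    return it_efficiency * facility_efficiency

# e.g. 30% IT asset utilization, 70% IT energy efficiency,
#      60% site asset utilization, 55% site energy efficiency
print(cade(0.30, 0.70, 0.60, 0.55))   # ~0.069, i.e. roughly 7%
```

Note that the two IT-side factors require per-server utilization and efficiency data, which is precisely the information a facilities team has the hardest time obtaining.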
Vigilent has introduced the notion of cooling virtualization. What does this mean? If something is virtualized, it should be easily created, increased in quantity, removed, decreased in quantity, and moved dynamically like virtualized servers. (Note that there is always a physical limit to all the resources.)
Cooling can be:
- created (by turning on a CRAC unit)
- increased in quantity (by turning on additional CRAC units and/or increasing the fan speed of a CRAC unit)
- removed (by turning off a CRAC unit – or, in Vigilent's case, switching it to standby mode)
- decreased in quantity (by turning off some CRAC units and/or decreasing the fan speed of a CRAC unit)
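The four operations above map naturally onto a resource-pool interface. The sketch below is a hypothetical model of my own, not a Vigilent API: "create" and "remove" toggle a unit between standby and active, while "increase" and "decrease" adjust the fan speed of an already-active unit.

```python
# Hypothetical model of cooling as a virtualized resource (not a real
# Vigilent interface). Each CRAC unit has a mode and a fan fraction.

class CracPool:
    def __init__(self, n_units):
        self.units = [{"mode": "standby", "fan": 0.0} for _ in range(n_units)]

    def create(self, i):                 # cooling created: turn a unit on
        self.units[i] = {"mode": "on", "fan": 0.5}

    def increase(self, i, step=0.25):    # more cooling: raise the fan speed
        self.units[i]["fan"] = min(1.0, self.units[i]["fan"] + step)

    def decrease(self, i, step=0.25):    # less cooling: lower the fan speed
        self.units[i]["fan"] = max(0.0, self.units[i]["fan"] - step)

    def remove(self, i):                 # cooling removed: standby, not off
        self.units[i] = {"mode": "standby", "fan": 0.0}

pool = CracPool(2)
pool.create(0)
pool.increase(0)
print(pool.units[0])   # {'mode': 'on', 'fan': 0.75}
pool.remove(0)
print(pool.units[0])   # {'mode': 'standby', 'fan': 0.0}
```

Moving cooling, discussed next, is then just a remove at one unit paired with a create at another.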
What about moving? Can we move cooling dynamically? Actually, even a virtualized server is not physically moved. A virtual machine (VM) move, such as vMotion, is actually performed as a combination of creation, copy, and removal operations. A new VM instance is created on a destination physical server, the execution state is copied from the original instance to the new one, and then the original instance is removed. This sequence also applies to cooling. VMs on a server may be moved to other locations or removed, and that triggers a change in cooling requirements. Cooling at the original location may be turned off or decreased in quantity, and, in turn, new or additional cooling may be instantiated at the destination.
There are some differences between moving server VMs and moving cooling. Unlike a VM, cooling initiated at the destination is not a carbon copy of that at the original location; a VM is copied, but cooling is not. Depending on several factors, such as physical layouts, cooling capacity may even change at the new location. But in effect, cooling capacity can be said to have moved from one location to another dynamically. So I think cooling can be virtualized, and cooling virtualization can be discussed in much the same way as the virtualization of computing resources consisting of servers, storage, and networks.
As I discussed in the previous blog, a software-defined data center (SDDC) may be considered a data center OS that takes care of all the cumbersome chores for data center operators without regard to physical ICT and facilities equipment. But prerequisites for SDDC include virtualization (software defined) for both ICT and facilities equipment. ICT virtualization is well under way (see Yevgeniy Sverdlik’s article on page 52 here), but what about the facilities side, i.e., cooling and power? I touched on cooling virtualization above. I will talk about the last element of SDDC, software-defined power, in a future blog.
Alex and I talked about many more subjects, including his take on OpenStack and the people involved in that project. But I’ve run out of gas and will cover them sometime in the future.