How to Move a Data Center in 35 Days – Week 3: The Power Balancing Act

In Week 3, we began virtualizing the 35 servers that were coming out of our co-location facility, with the goal of migrating them in-house. We also had a mission to take up less space and consume less energy – all in 35 days.

The topic of energy and power manifested itself in other ways in week three of our four-week project. As we began installation of these 35 servers in our new data center, the issue of how to best provide uniform power to the Main Data Frame (MDF) proved to be tricky. If power isn’t evenly distributed across the MDF chassis, the machines won’t power on, and even if they do, there’s a risk of errors. Even, clean distribution of power is an essential element of a successful MDF installation. Because of the changes in floor layout and our plans to virtualize several machines, we couldn’t simply replicate the power plan from the old location. Basically, we were starting from scratch – with one week left!

Our power plan started like this: a 400-amp breaker was supposed to be feeding the MDF, including a 200-amp subpanel that feeds most of the buss ducts. The buss above the three cabinets carries three phases, each able to deliver up to 80 amps. The catch is that breakers should only carry about 80% of their rating under continuous load, so the 100-amp buss duct is good for 80 amps per phase, the 200-amp subpanel for 160 amps per phase, and the 400-amp main – the one that feeds the 200, which feeds the 100 – for only 320. Confused yet? We were!

Additionally, each power strip had either a 20-amp or a 15-amp breaker, and each of the junction boxes off the buss duct had a 20- or 30-amp breaker. None of those breakers could be allowed to trip, or we would have a mini-disaster on our hands. We were beginning to look at our power plan like it was one of those brain teasers. While we were fairly certain that our plan kept us under all the maximums for consumption, power remained a concern.
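The arithmetic itself is simple enough to script, and scripting it is a good way to catch a strip or a phase that is quietly over budget. Here is a minimal sketch of that kind of check – the breaker ratings and per-strip loads are placeholders rather than our actual numbers, and it assumes the load splits evenly across the three phases:

```python
# Minimal power-plan sanity check for a breaker hierarchy.
# All ratings and loads below are illustrative placeholders.

CONTINUOUS_DERATE = 0.80  # breakers should only carry ~80% of rating continuously


def usable_amps(breaker_rating):
    """Continuous capacity of a breaker after the 80% derate."""
    return breaker_rating * CONTINUOUS_DERATE


def check(name, breaker_rating, load_amps):
    """Report whether a load fits under a breaker's usable capacity."""
    limit = usable_amps(breaker_rating)
    status = "OK" if load_amps <= limit else "OVERLOADED"
    print(f"{name}: {load_amps:.1f}A of {limit:.1f}A usable "
          f"({breaker_rating}A breaker) -> {status}")


# Per-strip loads, then roll them up the feed chain.
strips = [
    ("Strip A", 20, 12.0),
    ("Strip B", 20, 14.5),
    ("Strip C", 15, 9.0),
    ("Strip D", 15, 11.0),
]
for name, rating, load in strips:
    check(name, rating, load)

total_load = sum(load for _, _, load in strips)
per_phase = total_load / 3  # crude assumption: even split across three phases
check("Buss duct, per phase", 100, per_phase)
check("Subpanel, per phase", 200, per_phase)
check("Main feed, per phase", 400, per_phase)
```

Adding the junction-box breakers as another level in the same loop is straightforward; the point is simply to apply the 80% rule at every level of the chain, not just at the power strips.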

As we began installing and powering up equipment in the MDF, we had to adjust our original plan due to loading issues. We expected some of this – it’s customary with every build. We had to rethink the use of large power strips, since they couldn’t handle the load as well; we decided to use four smaller ones instead, especially for the dense cabinets. We also decided that shorter power cords were better for our new space, as the six-foot ones took up too much room in the rack.

But all that changed when the Dell/APC cabinets we ordered showed up and were wider than expected. This meant that the floor layout and power plan needed to be redone quickly, and it also brought into question our use of the shorter power cords. Normally, a change like this wouldn’t be an issue, but when working under a deadline like ours, even small changes seem bigger.

Once our power puzzle was solved, we began virtualizing the 35 boxes. Our goal was to have them moved out of the Boston co-location facility in 30 minutes. But once again we hit a snag with performance issues, and after four hours of troubleshooting we discovered a bad blade that needed to be swapped out before we could continue.

By day 20 of the project, we had achieved the following:

  1. Successfully virtualized about half the major boxes, including two application servers
  2. As a result of virtualization, removed 13 of the servers we had been using in the previous co-location facility
  3. Installed fiber switches configured for backup tape drives
  4. Resolved Port and IP issues with our VM clusters
  5. Planned for the AC installation in the Main Data Frame

Upon completion, we did a quick power audit to see how much power we had gained, and discovered a net gain of 25 amps. This was a good sign, and we felt confident enough to start powering up some of the gear, including our SAN switches. However, after adding two more virtualized servers, our estimate put us about 8 amps short. That finding meant an additional UPS in the rack, plus some unexpected downtime in the project so the team could install new fiber runs and make more network reconfigurations. We also discovered a problem with the conversion of the physical SAP application server to virtual: a Compaq kernel driver kept trying to monitor a disk array that no longer exists now that the machine is virtual. Time for more work-arounds from the team!
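That back-and-forth – 25 amps of headroom after the audit, then 8 amps short once the SAN switches and two more virtualized servers came online – is just running subtraction, but it is easy to lose track of in the middle of a build. Here is a minimal sketch of the bookkeeping; the individual draws are invented to reproduce our totals, not measured values:

```python
# Track remaining amperage headroom as gear is powered on.
# Starting headroom and per-device draws are placeholders.

headroom_amps = 25.0  # headroom measured in the quick power audit

additions = [
    ("SAN switches", 6.0),
    ("Virtualized server host 1", 14.0),
    ("Virtualized server host 2", 13.0),
]

for name, draw in additions:
    headroom_amps -= draw
    flag = "" if headroom_amps >= 0 else "  <-- short: plan another UPS or circuit"
    print(f"After {name}: {headroom_amps:+.1f} A headroom{flag}")
```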

Looking back on it, we learned some best practices for distributing power through a data center.

If you are going to a colo or hosted data center:

  1. Make sure they provide enough power. Many data centers only give you 150-200 watts per square foot, which may not be enough for dense environments like virtualization. Newer data centers should be able to supply more power, while older data centers may make you rent more square footage to get enough power for your gear. Cooling and power go hand in hand – so check that, too.
  2. Check to make sure they bill for power actually used. Many times they charge per circuit, which is fine, as long as you realize it and it’s priced accordingly. If you are paying for, say, four 20-amp circuits at 120 volts and they bill you for 19.2 kW (120*20*8) at current market rates, you are probably getting overcharged. Also, these facilities rarely let you use more than 80% of a circuit, so on your 20-amp circuits you can only draw 16 amps – see the quick math right after this list.
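To make that billing comparison concrete, here is the arithmetic from the example above as a short script. The circuit count, the 19.2 kW billed figure, and the 80% cap come from the example; everything else is an assumption for illustration:

```python
# Compare billed power against what you can actually use.
# Numbers here mirror the example above and are illustrative only.

VOLTS = 120
CIRCUIT_AMPS = 20
NUM_CIRCUITS = 4
USABLE_FRACTION = 0.80  # facilities rarely let you load a circuit past 80%

nameplate_kw = VOLTS * CIRCUIT_AMPS * NUM_CIRCUITS / 1000.0  # 9.6 kW of breaker capacity
usable_kw = nameplate_kw * USABLE_FRACTION                   # 7.68 kW you can actually draw

billed_kw = 19.2  # the capacity the example invoice is based on

print(f"Nameplate capacity of your circuits: {nameplate_kw:.2f} kW")
print(f"Usable after the 80% rule:           {usable_kw:.2f} kW")
print(f"Capacity you are billed for:         {billed_kw:.2f} kW "
      f"({billed_kw / usable_kw:.1f}x what you can use)")
```

If the billed capacity comes out at a large multiple of what you can actually draw, it is worth a conversation with the facility about metered billing or re-priced circuits.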

If you are designing your own data center, here are some additional tips:

1. Trace your power all the way back to the street to see where your single points of failure are. You should plan to use redundant circuits rather than a single main feed. For example, if everything goes back to a single 800-amp breaker, you risk both your primary and secondary power if that breaker trips. To do it right, you should have separate feeds, generators, UPSes and switchgear all the way through the building.

2. Physically separate the two feeds as much as you can (A side on the right, B side on the left) and try to bring them back to separate rooms if possible.

3. Don’t forget redundant cooling. If you have power but no cooling, you are likely to overheat all of your gear anyway. Overheating may not cause failures right away, but it can cost more over time than the downtime would have.

4. Document, document, document and restrict who can make changes.

5. Test the power redundancy by turning off power at the street and seeing what happens. Ideally, a bunch of pagers will go off and that’s it. If not, figure out why.

Mark Townsend

About Mark Townsend

Mark Townsend's career has spanned the past two decades in computer networking, during which he has contributed to several patents and pending patents in information security. He has established himself as an expert related to networking and security in enterprise networks, with a focus on educational environments. Mark is a contributing member to several information security industry standards associations, most notably the Trusted Computing Group (TCG). Townsend's work in the TCG Trusted Network Connect (TNC) working group includes co-authoring the Clientless Endpoint Support Profile. Townsend is currently developing virtualization solutions and driving interoperability testing within the industry. Prior to his current position, he has served in a variety of roles including service and support, marketing, sales management and business development. He is considered an industry expert and often lectures at universities and industry events, including RSA and Interop. Mark is also leveraging his background and serving his community as Chairman of the local school board, a progressive school district consistently ranked in the top school districts of New Hampshire, with the district high school ranked as a "Best High School" by US News & World Report.
