Open Compute Project

The OCP Wants You

Wednesday, November 16, 2011 · Posted by at 19:22 PM

As those of you who attended the second Open Compute Project Summit can attest, the project has a lot of momentum, and the community has really taken shape.


As this is a community effort, we would love to hear what you're doing with the Open Compute Project. How do you plan on implementing OCP technologies in your data centers? What do you hope to contribute back to the project?


If you have something you want to share with the community, please tell us and we will post it on this blog. And do let us know if you would like to contribute content to the blog directly.


Learning Lessons at the Prineville Data Center

Thursday, November 17, 2011 · Posted by at 8:57 AM

Facebook's data center in Prineville, OR, has been one of the most energy-efficient data center facilities in the world since it became operational early this year. Innovative features of the electrical distribution system include DC backup and high-voltage (480 VAC) distribution, which have eliminated the need for a centralized UPS and 480V-to-208V transformation. The built-in penthouse houses the chiller-less air conditioning system, which uses 100% airside economization and evaporative cooling to maintain the operating environment.

These features have enabled Facebook to reduce the energy consumption of the data center significantly, which is reflected in the facility's power usage effectiveness (PUE). The PUE of the Prineville data center was 1.07 at full load, which was verified during commissioning. Since then, during normal operation of the facility, the PUE has varied between 1.06 and 1.1. The histogram of the available PUE trend data for the period of April 14, 2011, to September 30, 2011, is presented in figure 1 below.


Figure 1 PUE of Prineville Data Center
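For readers less familiar with the metric, PUE is the ratio of total facility power to the power delivered to IT equipment, so a PUE of 1.07 means roughly 7% overhead on top of the IT load. The sketch below illustrates that arithmetic only; the function names and the kilowatt readings are hypothetical and were chosen to reproduce the reported 1.07, not taken from Prineville measurements.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power usage effectiveness: total facility power divided by IT power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_kw / it_equipment_kw


def overhead_fraction(pue_value: float) -> float:
    """Non-IT overhead (cooling, distribution losses) as a fraction of IT power."""
    return pue_value - 1.0


if __name__ == "__main__":
    # Hypothetical readings, chosen only to illustrate a PUE of 1.07.
    value = pue(total_facility_kw=1070.0, it_equipment_kw=1000.0)
    print(f"PUE = {value:.2f}, overhead = {overhead_fraction(value):.0%}")
```

Under the same definition, the reported operating range of 1.06 to 1.1 corresponds to 6% to 10% overhead relative to the IT load.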

Challenges in Operations


Although these features have resulted in high efficiency, we have learned some lessons along the way. As part of our commitment to openness via the Open Compute Project, we are sharing our experiences and lessons learned with the community, so that everyone might benefit from them.

One challenge we encountered was keeping our air handler lineups from "fighting" with each other as they dealt with the rapid changes in the temperature and humidity of the outside air between day and night. For example, if the outside air dampers of one lineup were at 70%, the adjacent lineups would have their outside air dampers at 20-30%. This alternating modulation, or fighting, often led to stratification of air streams.

Another, more significant, issue was an error in the sequence of operation controls that led to complete closure of the outside air dampers, causing the one-pass airflow system to function like a recirculatory system. The problem began to manifest in late June as outside air conditions started changing rapidly. The economizer demand signal began responding to the changes; that's when the erroneous control sequence drove economizer demand to 0, leading to complete closure of the outside air dampers. Thus the data center was recirculating the hot exhaust air at high temperature and low humidity. The evaporative cooling system reacted to this high temperature and low humidity, spraying at 100% to maintain the maximum allowed supply temperature and dew point temperature. This resulted in cold aisle supply temperature exceeding 80°F and relative humidity exceeding 95%. The Open Compute servers that are deployed within the data center reacted to these extreme changes. Numerous servers rebooted, and a few shut down automatically due to power supply unit failure.

Figure 2 Failed component in power supply unit

 

The high temperature and high humidity supply air caused condensation on the concrete slab floor (because concrete has high thermal mass and was in contact with much cooler supply air for a long time). Similarly, upon investigation of the failed power supply units (figure 2), we observed that the failure was condensation-related.

Issue Analysis


We began investigating this failure by subjecting the server to rapidly changing temperature and humidity conditions in a controlled test chamber. The relative humidity level was raised to 97% and the temperature was ramped up from 15°C to 30°C (59°F to 86°F) in the span of 10 minutes. Under these conditions, condensation was observed on the non-heated components. The server chassis was dripping wet, as you can see in figure 3. The motherboard, however, showed no signs of condensation because it always ran above the dew-point temperature.

 


Figure 3 Condensation on the server chassis

 

Condensation was also evident on the surfaces of power supply components such as capacitors and inductors, as shown in figure 4.


Figure 4 Condensation on the power supply components

 

Figure 5 below shows the surfaces of inductors in front of capacitor 1 and the forward vertical surface of capacitor 1. We can see the water droplets formed on the surfaces of these non-heated components.


Figure 5 Inductors and capacitor surfaces viewed from a borescope video

 

Figure 6 shows the variation in different temperatures monitored during the test interval. These are both targeted and actual values of ambient as well as dew-point temperature. The surface temperature of capacitor 1 (CAP1) is also plotted.


Figure 6 Temperature variation

 

The plot shows that the surface of CAP1 falls below the dew point at about 6 minutes into the temperature ramp. This is exactly the same time the borescope video starts showing a slight change in the reflectivity of the component surfaces. The condensation then continues for another 9 minutes until the surface temperature of CAP1 rises above dew point. During the entire test interval, the PCB in the power supply always ran above the dew point temperature and showed no signs of condensation.
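The comparison behind this observation, surface temperature against dew point, can be sketched in a few lines. The example below uses the Magnus approximation for dew point with one commonly used parameter set; it is an illustration under those assumptions, not the method used in the test chamber, and the function names and sample surface temperatures are hypothetical.

```python
import math

# Magnus approximation constants (one common parameter set; an assumption here,
# reasonable for ordinary ambient temperatures).
MAGNUS_A = 17.62
MAGNUS_B = 243.12  # degrees Celsius


def dew_point_c(air_temp_c: float, rh_percent: float) -> float:
    """Approximate dew point (°C) from air temperature (°C) and relative humidity (%)."""
    if not 0.0 < rh_percent <= 100.0:
        raise ValueError("relative humidity must be in (0, 100]")
    gamma = math.log(rh_percent / 100.0) + MAGNUS_A * air_temp_c / (MAGNUS_B + air_temp_c)
    return MAGNUS_B * gamma / (MAGNUS_A - gamma)


def surface_condenses(surface_temp_c: float, air_temp_c: float, rh_percent: float) -> bool:
    """A surface collects condensation when it is colder than the air's dew point."""
    return surface_temp_c < dew_point_c(air_temp_c, rh_percent)


if __name__ == "__main__":
    # End-of-ramp chamber conditions from the test: 30 °C air at 97% RH.
    print(f"dew point: {dew_point_c(30.0, 97.0):.1f} °C")
    # Hypothetical surfaces: an unheated part still near 15 °C vs. a powered part at 40 °C.
    print(surface_condenses(15.0, 30.0, 97.0))
    print(surface_condenses(40.0, 30.0, 97.0))
```

At 97% relative humidity the dew point sits within about a degree of the air temperature, so any component whose surface lags the air during a fast ramp is at risk, while powered components that run warmer than the air stay dry.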

All these findings suggest that the failures were caused by water droplets being blown onto the PCB of the power supply, rather than by condensation occurring on the PCB itself. As shown in figure 7, water droplets were observed on the AC/DC cables and connectors. It is highly likely that these droplets were blown into the power supply units when the facility's maintenance staff increased the airflow in an effort to mitigate the problem.


Figure 7 Condensation on cables and connectors

 

Corrective Actions


The erroneous control sequence was promptly corrected and additional safeguards were added to eliminate the possibility of repeated occurrence of such an event. These safeguards include reevaluation of the minimum economizer demand setting, which will avoid the complete closure of the outside air dampers. Several monitoring points and alarm settings were modified to monitor and notify ahead of time should outside air conditions begin to change rapidly. Even though the supply air humidity, which was more than 95% at times, was out of the operational range of the power supply units (10-90% RH, non-condensing), conformal coating has been applied locally in selective areas of the PCB to avoid condensation and to strengthen the power supply units against such corner cases.
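Two of these safeguards, a floor on economizer demand and an early warning on rapid outside-air changes, can be sketched as follows. This is a minimal illustration, not the actual building management logic: the function names, the 10% minimum, and the rate threshold are all hypothetical values picked for the example.

```python
from typing import Sequence, Tuple


def clamp_economizer_demand(requested_pct: float,
                            min_pct: float = 10.0,   # hypothetical floor, not the site setting
                            max_pct: float = 100.0) -> float:
    """Keep economizer demand within [min_pct, max_pct] so the dampers never fully close."""
    return max(min_pct, min(max_pct, requested_pct))


def rapid_change_alarm(samples: Sequence[Tuple[float, float]],
                       max_rate_per_min: float) -> bool:
    """Return True if any consecutive pair of (minute, value) samples changes
    faster than max_rate_per_min. Samples are assumed to be ordered by time."""
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if t1 > t0 and abs(v1 - v0) / (t1 - t0) > max_rate_per_min:
            return True
    return False


if __name__ == "__main__":
    # A faulty sequence requesting 0% is held at the floor instead.
    print(clamp_economizer_demand(0.0))
    # Hypothetical outside-air temperature samples (minute, °C) and threshold (°C/min).
    print(rapid_change_alarm([(0, 15.0), (10, 30.0)], max_rate_per_min=1.0))
```

The clamp is applied to the final demand signal, so it holds regardless of which upstream step in the control sequence produced the bad value.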


After the OCP Summit, Another Chapter Begins

Friday, November 18, 2011 · Posted by at 10:26 AM


The second Open Compute Project Summit was a resounding success, but that just means we as a community have a lot of work ahead of us to advance the goals and benefits of open hardware. Through a series of presentations by industry luminaries and technical workshops, hundreds of participants came together and discussed the Open Compute Project initiatives.


The Open Compute Project Foundation and its board were announced. Modeled after the Apache Software Foundation, the OCP Foundation will design and deliver tangible goods and source files to let people deploy OCP hardware in their environments. The five members of the OCP Foundation board are:



    • Andreas ("Andy") Bechtolsheim, Founder of Arista Networks and former Chief Hardware Designer at Sun Microsystems

 

    • Don Duet, Vice President of Information Technology at Goldman Sachs

 

    • Frank Frankovsky, Director of Technical Operations at Facebook

 

    • Mark Roenigk, COO at Rackspace

 

    • Jason Waxman, General Manager of the Data Center Group at Intel



The board reflects a diversity of industries, from supplier to consumer. You might be asking yourself why Goldman Sachs sits on the board. Facebook's Frank Frankovsky says that financial services companies "are IT companies more than they actually know," running large-scale compute environments.


He also said Intel has "one of the richest portfolios of thought leadership of tech in the industry," and that it's very important for Intel to be part of the OCP, so others will follow their lead.

 

Rapid Innovation Requires Open Standards 

 

It almost goes without saying that standards in computing are vitally important. The advent of open source software and standards ushered in an era of great advancements in innovation. Andy Bechtolsheim of Arista said the same needs to be done on the hardware side, because there is a lack of standards at the system level. In order for there to be innovation in the scale compute ecosystem, the "gratuitous differentiation" where one vendor's hardware is different than another's must be eliminated. Calling gratuitous differentiation "the enemy" and innovation "our friend," Bechtolsheim said there is a need to "create a mutual benefit for customers and vendors by creating a new market for open-standard system-level designs."


This view was echoed by Jimmy Pike, Senior Distinguished Engineer, Chief Architect and Technologist at Dell Data Center Solutions. "Standards can set you free," he said. Open hardware should have standard form and fit, physical interfaces, management interfaces, and technology elements. He mentioned Dell's Nucleon server, its entry level OCP platform.


Brian Stevens, CTO and Vice President, Worldwide Engineering at Red Hat, likened the mission of the OCP to Red Hat's, in that both are a "catalyst in communities of customers, contributors and partners building better technology the open source way." He noted that Red Hat certified the first two OCP systems for Red Hat Enterprise Linux, and that Red Hat certification brings hardware compatibility.

 

Success at Scale

 

Innovation in data center technologies has been flourishing in recent years, and the pace is only quickening. James Hamilton, Vice President and Distinguished Engineer for Amazon Web Services, said there has been more innovation in the past five years than in the previous 15. Innovations like evaporative cooling, full building ductless cooling, and using outside air -- all technologies used at Facebook's Prineville, OR, data center -- have brought down costs, increased reliability, and reduced the environmental footprint of data centers.


To give an idea about how much Amazon has had to scale in recent years, every day "Amazon Web Services adds enough capacity to support all of Amazon.com’s global infrastructure through the company's first 5 years."


The OCP will "democratize and bring together much more choice in the industry for people to get efficient platforms," Jason Waxman of Intel said, adding that Intel has a long history of supporting open standards, like PCI and Wi-Fi. The trend is the same. "If you present people with an open spec, everyone can innovate."

 

From Workshops to Working Groups

 

One of the primary results of the summit came from the five technical workshops that were held. The workshops covered data center design, hardware management, open rack, storage, and virtual I/O.


The output from each workshop is being turned into a charter and specification that we'll share with the community. We also set up complementary working groups along these same themes. Join the discussions by subscribing to any or all of the lists that you are passionate about:



 

 

 

 

 




We look forward to your involvement in the community. See you online!
