Key Data Center Cooling Metrics

A detailed overview of metrics that yield insight into data center operation and how to interpret them

Paul Bemis, Davis Cole · 6 min read

Power Usage Effectiveness (PUE)

The most widely adopted metric for data center efficiency, PUE was defined by The Green Grid as the ratio of the total energy used in a data center to the energy used by IT equipment (servers, storage, network switches, etc.).

\[ \begin{equation} \text{PUE}=\frac{\text{total facility energy}}{\text{IT equipment energy}} \end{equation} \]

A PUE of 1.0 indicates that all energy entering the data center is used for revenue-generating IT workloads, while a PUE of 2.0 means only half of the total energy reaches the IT equipment. In effect, the metric characterizes the overhead required to run the facility.

According to the Uptime Institute’s 2024 Global Data Center Survey, average data center PUE has settled at about 1.5.

| PUE Value | Data Center Efficiency |
|-----------|------------------------|
| < 1.2     | Excellent              |
| 1.3-1.5   | Good                   |
| 1.6-1.8   | Acceptable             |
| > 1.8     | Poor                   |
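
For concreteness, here is a minimal Python sketch of the PUE calculation and the rating bands above. The meter readings are hypothetical, and the small gaps between the published bands are folded into the adjacent band.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power Usage Effectiveness: total facility energy over IT equipment energy."""
    return total_facility_kwh / it_equipment_kwh

def rate_pue(value: float) -> str:
    """Map a PUE value onto the rating bands from the table above."""
    if value < 1.2:
        return "Excellent"
    elif value <= 1.5:
        return "Good"
    elif value <= 1.8:
        return "Acceptable"
    return "Poor"

# Hypothetical monthly meter readings (kWh)
total_kwh = 1_500_000   # everything entering the facility
it_kwh = 1_000_000      # servers, storage, and network gear
value = pue(total_kwh, it_kwh)
print(f"PUE = {value:.2f} ({rate_pue(value)})")  # PUE = 1.50 (Good)
```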

Issues with PUE

Despite PUE’s ability to roughly classify the efficiency of a data center, the metric has received criticism as the industry has evolved. Because its inputs are broadly classified as “facility energy” and “IT energy,” important nuances are ignored, including how energy is classified, what workloads the IT equipment is running, and power distribution and cooling losses within the IT equipment itself.

Classification

Would server fans be considered facility energy or IT energy? Some may argue that server fans exist to cool the server and should count as cooling (facility) energy, while others may count them as IT energy, perhaps aiming for a larger denominator and a more favorable figure.

Workloads

Since PUE does not consider how the IT energy is being used, a hall of half-idle servers can post a “world-class” PUE of 1.15 while burning megawatts on unproductive silicon. This is why IT-level metrics such as ITUE and TUE, which tie energy to useful work, are gaining traction.

Comparison Between Data Centers

Comparing two identical 10 MW halls in Denver and Austin shows how climate alone skews PUE. Denver’s cool, dry, high-altitude air offers more free-cooling hours, keeping mechanical overhead below 1.5 MW and PUE near 1.15. Austin’s hot, humid summers force the chillers to run for well over 6,000 hours a year, pushing overhead to roughly 3.5 MW and sinking PUE to about 1.35. That 0.2-point swing, caused purely by geography, shows that PUE works for trending a single site, not for ranking sites across regions.

Summary

PUE gives a quick efficiency snapshot, yet its broad “facility” versus “IT” buckets hide rack-level power losses, let subjective load classifications skew results, and ignore how much useful work the servers perform. Because local climate alone can shift a score by double-digit percentages, the metric excels at trending one site but misleads when ranking different regions.

Use it as a baseline indicator and supplement with workload-aware, component-level, and climate-normalized KPIs to steer design decisions that truly cut energy use.

Rack Cooling Index™ (RCI)

The Rack Cooling Index, a trademark of ANCIS Inc., captures how effectively server racks are being cooled, judged against the ASHRAE TC 9.9 thermal guidelines for recommended and allowable inlet temperatures. The recommended range is 64.4-80.6°F, so the index has two forms, RCIHi and RCILo, covering the high and low ends of the range. They are defined as:

\[ \text{RCI}_{Hi}=\left[1-\left(\frac{\sum_{T_i>T_{R,Hi}}(T_i-T_{R,Hi})}{n\times(T_{A,Hi}-T_{R,Hi})}\right)\right]\times 100 \]

where:

  • \(T_i\) is the maximum inlet temperature for the ith rack
  • \(T_{R,Hi}\) is the ASHRAE maximum recommended temperature (80.6°F)
  • \(T_{A,Hi}\) is the ASHRAE maximum allowable temperature (89.6°F)
  • \(n\) is the total number of racks; the sum includes only racks whose inlet temperature exceeds \(T_{R,Hi}\)

and:

\[ \text{RCI}_{Lo}=\left[1-\left(\frac{\sum_{T_i<T_{R,Lo}}(T_{R,Lo}-T_i)}{n\times(T_{R,Lo}-T_{A,Lo})}\right)\right]\times 100 \]

where:

  • \(T_{R,Lo}\) is the ASHRAE minimum recommended temperature (64.4°F)
  • \(T_{A,Lo}\) is the ASHRAE minimum allowable temperature (59°F)
  • \(n\) is again the total number of racks; the sum includes only racks whose inlet temperature falls below \(T_{R,Lo}\)

Put into words for RCIHi:

  • RCIHi = 100%: all rack inlet temperatures are at or below \(T_{R,Hi}\)
  • RCIHi < 100%: at least one rack inlet temperature is above \(T_{R,Hi}\)
  • RCIHi = 0%: the total over-temperature equals the maximum allowable over-temperature (for example, every rack inlet sitting at \(T_{A,Hi}\))

Similarly for RCILo:

  • RCILo = 100%: all rack inlet temperatures are at or above \(T_{R,Lo}\)
  • RCILo < 100%: at least one rack inlet temperature is below \(T_{R,Lo}\)
  • RCILo = 0%: the total under-temperature equals the maximum allowable under-temperature (for example, every rack inlet sitting at \(T_{A,Lo}\))

Further intuition for RCI can be gained through the table below:

| RCI Value | Cooling Performance |
|-----------|---------------------|
| 100%      | Ideal               |
| > 96%     | Good                |
| 91-95%    | Acceptable          |
| < 90%     | Poor                |

Example RCIHi Calculation

To work through an RCIHi calculation, consider the table below:

| Rack | Inlet Temp (°F) |
|------|-----------------|
| 1    | 68.0            |
| 2    | 71.6            |
| 3    | 77.0            |
| 4    | 82.4            |
| 5    | 86.0            |
| 6    | 78.8            |
| 7    | 66.2            |
| 8    | 91.4            |
| 9    | 75.2            |
| 10   | 73.5            |

Recalling that the ASHRAE recommended range is 64.4-80.6°F, we identify the racks above the top of this range and subtract the maximum recommended temperature (80.6°F) from each of their inlet temperatures:

| Rack | Inlet Temp (°F) | High-side excess (°F) |
|------|-----------------|-----------------------|
| 4    | 82.4            | 1.8                   |
| 5    | 86.0            | 5.4                   |
| 8    | 91.4            | 10.8                  |

Three racks are out of compliance, and the example has ten racks in total, so \(n=10\). We then sum the high-side excess over these three racks:

\[ \sum_{T_i>T_{R,Hi}}(T_i-T_{R,Hi})=1.8+5.4+10.8=18°\text{F} \]

Now, we have all the terms needed for the full RCIHi calculation:

\[ \begin{align} \text{RCI}_{Hi}&=\left[1-\left(\frac{\sum_{T_i>T_{R,Hi}}(T_i-T_{R,Hi})}{n\times(T_{A,Hi}-T_{R,Hi})}\right)\right]\times 100 \\ &=\left[1-\left(\frac{18°\text{F}}{10\times(89.6°\text{F}-80.6°\text{F})}\right)\right]\times 100 \\ &=\left[1-\left(\frac{18}{90}\right)\right]\times 100 \\ &=\left[1-\left(\frac{1}{5}\right)\right]\times 100 = \boxed{80\%} \end{align} \]

This value falls below the 90% “Poor” threshold in the rating table above, suggesting serious cooling issues in our sample data center.

The calculation can be similarly performed for RCILo.
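
As a cross-check, here is a minimal Python sketch of both indices under the definitions above, applied to the ten inlet temperatures from the worked example (sample values, not measurements from a real facility):

```python
# ASHRAE TC 9.9 limits used above, in Fahrenheit
T_R_HI, T_A_HI = 80.6, 89.6   # max recommended / max allowable
T_R_LO, T_A_LO = 64.4, 59.0   # min recommended / min allowable

def rci_hi(inlet_temps):
    """RCI_Hi: 100% means no rack inlet exceeds the max recommended temperature."""
    n = len(inlet_temps)
    over_temp = sum(t - T_R_HI for t in inlet_temps if t > T_R_HI)
    return (1 - over_temp / (n * (T_A_HI - T_R_HI))) * 100

def rci_lo(inlet_temps):
    """RCI_Lo: 100% means no rack inlet falls below the min recommended temperature."""
    n = len(inlet_temps)
    under_temp = sum(T_R_LO - t for t in inlet_temps if t < T_R_LO)
    return (1 - under_temp / (n * (T_R_LO - T_A_LO))) * 100

# The ten rack inlet temperatures from the example above
temps = [68.0, 71.6, 77.0, 82.4, 86.0, 78.8, 66.2, 91.4, 75.2, 73.5]
print(f"RCI_Hi = {rci_hi(temps):.1f}%")  # 80.0%, matching the hand calculation
print(f"RCI_Lo = {rci_lo(temps):.1f}%")  # 100.0%, since no rack is below 64.4°F
```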

Return Temperature Index™ (RTI)

RTI, another trademark of ANCIS Inc., measures the effectiveness of the air management system from an energy standpoint and is defined by:

\[ RTI=\frac{\text{Air handler }\Delta T}{\text{Rack }\Delta T}\times 100 \]

To understand the principle, a quick refresher on basic heat transfer may be helpful. Recall the sensible heat equation in rate form:

\[ \dot{Q}=\dot{m}c_p\Delta T \]

where:

  • \(\dot{Q}\) is the rate of heat added or removed (W)
  • \(\dot{m}\) is the mass flow rate of the working fluid (kg/s)
  • \(c_p\) is the specific heat capacity of the working fluid (J/kg-K)
  • \(\Delta T\) is the temperature difference, \(|T_{out}-T_{in}|\)
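
To make the relationship concrete, the short sketch below (using assumed values and standard properties of air) estimates the supply airflow a rack would need for a given heat load and temperature rise:

```python
# Sensible heat balance Q = m_dot * c_p * dT, rearranged to solve for airflow.
# The heat load and temperature rise below are illustrative assumptions.
C_P_AIR = 1005.0   # specific heat of air, J/(kg*K)
RHO_AIR = 1.2      # approximate air density, kg/m^3

def required_airflow(heat_load_w: float, delta_t_k: float) -> float:
    """Volumetric airflow (m^3/s) needed to remove heat_load_w at a rise of delta_t_k."""
    mass_flow = heat_load_w / (C_P_AIR * delta_t_k)  # kg/s
    return mass_flow / RHO_AIR                       # m^3/s

# A 10 kW rack with an 11 K (about 20°F) air temperature rise
flow = required_airflow(10_000, 11.0)
print(f"{flow:.2f} m^3/s (about {flow * 2118.88:.0f} CFM)")  # ~0.75 m^3/s, ~1600 CFM
```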

In a data center, heat transfer occurs mainly in two places:

  1. inside the server rack, where cool supply air is heated by electronic components
  2. inside the air handler, where warm air exhausted by the racks is cooled by a heat exchanger

In both places, it is ideal for heat transfer to be maximized. This can be done by increasing any of \(\dot{m}\), \(c_p\), or \(\Delta T\):

  • Specific heat is a constant defined by the cooling system’s working fluid and cannot be increased
  • Increasing the mass flow rate requires the rack and air handler fans to run at higher power (fan power rises roughly with the cube of flow rate), so it is neither practical nor scalable

This leaves manipulating \(\Delta T\), illustrating why rack and air handler \(\Delta T\) are used in the RTI metric. CRAC unit efficiency increases as return temperature increases.

Since \(\Delta T\) setpoints are typically prescribed by operators, a more intuitive representation substitutes flow rates for \(\Delta T\). Because, at steady state, the heat removed by the air handlers equals the heat produced by the racks, the \(\Delta T\) ratio is simply the inverse of the flow-rate ratio:

\[ RTI=\frac{\text{Rack flow rate}}{\text{Air handler flow rate}}\times 100 \]

If the rack flow rate exceeds the air handler flow rate (RTI > 100%), the racks can be starved of cool air and exhaust air can recirculate back to the rack inlets, raising inlet temperatures and lowering cooling effectiveness.

If the air handler flow rate exceeds the rack flow rate (RTI < 100%), cool supply air can bypass the rack inlets and short-circuit to the CRAC return, lowering the return temperature and CRAC effectiveness. This can occur when the air handler flow rate is increased to combat rack hotspots.

| RTI Value | Airflow Status    |
|-----------|-------------------|
| 100%      | Balanced          |
| > 100%    | Net recirculation |
| < 100%    | Net bypass        |
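
The sketch below computes RTI from either measurement and maps it onto the table above; the \(\Delta T\) values are assumptions for illustration.

```python
def rti_from_delta_t(air_handler_dt: float, rack_dt: float) -> float:
    """RTI (%) from temperature differences: air handler dT over rack dT."""
    return air_handler_dt / rack_dt * 100

def rti_from_flow(rack_flow: float, air_handler_flow: float) -> float:
    """Equivalent form: rack airflow over air handler airflow (same units)."""
    return rack_flow / air_handler_flow * 100

def airflow_status(rti: float) -> str:
    """Interpret RTI per the table above."""
    if rti > 100:
        return "net recirculation"
    if rti < 100:
        return "net bypass"
    return "balanced"

# Assumed measurements: racks heat the air by 20°F, but the air handlers
# only see a 16°F rise because some supply air bypasses the racks.
rti = rti_from_delta_t(air_handler_dt=16.0, rack_dt=20.0)
print(f"RTI = {rti:.0f}% ({airflow_status(rti)})")  # RTI = 80% (net bypass)
```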

While RTI = 100% is ideal in theory, slight deviations from this mark may be necessary in real-world contexts. If mixing is required to ensure an even distribution of rack inlet temperatures, an RTI slightly above 100% would be desirable to drive recirculation and mixing.