Reducing Power Consumption In Data Centers At Peak Load

Most Western data center operators start from the total power capacity available to the data center, subtract the power distribution losses and the power consumed by the mechanical cooling systems, and then reduce the result by a further 10-20% as protection against exceeding the maximum permissible draw. What remains is the power allocated to the IT load. This approach can leave the IT load with more power provisioned than it ever actually uses.
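The budgeting steps above can be sketched as a small calculation. This is only an illustration: the function name and all of the figures (10 MW of capacity, 8% distribution loss, 2 MW of cooling, a 15% margin) are assumptions chosen for the example, not numbers from any particular facility.

```python
def it_power_budget_kw(total_kw, distribution_loss_frac, cooling_kw, safety_margin_frac):
    """Power left for the IT load after losses, cooling, and a safety margin.

    All parameter names are hypothetical; they mirror the steps in the text.
    """
    after_losses = total_kw * (1.0 - distribution_loss_frac)   # subtract distribution losses
    after_cooling = after_losses - cooling_kw                   # subtract mechanical cooling
    return after_cooling * (1.0 - safety_margin_frac)           # hold back the 10-20% margin


# Assumed example figures, for illustration only.
budget = it_power_budget_kw(
    total_kw=10_000,
    distribution_loss_frac=0.08,
    cooling_kw=2_000,
    safety_margin_frac=0.15,
)
print(f"IT power budget: {budget:.0f} kW")
```

With these assumed inputs, a little over 6 MW of the 10 MW ends up available to servers, and that is the figure the rest of the discussion treats as the limit.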

The main problem is that almost no data center runs at full capacity, and some reach only about 50% of their capacity, because it is unlikely that all servers will run at full load at the same moment. Different workloads peak at different times, so even if some services do reach 100% load, we can usually rely on their peaks not coinciding. With this in mind, you can deploy more servers than the available electrical power could support if all of them ran at full load at once.

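This is the same approach the airlines take when selling tickets. Just as an airline may book more passengers than there are seats on the aircraft, a data center can host more servers than its power budget would allow at simultaneous full load.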
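To see why non-coinciding peaks make this overbooking possible, compare the sum of each service's individual peak with the peak of their combined draw. The sketch below uses three made-up load profiles (watts per time slot); the service names and numbers are assumptions for illustration only.

```python
def sum_of_peaks(profiles):
    """Power needed if every service were provisioned for its own peak."""
    return sum(max(profile) for profile in profiles)


def aggregate_peak(profiles):
    """Highest combined draw actually observed in any single time slot."""
    return max(sum(slot) for slot in zip(*profiles))


# Hypothetical draw in watts over four time slots.
search = [300, 500, 900, 600]
batch = [900, 400, 200, 300]
ads = [400, 800, 500, 400]
profiles = [search, batch, ads]

provisioned = sum_of_peaks(profiles)   # 900 + 900 + 800
observed = aggregate_peak(profiles)    # the busiest slot across all services
print(provisioned, observed, provisioned / observed)
```

In this example the individual peaks add up to 2600 W, but the busiest slot only draws 1700 W, so a budget sized for the combined peak supports roughly 1.5 times the servers that per-service provisioning would allow.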

When consumption in an oversubscribed data center rises too far, there are three ways to resolve the problem:

1. Shut off workloads that do not affect customer service;
2. Stop or scale back non-critical workloads;
3. Force the servers into low-power modes.

The last option is a favorite research topic, but it is almost never used in practice, because it is equivalent to solving the overbooking problem by seating two passengers in one seat. In some sense it works, but it is unsafe and does not make customers happy. Option 3 reduces the resources available to every workload and so lowers the overall quality of service. For most commercial organizations that is not a sound economic decision. Options 1 and 2 are the better choices.
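Options 1 and 2 amount to shedding load that customers will not notice until the facility is back under its limit. A minimal sketch of such a policy follows; the `Workload` type, the "largest first" ordering, and the example workloads are all assumptions made for illustration, not a description of any operator's actual mechanism.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    watts: float
    customer_facing: bool


def shed_until_under_cap(workloads, cap_watts):
    """Stop non-customer-facing workloads, largest first, until total draw fits the cap.

    Returns (kept, shed). Customer-facing workloads are never stopped, so the
    result may still exceed the cap if shedding everything else is not enough.
    """
    kept = list(workloads)
    shed = []
    candidates = sorted(
        (w for w in workloads if not w.customer_facing),
        key=lambda w: w.watts,
        reverse=True,
    )
    for w in candidates:
        if sum(k.watts for k in kept) <= cap_watts:
            break  # already under the cap, stop shedding
        kept.remove(w)
        shed.append(w)
    return kept, shed


# Hypothetical workloads and cap.
running = [
    Workload("search", 600, customer_facing=True),
    Workload("index_rebuild", 300, customer_facing=False),
    Workload("log_compaction", 200, customer_facing=False),
]
kept, shed = shed_until_under_cap(running, cap_watts=900)
print([w.name for w in shed])
```

Note that nothing here touches the customer-facing workload: quality of service is preserved, which is exactly what option 3 cannot promise.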

One class of applications that is hard to make energy efficient is interactive, data-intensive workloads. Internet search, advertising, and machine translation are examples of this type of workload. These workloads can be very profitable, so option 3 above, reducing the quality of service, cannot be justified economically.

The best solution for these workloads would be energy-proportional computing. In essence, the goal of energy proportionality is that a server running at 10% load should consume 10% of the power it draws at full load. Of course there are overheads, and this goal will never be fully achieved, but the closer we get to it, the lower the cost and the environmental impact of running these workloads.

The good news is that we have made some progress in this direction. When energy-proportional computing was first proposed, many servers consumed 80% of their full-load power while idle. Today a good server can bring its idle consumption down to 45%. We have not yet come close to the goal, but we are making good progress. In fact, the CPU is very energy efficient by today's standards, and the largest consumers of electricity are now the other components of the server. Memory is a big opportunity, and mobile devices show us what is possible. I hope we will keep making progress by borrowing ideas from the cell phone industry and applying them to servers.
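The proportionality goal and the idle figures above can be made concrete with a simple power model. The sketch below assumes that power rises linearly from the idle draw to the full-load draw; that linearity is an assumption made for illustration (real servers only approximate it), and `power_fraction` is a hypothetical helper name.

```python
def power_fraction(utilization, idle_fraction):
    """Fraction of full-load power drawn at a given utilization.

    Assumes a linear model: idle_fraction at 0% load, 1.0 at 100% load.
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return idle_fraction + (1.0 - idle_fraction) * utilization


load = 0.10  # the 10% load used in the text
for label, idle in [("ideal", 0.0), ("older server", 0.80), ("good server today", 0.45)]:
    print(f"{label}: {power_fraction(load, idle):.1%} of full-load power")
```

Under this assumed model, a server that idles at 80% still draws about 82% of its full-load power at 10% load, and one that idles at 45% draws about half, against the ideal of 10%. That gap is what remains to be closed.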

In Power Management of Online Data-Intensive Services, a group of researchers from Google and the University of Michigan studied the energy proportionality problem for online data-intensive (OLDI) systems, using workloads such as Google search, advertising, and translation. These workloads are difficult because they meet their latency requirements by keeping very large in-memory caches, and when the workload drops, the machines must remain up in order to keep meeting the application's latency requirements. Concentrating the workload on a smaller number of servers is not an option: the size of the cache requires that all servers remain accessible. As a result, when the workload drops, every server still carries some load, and the system as a whole cannot go into a low-power mode.

Because the size of the cached data requires all of the servers, a drop in workload reduces the load on each server proportionally, but no server ever becomes fully idle. Every server must stay powered on and ready to handle requests within the required latency.

CPU active low-power modes may be the best single mechanism available for trading power against performance, but on their own they cannot achieve energy proportionality.

There is a need for better idle low-power modes for shared caches and integrated memory controllers. There is also a large opportunity to save memory system power using low-power modes [mobile systems already do this well, so the techniques are available].

Even with request batching, putting the entire system into an idle low-power mode during periods of inactivity cannot provide an acceptable balance between latency and power consumption. Coordinated, full-system active low-power modes are the most promising way to achieve energy proportionality while keeping request latency acceptable.

To generalize, OLDI workloads meet their latency requirements by spreading a very large in-memory cache across the running servers. When the workload falls from its maximum to its minimum, all of these servers become less loaded, but none of them becomes idle, so the system as a whole cannot be put into a low-power mode.

I like to picture the servers that support these workloads as a two-dimensional grid. Each row represents one complete copy of the cache, distributed across hundreds of servers. A single row can serve a certain amount of workload while meeting the application's latency requirements, but no more than that. To grow the workload beyond what one row can handle, additional rows are required. When a query arrives, it is sent only to the servers of a single row, not to every row.
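The grid picture can be sketched in a few lines. In this illustration each server is identified by a (row, shard) pair, and a query fans out to every shard of exactly one row. The `ServerGrid` class and the round-robin choice of row are assumptions made for the example; the source does not say how a row is chosen.

```python
import itertools


class ServerGrid:
    """Rows are full copies of the cache; each row is split across shards (servers)."""

    def __init__(self, rows, shards_per_row):
        self.rows = rows
        self.shards_per_row = shards_per_row
        # Assumption: rows are picked round-robin; any load-balancing rule would do.
        self._next_row = itertools.cycle(range(rows))

    def route(self, query):
        """Return the (row, shard) servers that must all answer this query."""
        row = next(self._next_row)
        return [(row, shard) for shard in range(self.shards_per_row)]


grid = ServerGrid(rows=3, shards_per_row=4)
print(grid.route("data center power"))
```

Adding shards to a row makes room for a larger cache, while adding rows adds query capacity, which is why capacity can only be added or removed a whole row at a time.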

Scaling at the row level gives nearly complete energy proportionality at the level of the whole data center, except for the following two problems:

The workload cannot be scaled down below one row, for all the reasons described in the paper.
When the workload is very dynamic, jumping rapidly between its minimum and maximum values, additional rows must be kept ready in case they are needed, which further reduces the energy proportionality this technique offers.

If the workload is much larger than one row and varies predictably between its minimum and maximum values, row-level scaling gives very good results. It does not work well if the workload varies rapidly, or when you need to scale below one row.
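Row-level scaling and its two limits can be illustrated with a short calculation. The sketch assumes each row handles a fixed query rate; the function names, the `spare_rows` parameter, and the example rates are hypothetical.

```python
import math


def active_rows(load_qps, row_capacity_qps, spare_rows=0):
    """Rows that must stay powered on: never fewer than one, plus any kept in reserve."""
    needed = max(1, math.ceil(load_qps / row_capacity_qps))
    return needed + spare_rows


def row_utilization(load_qps, row_capacity_qps, spare_rows=0):
    """Share of powered-on capacity doing useful work; 1.0 would be fully proportional."""
    rows = active_rows(load_qps, row_capacity_qps, spare_rows)
    return load_qps / (rows * row_capacity_qps)


# Assumed capacity of 1,000 queries per second per row.
print(row_utilization(10_000, 1_000))               # large load: rows are fully used
print(row_utilization(200, 1_000))                  # below one row: a whole row stays on
print(row_utilization(2_500, 1_000, spare_rows=1))  # a spare row held for sudden spikes
```

The first case shows the near-proportional behavior for a large workload; the second and third show the two exceptions, where the one-row floor or the rows held in reserve leave powered-on capacity unused.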