Neeraj Bhatia's Blog

July 12, 2020

Audit and Compliance in the Cloud

Filed under: Uncategorized — neerajbhatia @ 10:34

I have just written down my thoughts on audit- and compliance-related challenges in the public cloud. Please read the complete paper here.

Any thoughts/feedback would be welcome.

March 1, 2015

Top Challenges in the Life of an IT Capacity Manager

Filed under: Capacity Management — neerajbhatia @ 16:42

Let me first confess: I don’t have double-digit years of experience in IT Capacity Management. But I would say that during the last 9 years I have defined the process from scratch multiple times. I have seen the capacity management process gain traction. Multiple times I was part of the journey where the process matured from chaotic to efficient. I have also witnessed cases where management couldn’t justify the investment in the process and it died a natural death. Over the years I have talked to and interacted with numerous people working in the same or related domains, and this helped me understand how others are doing. I have also been actively involved in interviewing many candidates, which gave me an insight into how mature the process is in their organizations and what challenges they are facing.

All this makes me eligible to write this post, in which I have highlighted (or tried to) the key challenges an IT Capacity professional faces. To keep it readable, interesting and short, I have restricted myself to the top 5, but that doesn’t mean no other challenges exist. My criteria for picking these were that each should form part of the foundation of the process and fall within its technical boundaries. Of course, political and cultural aspects, organization structure and similar factors also play an important role in its success.

I would request readers to comment at the bottom of the post if they have experiences, ideas or stories to share. If you feel there is an aspect which deserves a place in this list, please comment, and after review I will tweak the list.

So let’s start with the list of top 5 challenges.

1) Importance of Capacity Management: You may be surprised to find this at the top of my list, but believe me, it’s a fact. Among the very small number of organizations that have a dedicated team or resources responsible for Capacity management, very few actually understand the value that effective Capacity management can bring to the table. What I have observed is that the reason for their existence is to support audit/regulatory requirements, or just to tick a box. Most of the time, system admins or incident management teams deal with capacity management responsibilities, but in almost all cases in a purely reactive way. Capacity planners often spend their office hours providing data to be consumed by the IT service management community, Infrastructure people or business teams.

But this is not what Capacity Management is all about. ITIL defines the goal of Capacity Management as:

“The goal of the Capacity Management process is to ensure that cost-justifiable IT capacity in all areas of IT always exists and is matched to the current and future agreed needs of the business, in a timely manner”.

I often say it is also about “doing more with less”, and this we can achieve only when we use the techniques to their full potential and work in a proactive fashion. Firefighting is part of a capacity management professional’s job, but the true value can only be realized when we work proactively.

Many times I have seen an established, working process die a natural death because management starts to believe that it adds no value and that capacity-related incidents can be resolved pretty quickly by incident management teams. Part of the problem lies with Capacity Management professionals as well, who on multiple occasions fail to highlight the benefits. I understand cost has a big role to play in the overall equation, and for that you can highlight cost in either of two categories: cost saving or cost avoidance. Beyond cost, an effective process can reduce panic buying (which saves cost), escalations and service disruptions. In today’s digital world, where there are great expectations from IT for super-fast time to service (TTS) and unlimited capacity on demand (a perception created by the Cloud), it is even more important to have effective and efficient capacity management practices.

No story is better told than one backed by data and evidence. Imagine the achievements being communicated to management in the form of reduced incidents, improved utilization of IT infrastructure, and cost avoidance through code optimization, configuration changes, and the release of unused capacity for reuse elsewhere to reduce the pressure to buy additional capacity.

So the crux is: it is a challenge to justify the need for a Capacity management process, but it becomes easier when supported by real meat in terms of numbers and metrics.

2) Bad start (under/over provisioning): This one is an interesting challenge because you inherit the difficulties. You were not involved when the services were designed, the architecture was defined and the provisioning happened. People mostly fear that future capacity upgrade requests will not be entertained in view of economic challenges, or will go through a rigorous process which in most cases is too bureaucratic. To play safe with respect to potential performance or capacity issues, they take a defensive approach and ask for capacity for the next couple of years on day 1 itself, exaggerating the capacity requirements. If the capacity management process was not consulted or did not have a say during provisioning, it becomes extremely difficult to reduce the capacity at a later stage. Remember that in the majority of cases the real value of Capacity management is realized by improving capacity utilization: identifying cold areas in your IT estate, releasing the spare capacity and using it somewhere else where it is actually required.

This is also related to the other side of the story: not enough thought was given to planning, which resulted in capacity issues during the early life of the service. This might be due to inadequate capacity or the limited support life of the underlying IT Infrastructure. Because of the complexity involved in changing a running service, it becomes extremely difficult to upgrade the capacity (scalability issues) or migrate the service.

This, I would suggest, can be avoided by proper control mechanisms in the provisioning process. For existing issues, the only option is to bite the bullet and fix them once and for all, provided the effort and cost are offset by the cost of the service disruptions avoided.

3) CMDB: The configuration management database is the heart of effective IT processes, and Capacity management is no exception. What makes it even more important for the Capacity management process is that all the IT configuration items managed by the process should be in the CMDB along with their latest configuration. Failing this can potentially impact service stability. I have experienced this issue to the extent that it took 6-8 months to associate all the core IT assets with their services in the CMDB.

I would say this is more of a governance issue. If proper processes are defined and tight controls exist around them, it can be achieved pretty fast. For past sins, a remediation program can be rolled out, which should have management buy-in. Technology can also be handy, as discovery tools can automatically discover the components and do most of the legwork for you.

4) Data Issues (resource utilization, workload metrics): This, I guess, is the low-hanging fruit in this list because it is technology dependent. My point here concerns the very basic ingredients of Capacity management: component utilization data and workload metrics, that is, how much is used, who is using it and what is left.

The reason this qualified for the list is that in a large IT setup there is a standard way of doing everything. If the standard is to use a particular monitoring toolset, you are at the mercy of that tool being rolled out to each IT component in the estate. This is quite the opposite of small IT shops, where you can exploit native commands/tools to extract utilization data and put it into a repository; after all, even the tools do the same, only in a more sophisticated way. Workload metrics, such as the average number of transactions per second of a particular type and their resource consumption, can be tricky to extract if no basic monitoring or measurement practices were followed during code development, and this is where tools can make your life easier.

5) Business demand (forward view of workload data): This is about getting workload estimates for the future. It is the most important ingredient of capacity planning, where we estimate future capacity requirements. What I have observed is that the business either doesn’t share it or shares the wrong estimates. For services whose utilization or growth trend is more or less steady or static, this should be fine. But for web-based services, where growth is exponential or seasonal and depends heavily on marketing/sales drives, it becomes vitally important to estimate demand with a reasonable degree of accuracy.

Some of these issues can be resolved by statistically analyzing the historical workload data and forecasting the numbers purely from past trends. Sharing these numbers back with the business or other relevant teams gives them an additional data source, and in this way the variance between projections and actuals can be reduced. It goes without saying that all this can happen only when business and IT work in close collaboration and not as two isolated departments.
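To make the idea of forecasting purely from past trends concrete, here is a minimal sketch in Python. It assumes the simplest possible model, a straight-line least-squares trend over equally spaced periods. The function names and the sample numbers are hypothetical and only for illustration; they are not taken from any real service or tool.

```python
from typing import List, Sequence, Tuple


def fit_linear_trend(history: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of history[i] ~ intercept + slope * i."""
    n = len(history)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(history) / n
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    sxy = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(history))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def forecast(history: Sequence[float], periods_ahead: int) -> List[float]:
    """Extend the fitted straight line for the next `periods_ahead` periods."""
    slope, intercept = fit_linear_trend(history)
    start = len(history)
    return [intercept + slope * (start + k) for k in range(periods_ahead)]


def variance_pct(projected: float, actual: float) -> float:
    """Signed gap between a projection and the actual, as a % of the actual."""
    return (projected - actual) / actual * 100.0


# Hypothetical monthly peak transactions per second for one service.
monthly_peak_tps = [100, 110, 120, 130]
print(forecast(monthly_peak_tps, 2))  # [140.0, 150.0]
```

A straight line is only reasonable for the steady services mentioned above. For exponential or seasonal growth the same workflow applies, but the model would have to change, and the business input on marketing/sales drives is still needed on top of the statistics.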

November 12, 2013

My Presentation on Oracle Solaris Zones Capacity Management

Filed under: Capacity Management,Public Appearance — neerajbhatia @ 21:03

So one more successful annual conference of the All India Oracle Users Group (AIOUG) took place on November 9th and 10th in Hyderabad, India.

Earlier I used to attend Oracle database sessions at AIOUG, but this time the AIOUG board did a remarkable job by extending the coverage to Solaris topics. As a Capacity Management professional, this gave me an opportunity to hear what is going on in the Solaris world and at the same time share my knowledge of the subject. As part of this, I presented a paper on effective capacity and performance management in a Solaris Zones environment.

Those who couldn’t attend can download the paper here. I intentionally kept it a bit verbose so that readers have more details on the topics. Any questions, doubts or suggestions are welcome 🙂


February 5, 2013

Step away from the spreadsheets

Filed under: Other Discussion — neerajbhatia @ 21:50

Yesterday the WSJ published a news item titled “Four CIA Secrets That Can Boost Your Career”.

Point 3 tells an interesting story about a situation where human intelligence people made a mockery of the spreadsheets, on the basis of which America had for quite a long time spread the propaganda that Iraq was sitting on stockpiles of weapons of mass destruction (WMD). The truth turned out to be completely the opposite.

The corporate world is full of such stories, where so-called “smart people” over-rely on numbers while the metrics and forecasts are actually far from the ground reality. Being from the capacity management field, I face such scenarios on a daily basis, where people can’t help themselves and their entire thought process revolves around numbers. The term I would use for such people is data-aholic.

The important point is that one should understand the underlying meaning of the data one is dealing with. But in my interactions over recent years I have met people who claim they do technical work, yet when someone peeps behind the curtain, it turns out they have been doing their technical work in spreadsheets.

In times when everyone is trying to automate everything, logic is being put into kernels, there is little room left for technicians/administrators, and people’s wisdom is deteriorating, it’s no surprise that people are taking decisions purely based on numbers.

God bless us!

August 7, 2012

Green Capacity Planning Part-4: Power-Performance Benchmarks

Filed under: Capacity Management — neerajbhatia @ 07:31

Power-performance benchmarks play a very important role during the design phase of a data center. When a data center is built or upgraded, its designers analyze the peak capacity of the IT equipment to be installed, based on SLA requirements and anticipated future business demand. Peak power usage is also determined, and the infrastructure equipment (cooling, power delivery etc.) is sized accordingly. To gain power efficiency across the data center, the energy consumption of both infrastructure and IT equipment should be assessed.

While assessing the power efficiency of infrastructure equipment is easy, it was not as easy for IT equipment, as standard metrics were not available earlier. Because the power consumption of IT equipment is directly related to its utilization, power efficiency can be assessed from utilization and energy consumption. Before actual deployment, however, this is difficult, and one has to rely on vendor-provided data or standard benchmarks. The problem with vendor-provided energy efficiency figures is that they are often not directly comparable due to differences in workload, configuration, test environment, etc. This is where benchmarks come in really handy: they let IT managers compare the specific server models and other equipment being considered, make informed choices, and deploy energy-efficient data centers. Though benchmark data is based on a standardized synthetic workload which may not represent your actual usage, it serves sufficiently as a proxy for a specific workload type and enables server comparisons without actually purchasing the servers.

After server deployment, in the operational phase of a data center, power-performance benchmarks are of little practical use, because it makes no sense to measure the power efficiency of deployed servers against a standard workload. The important consideration in this phase should be to measure the power efficiency and productivity of the installed IT equipment and to try to improve it in the future. This can be done using the standard metrics which we will consider in the next section.

SPEC Power-Performance Benchmark

SPEC is a non-profit organization that establishes, maintains and endorses standardized benchmarks to evaluate performance for the newest generation of computing systems.  Its membership comprises more than 80 leading computer hardware and software vendors, educational institutions, research organizations, and government agencies worldwide.  For more information, visit

In order to enable IT managers to make better-informed server choices and help in deploying energy-efficient data centers, SPEC started development of power and performance benchmarks. In December of 2007, SPECpower_ssj2008 was released, which was the first industry-standard SPEC benchmark that evaluates the power and performance characteristics of volume server class computers. The initial benchmark addresses the performance of server-side Java. It exercises the CPUs, caches, memory hierarchy and the scalability of shared memory processors (SMPs) as well as the implementations of the JVM (Java Virtual Machine), JIT (Just-In-Time) compiler, garbage collection, threads and some aspects of the operating system.

SPECpower_ssj2008 reports power consumption for servers at different performance levels – from 100% to idle in 10% segments over a period of time.  To compute a power-performance metric across all levels, measured transaction throughputs for each segment are added together, and then divided by the sum of the average power consumed for each segment including active idle. The result is the overall score of the SPECpower_ssj2008 benchmark and this metric is known as overall ssj_ops/watt.
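The calculation described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic as worded in this post (sum of throughputs divided by sum of average power, including active idle); the function name and the sample measurements are hypothetical and do not come from a published result.

```python
from typing import Sequence, Tuple


def overall_ssj_ops_per_watt(
    load_levels: Sequence[Tuple[float, float]],
    active_idle_watts: float,
) -> float:
    """Overall score: total throughput divided by total average power.

    load_levels holds one (ssj_ops, average_watts) pair per measured
    load segment (100%, 90%, ... 10%). Active idle contributes power
    to the denominator but no throughput to the numerator.
    """
    total_ops = sum(ops for ops, _ in load_levels)
    total_watts = sum(watts for _, watts in load_levels) + active_idle_watts
    return total_ops / total_watts


# Hypothetical two-segment example, kept short for readability.
segments = [(1000.0, 200.0), (500.0, 150.0)]
print(overall_ssj_ops_per_watt(segments, active_idle_watts=50.0))  # 3.75
```

Because idle power sits only in the denominator, a server that draws a lot of power while doing nothing is penalized in the overall score even if its efficiency at full load is good.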

Among other aspects of the benchmark results, it is important to look at the power used while generating the maximum performance metric and the power used during an Active-Idle period, where the system is idle and doing no work. These are the logical best and worst cases for work done per unit of power and can serve as high and low bounds to define the power characteristics of a system. SPEC power and performance benchmark results are available for different hardware manufacturers, and they indicate that a server typically uses up to 40% of its maximum power when doing nothing. For example, the figure below shows the SPEC Power and Performance benchmark results summary for the Dell PowerEdge R610 (Intel Xeon X5670, 2.93 GHz). At 100 percent load the server uses 242 watts, while at idle it still uses 61.9 watts, which is around 26% of the power at 100 percent load.
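As a quick check of that ratio, the arithmetic for the R610 figures quoted above looks like this in Python (the function name is just an illustrative choice):

```python
def idle_power_fraction(idle_watts: float, full_load_watts: float) -> float:
    """Share of full-load power that the server still draws at active idle."""
    if full_load_watts <= 0:
        raise ValueError("full-load power must be positive")
    return idle_watts / full_load_watts


# Figures quoted in the text for the Dell PowerEdge R610.
fraction = idle_power_fraction(idle_watts=61.9, full_load_watts=242.0)
print(f"{fraction:.1%}")  # 25.6%
```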

The role of Active-Idle in the performance-per-power metric depends on the benchmark and its associated business model. For example, if the system has typical daytime activity followed by idle nighttime periods, Active-Idle becomes important. In such scenarios server virtualization plays an important role: we configure multiple virtual servers on a single physical machine with the aim of minimizing idle time. Recently, server vendors have also started to let servers optionally go into sleep mode when they are not in use.


Figure: SPECpower_ssj2008 Benchmark Results Summary for Dell Inc. PowerEdge R610 (Intel Xeon X5670, 2.93 GHz)

TPC Energy Benchmark

The Transaction Processing Performance Council, most commonly known as the TPC, is a non-profit corporation founded to define transaction processing and database benchmarks. Typically the TPC produces benchmarks that measure transaction processing and database performance in terms of how many transactions a given system and database can perform per unit of time, e.g., transactions per second or transactions per minute.

The TPC releases three types of benchmarks, each for a different type of workload. TPC-C is an online transaction processing benchmark, measured in transactions per minute (tpmC). TPC-E is a newer online transaction processing (OLTP) benchmark which simulates the OLTP workload of a brokerage firm; its metric is given in transactions per second (tpsE). TPC-H is an ad-hoc decision support benchmark, reported as the Composite Query-per-Hour Performance Metric (QphH@Size). For more information please visit

TPC-Energy is a new TPC specification which augments the existing TPC benchmarks with energy metrics developed by the TPC. The primary metric defined by TPC-Energy takes the form of “Watts per performance”, where the performance units are particular to each TPC benchmark. For example, for the TPC-E benchmark the metric would be Watts/tpsE.

The following table shows the TPC-E energy benchmarks available at the time of writing this post.

In addition to the watts-per-performance metric, TPC-Energy also provides secondary energy metrics corresponding to the energy consumption of each subsystem.

Idle power is also reported; it is the energy consumption of a reported energy configuration (REC) in a state ready to accept work. This is important for systems that have idle periods but still need to respond to requests (and so can’t be turned off). It is reported in watts and calculated as the energy consumption in watt-seconds divided by the idle measurement period in seconds.
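The two TPC-Energy calculations described in this section reduce to simple divisions. The sketch below follows the wording of this post only; the function names and the sample values are hypothetical and are not taken from any published TPC result or from the specification text.

```python
def watts_per_performance(average_watts: float, performance: float) -> float:
    """Primary metric: average power divided by the benchmark's own
    performance unit (for TPC-E that unit is tpsE, giving Watts/tpsE)."""
    if performance <= 0:
        raise ValueError("performance must be positive")
    return average_watts / performance


def idle_power_watts(energy_watt_seconds: float, idle_period_seconds: float) -> float:
    """Idle power: energy used while idle divided by the measurement period."""
    if idle_period_seconds <= 0:
        raise ValueError("idle measurement period must be positive")
    return energy_watt_seconds / idle_period_seconds


# Hypothetical system: 1,000 W average draw while delivering 500 tpsE,
# and 360,000 watt-seconds consumed over a 30-minute idle measurement.
print(watts_per_performance(1000.0, 500.0))   # 2.0
print(idle_power_watts(360_000.0, 1800.0))    # 200.0
```

Note that for watts per performance a lower number is better, which is the reverse of throughput-per-watt style metrics such as ssj_ops/watt.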

The figure below shows the secondary energy metrics for the Fujitsu PRIMERGY RX300 S6 12×2.5 benchmark (the first result in the table mentioned above). Apart from watts per tpsE at subsystem level, it also includes energy consumption at subsystem level at both full load and idle.

June 26, 2012

Poll Results and upcoming Oracle Event in Pune

Filed under: Public Appearance — neerajbhatia @ 23:24

Last month I posted a poll on my blog and in the LinkedIn group Oracle Database Performance Tuning asking for suggestions for a webcast topic. I am enraptured by the response I got, and as you can see below, the majority of people voted in favor of the presentation title “Database Capacity Planning how to start?”.

This is in line with my expectations. Last year I delivered a presentation at AIOUG’s annual meet Sangam11 on the topic “Day Someone Say the word Capacity?”. It covered the basics and concepts of capacity planning in general, with some specific examples related to the Oracle database. Interested people can download the presentation from the AIOUG website or the Papers and Presentations section of my blog. Since then I have been receiving emails asking how one can start doing capacity planning for Oracle databases. The curiosity is also visible in the poll results.

Before organizing a webcast on people’s favorite topic, “Database Capacity Planning how to start?”, I am presenting it at an upcoming 1-day event in Pune called “TechDay @ Pune”. It will be held on 28th July 2012. Details are available here: TechDay @Pune.

Soon after this presentation I will plan a webcast on the same topic. I look forward to meeting you on 28th July if you live in or near Pune; otherwise we will chat during the webcast.

Bye for now 🙂



May 17, 2012

Poll: Topic for upcoming Oracle webcast

Filed under: Public Appearance — neerajbhatia @ 23:47

I am collecting feedback about which topic you would like to see in the upcoming webcast. I am planning to conduct it sometime next month (June).

If you have any other feedback to share, please feel free to do so in the comments section …



May 1, 2012

Green Capacity Planning Part-3: Monitoring & Measurement

Filed under: Capacity Management — neerajbhatia @ 17:53

In the previous posts (part-1 and part-2) we discussed the background and driving forces of Green Capacity Planning. Today we will discuss its monitoring aspects. Monitoring is very important for any capacity planning process, and Green capacity planning is no exception. For ease of reading, the topic is divided into two parts: monitoring basics and the major monitoring tools available in the market.

Monitoring Basics

IT equipment has rarely been monitored for its energy consumption. The effectiveness of system administrators and other IT people is judged by the availability and performance of the IT Infrastructure. They are responsible for meeting performance SLAs and availability of around 99.99% in complex 24x7 environments. That is why the monitoring focus is mainly on these aspects of the Infrastructure, and most of the metrics captured by native utilities or third-party tools fall into the availability or performance categories.

On the other hand, facilities management people are responsible for the energy, cooling, lighting and other aspects of a data center. It is their responsibility to ensure that sufficient supporting infrastructure is always available to the data center. As we have seen, power consumption is linearly related to a device’s utilization, so there needs to be synergy between the IT management and facilities disciplines. However, that is not the case. IT capacity planners analyze the impact of business demand on the underlying infrastructure and forecast the capacity requirements; the impact of that forecast on the supporting infrastructure is not in the scope of their work. Facilities teams, for their part, usually don’t consider the impact of business demand on the supporting infrastructure. To summarize, the two disciplines work in isolation and rarely feed information to each other, which results in situations where you run out of power and a data center migration becomes necessary. Beyond the financial implications, this impacts business services and creates unnecessary overhead which could have been avoided.

To overcome this situation a holistic approach is required, where we take inputs from both facilities management and IT management and build a complete picture of data center infrastructure usage and the impact of business demand. Peter Drucker rightly said, “If you can’t measure it, you can’t manage it”. This is very true for the Capacity Management process. To effectively manage a data center, IT managers should be able to see what is happening on both the IT and Infrastructure sides.

This matters even more for new-generation server hardware, which has improved significantly over the years and whose power consumption is dynamic and depends on the workload it carries. This is good for power efficiency, but it has made anticipating the data center’s energy requirements challenging. Also, with the ever-increasing cost of energy, the operating cost of these components is significant compared to the total operating cost of a data center. According to a Gartner report published in March 2010, energy savings from well-managed data centers can reduce operating expenses by as much as 20%.

There are broadly two ways to measure the power consumption:


The Conventional Way

Conventionally, IT managers base their energy planning on fundamentally flawed power calculations: vendor faceplate power specifications, or a de-rating of these specifications. Both lead to inaccurate energy requirements.

Historically, server power benchmarks were not available, so the only way to do initial data center power planning was to rely on the power data provided by system vendors in the form of faceplate values. But the use of the faceplate value is flawed in the first place, as it indicates the maximum power requirement of each component irrespective of its configuration or utilization, whereas the power consumption of a system is linearly correlated with its utilization. Because of this, a huge gap exists between the data center’s anticipated power requirement and the actual power required by its equipment. Another option is fixed de-rating, where an arbitrary percentage or number is subtracted from the faceplate value on the assumption that the system’s faceplate rating is higher than its actual use. For example, a 1,000 watt rated server de-rated by a fixed 20 percent is assumed to consume 800 watts. However, this is not accurate either, as the actual power consumption depends on utilization, and the estimated value is most often grossly inflated. As you might expect, finding the correct de-rating percentage is nearly impossible without a measurement tool. Two servers from the same manufacturer and of the same model can draw different amounts of power because their utilization differs.
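The difference between the two approaches can be sketched in Python. The de-rating function reproduces the 1,000 watt example above. The utilization-based function assumes the simple linear relationship described in this post, and its idle and peak wattages are hypothetical values that in practice would have to come from measurement or benchmark data.

```python
def derated_power(faceplate_watts: float, derate_fraction: float) -> float:
    """Fixed de-rating: subtract an arbitrary percentage from the faceplate."""
    if not 0.0 <= derate_fraction <= 1.0:
        raise ValueError("derate_fraction must be between 0 and 1")
    return faceplate_watts * (1.0 - derate_fraction)


def linear_power(idle_watts: float, peak_watts: float, utilization: float) -> float:
    """Utilization-based estimate, assuming power grows linearly from the
    idle draw (utilization 0.0) to the measured peak draw (utilization 1.0)."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return idle_watts + (peak_watts - idle_watts) * utilization


# The example from the text: 1,000 W faceplate with a fixed 20% de-rating.
print(derated_power(1000.0, 0.20))  # 800.0

# Hypothetical measured values for the same server: 100 W idle, 300 W peak.
for utilization in (0.2, 0.5, 0.9):
    print(utilization, linear_power(100.0, 300.0, utilization))
```

With these assumed measurements the server never exceeds 300 watts, so even the de-rated figure of 800 watts overstates its draw by a wide margin, and two identical servers at 20% and 90% utilization would still draw different amounts of power.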

The figure below depicts the power usage of an IBM x3630 M3 system in relation to server CPU utilization. The red line shows constant power usage of 675 watts, as per the server’s faceplate value. The SPEC benchmark data tells a different story: even at 100% CPU utilization, the maximum power draw of the server is 259 watts.

Now, if we blindly use 675 watts as the basis of our data center’s energy requirements, it will result in a huge amount of unused power. Apart from the financial ramifications, there is a big risk of replacing the data center or building a new one on the assumption that the existing one has run out of gas, when in fact it has lots of unused power available. Over-provisioning of power not only increases operational expenditure (Opex) but also leads to unnecessarily high capital expenditure (Capex).
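To put rough numbers on this, the sketch below uses the two figures quoted above for the IBM x3630 M3 (675 watts faceplate, 259 watts measured peak). The server count and the 10,000 watt power budget are hypothetical, chosen only to illustrate the effect, and the function names are my own.

```python
def stranded_power_watts(faceplate_watts: float, measured_peak_watts: float,
                         server_count: int) -> float:
    """Power that is reserved on paper but can never actually be drawn."""
    return (faceplate_watts - measured_peak_watts) * server_count


def servers_supported(power_budget_watts: float, watts_per_server: float) -> int:
    """Whole number of servers that fit into a given power budget."""
    return int(power_budget_watts // watts_per_server)


FACEPLATE_WATTS = 675.0      # from the text
MEASURED_PEAK_WATTS = 259.0  # from the text

# Hypothetical estate of 100 such servers.
print(stranded_power_watts(FACEPLATE_WATTS, MEASURED_PEAK_WATTS, 100))  # 41600.0

# Hypothetical 10,000 W budget planned either way.
print(servers_supported(10_000.0, FACEPLATE_WATTS))      # 14
print(servers_supported(10_000.0, MEASURED_PEAK_WATTS))  # 38
```

Planning on the faceplate value would declare the budget exhausted at 14 servers, whereas planning on the measured peak leaves room for 38, which is exactly the kind of gap that can trigger an unnecessary data center build.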

An Intelligent Way

Given that server power consumption is dynamic, we need a sophisticated way to measure the actual power draw of a server based on its configuration and utilization. This is where DCIM tools play an important role. DCIM, which stands for data center infrastructure management, provides performance and utilization data for IT assets and physical infrastructure throughout the data center. According to the Gartner report “DCIM: Going Beyond IT”, published in March 2010, DCIM market penetration is expected to grow from 1% in 2010 to 60% in 2014. DCIM doesn’t replace systems performance management or facilities management systems; rather, it takes facets of each and applies them to data center infrastructure. It drives performance throughout the data center by monitoring and collecting low-level infrastructure data, which enables domain experts (e.g., IT capacity planners and facilities planners) to conduct intelligent, holistic analysis of the overall infrastructure.

Based on the technology used to collect the data, we can categorize DCIM tools as hardware-based or software-based. In the hardware-based approach, power meters or sensors are installed with every device; they measure the power usage and send the data to a centralized server. However, hardware-based solutions are intrusive, expensive and time-consuming to install in a large, complex data center. Software-based solutions, on the other hand, monitor devices over the network using the Simple Network Management Protocol (SNMP).

DCIM vendors are emerging very fast, and in the last two years the vendor market has become crowded. Existing vendors are also integrating their products to offer a common tool for data center management. By capturing power consumption data at the device level, data center managers can gain a more detailed view of their data centers and thus make informed decisions about equipment placement, cooling efficiency, power consumption and upgrades, and capacity planning. Predictive modeling is also an important component of these tools, and it has provided a cost-effective and accurate solution for the design of many data centers.

That’s it for now. In the next post we will dive further into monitoring and discuss the major market players and their pros and cons.

March 14, 2012

Green Capacity Planning Part-2: Driving forces

Filed under: Capacity Management — neerajbhatia @ 23:32

In my last blog post we discussed the background of Green Capacity Planning and what it is about. We briefly touched on various regulatory authorities which are actively working to promote Green IT practices and to lay down guidelines to measure, report and improve the energy efficiency of a data center. We call them driving forces, and today we will discuss these forces and their work.


Let’s start with the US Environmental Protection Agency (EPA), which was established in 1970 to consolidate in one agency a variety of federal research, monitoring, standard-setting and enforcement activities to ensure environmental protection. Among its initiatives, ENERGY STAR is the most popular and successful program, carried out jointly by the EPA and the U.S. Department of Energy (DOE) to promote energy-efficient products and practices, thus helping us save money and protect the environment. You must have seen an ENERGY STAR rating while buying an electronic product. Earning ENERGY STAR certification means a product meets the energy efficiency guidelines set by the EPA and DOE. With ENERGY STAR and other initiatives like Environmentally Preferable Purchasing (EPP), the EPA is helping businesses buy Green IT products. It enables green vendors, businesses and consumers to evaluate information about green products and services and calculate the costs and benefits of their choices.


The European Environment Agency (EEA) is an agency of the European Union. Currently with 32 member countries, its goal is to help in developing, adopting, implementing and evaluating environmental policy. The EEA and the US EPA have released codes of conduct for data centers to reduce energy consumption in a cost-effective manner without hampering the functions of data centers. These codes of conduct give guidelines to constantly measure power usage effectiveness (PUE) and to attain an average PUE of 2.0 (more details about PUE will be discussed in a later blog post). Similarly, organizations are advised to report and make efforts to reduce their carbon emission levels.

The Green Grid

The Green Grid is a non-profit, open industry consortium of end-users, policy-makers, technology providers, facility architects, and utility companies collaborating to improve the resource efficiency of data centers. With more than 175 member companies around the world, The Green Grid seeks to unite global industry efforts, create a common set of metrics, and develop technical resources and educational tools to further its goals.

The Green Grid was formed in February 2007 and is headquartered in Oregon. Currently the board has the following members: AMD, Dell, EMC, Emerson Network Power, HP, IBM, Intel, Microsoft, Oracle, Schneider Electric, and Symantec.

The Green Grid has proposed several metrics to report and increase the efficiency of a data center. As these metrics are a self-contained topic in their own right, we will discuss them in a later blog post.


Leadership in Energy and Environmental Design (LEED) is a third-party certification program for the design, construction and operation of high-performance green buildings. It ensures that buildings are environmentally compatible, provide a healthy work environment and are profitable. Developed by the U.S. Green Building Council (USGBC), LEED is intended to provide building owners and operators with a concise framework for identifying and implementing practical and measurable green building design, construction, operations and maintenance solutions. LEED is not specific to data centers; it applies to all buildings.

LEED New Construction buildings are awarded sustainability points for things like energy-efficient lighting, low-flow plumbing fixtures and water collection, to name a few. Recycled construction materials and energy-efficient appliances also affect the point rating.

That’s it for now. This lays the foundation for the most interesting part of the process, monitoring and measurement, where we will discuss various techniques to measure power utilization data.



Link to part-1: Green Capacity Planning: Background & Concepts

February 27, 2012

Green Capacity Planning Part-1: Background & Concepts

Filed under: Capacity Management — neerajbhatia @ 19:17

Last June I started writing a technical paper on Green Capacity Planning. I felt satisfied with my work and was able to cover the topic in a 30-page document. Then, unfortunately, the hard disk of my laptop failed and I lost all the work. It cost me more than just the document: starting over felt like attending the same course twice, and I couldn’t bring myself to restart until recently.

In these 6-8 months the awareness about Green Capacity Planning has improved a lot and everyone is talking about it. Some questions from my professional network prompted me to start writing about it once again. I assume (forgive me if I am wrong!) that for 90% of the readers it is a new road to travel, and they are my target audience. The topic is comparatively stale now, which means some of you already know it; still, I want to assure you that you will get something out of it. The reason for writing blog posts instead of a paper is that backups are taken automatically, and who knows, at the end I may release a paper with more details. Another advantage is that you don’t have to wait for a fully-fledged paper to be released.

So let’s step up a gear and discuss what Green Capacity Planning is. With increasing electricity prices and tougher business conditions, businesses are closely scrutinizing power consumption and other operational expenses, and IT people have become concerned about the inflationary spiral of operational expenditures. In recent years a new market and new practices have emerged which consider energy, cooling, space and similar aspects during the capacity review process for a data center. Because these practices are related to a much bigger “Green IT” initiative, they are commonly known as Green Capacity Planning or Intelligent Capacity Planning.

Why Green Capacity Planning?

Frankly speaking, eye-popping electricity prices and the increasing operational costs of a data center are the major driving forces pushing towards adoption of the Green Capacity Planning discipline. Today’s data center infrastructure goes beyond the traditional IT equipment, which includes servers, network devices, and storage sub-systems. It now also includes cooling systems, uninterruptible power supplies (UPS), lighting etc. The operational cost of this equipment is a significant part of total data center operational expenditure (Opex).

In recent years, due to tough economic conditions, there has been increasing pressure on IT teams to implement cost-cutting measures. IT teams are already cashing in on technologies like virtualization, server consolidation, and cloud computing. The awareness about green initiatives has given data center managers food for thought: a way to go beyond traditional infrastructure performance management, to further cut their operational expenditures and thus improve efficiency. As data center managers continue to be challenged by the business to increase efficiency in a cost-effective way, companies are beginning to realize that being “green” isn’t just good from a PR perspective; it can also make good financial sense.

Technology advancement is another reason for the need to go green. CPUs in data center servers have become truly sophisticated in terms of power management. They dynamically switch to low-power modes and turn off cores when there is a low volume of work at hand. Despite these improvements, the supporting infrastructure is still the same as it was for older servers without dynamic power management. As a result, anticipating the power requirements for a data center has become challenging, and it results in either overloading or over-provisioning the electrical infrastructure.

Apart from cost and technology improvement, there is increasing pressure from regulatory authorities. Various agencies and bodies like the US EPA (Environmental Protection Agency), DOE (Department of Energy), EEA (European Environment Agency), LEED (Leadership in Energy and Environmental Design), The Green Grid and IEA (International Energy Agency) have been constantly working to encourage organizations to use sustainable energy. Similarly, organizations are advised to report and put effort into reducing carbon emission levels. It is not surprising that many organizations have already started measuring and reporting these metrics under their Corporate Social Responsibility (CSR) programs. After Japan’s devastating nuclear disaster in 2011, the world is looking for alternative green energy resources such as wind power, solar power and geothermal power. Governments are encouraging companies to source environmentally friendly electricity by means of tax relief and recognition. After the climate change conference held in Durban in Dec 2011, the world seems to agree on a legally binding deal to limit carbon emissions. This means there are likely to be stricter laws to reduce the carbon emissions of data centers.

Green Capacity Planning (What is it about?)

Typically, IT capacity planning involves collecting relevant workload and resource utilization metrics for IT components and analyzing them against business demand to see its impact on those components. This gives the business a view of when a capacity upgrade or downgrade is required and the most cost-effective way to achieve it without affecting the agreed SLAs.
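As a rough illustration of the "when is an upgrade required" question, here is a small Python sketch that fits a straight line to utilization history and estimates how many periods remain before a threshold is reached. The utilization figures, the 80% threshold and the function names are assumptions made up for this example; a linear trend is only the simplest possible model, and real workloads often need seasonality and business demand factored in.

```python
from typing import Optional, Sequence, Tuple


def fit_line(ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares slope and intercept for ys observed at x = 0, 1, 2, ..."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def periods_until_threshold(ys: Sequence[float], threshold: float) -> Optional[float]:
    """Periods after the last observation until the trend line hits threshold.

    Returns None when the trend is flat or falling, and 0.0 when the trend
    line is already at or above the threshold.
    """
    slope, intercept = fit_line(ys)
    if slope <= 0:
        return None
    crossing_x = (threshold - intercept) / slope
    return max(0.0, crossing_x - (len(ys) - 1))


# Hypothetical monthly peak CPU utilization (%) for one server cluster.
cpu_peak_pct = [52.0, 55.0, 58.0, 61.0, 64.0, 67.0]

# Hypothetical planning threshold of 80% peak utilization.
months_left = periods_until_threshold(cpu_peak_pct, threshold=80.0)
```

With this made-up history, which grows by three percentage points a month, the trend line reaches 80% a little over four months after the last observation.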

The scope of traditional capacity planning includes:

  • Computing power or CPU
  • Memory: physical and secondary
  • I/O
  • Network
  • Space: Internal and external like SAN

On the other hand, a parallel stream of specialized people takes care of the power, lighting and cooling aspects of a data center; they are known as facilities management or building management. However, with IT as their customer, they need to know the IT demand and the impact of any changes in IT infrastructure and demand on the facilities infrastructure. This missing link results in under-provisioning or over-provisioning of facilities infrastructure, and in poor time to market for IT services because of a slow change management process.

Green Capacity Planning is a coordinated effort by IT and facilities teams to enable informed decisions about data center capacity, and it is a natural extension of IT Capacity Planning. This synergy brings optimal infrastructure sizing, cost savings, and an effort to save our natural resource: energy. The aim of Green Capacity Planning is to extend the scope of traditional capacity planning practice to include the power consumption of individual IT equipment and the overall energy usage profile of the site. It also includes carbon emission reporting and analyzing the cascading effects of any IT capacity upgrade on the underlying infrastructure. Because the scope of this newer approach is wider and more intelligent, it is also known as Intelligent Capacity Planning, and specialized professionals skilled in applying these practices are referred to as Intelligent Capacity Planners.

The benefits of Green Capacity Planning lie in the collection of power consumption metrics. Facilities management is solely responsible for collecting power consumption data, at the individual component level and for the data center as a whole, using a specialized DCIM (Data Center Infrastructure Management) tool or power strips. Together with configuration item data (through the Configuration Management Database, CMDB), resource management, workload and performance data (through the Performance Management Database, PMDB, and/or monitoring tools) and business demand data (from the business), the IT Capacity Planner will be able to forecast future capacity requirements in terms of IT infrastructure and energy. Having understood all facets of a data center, it becomes possible to perform predictive analysis. For example, for a new project the Intelligent Capacity Planner will be able to predict how much IT infrastructure, energy and cooling would be required. Also, periodic capacity reports including energy consumption, energy efficiency metrics, and carbon emissions will enable IT and facilities teams to monitor their efficiency and feed the data back to external regulatory authorities.
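To give a feel for the kind of prediction described above, here is a back-of-the-envelope sketch in Python for a new project. Everything in it is an assumption for illustration: the server count, the average measured draw per server, the site PUE (power usage effectiveness, i.e. total facility energy divided by IT equipment energy) and the grid emission factor are made-up inputs, and the model simply scales IT load by PUE to approximate the facility overhead such as cooling and power distribution.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 24 * 365


@dataclass(frozen=True)
class ProjectEstimate:
    it_load_kw: float         # power drawn by the new IT equipment
    facility_load_kw: float   # IT load plus facility overhead
    overhead_kw: float        # cooling, UPS losses, lighting etc.
    annual_energy_kwh: float  # facility energy over one year
    annual_co2_kg: float      # emissions implied by the grid factor


def estimate_project(
    server_count: int,
    avg_watts_per_server: float,
    site_pue: float,
    kg_co2_per_kwh: float,
) -> ProjectEstimate:
    """Rough energy and carbon estimate, assuming a constant average load."""
    if site_pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    it_load_kw = server_count * avg_watts_per_server / 1000.0
    facility_load_kw = it_load_kw * site_pue
    annual_energy_kwh = facility_load_kw * HOURS_PER_YEAR
    return ProjectEstimate(
        it_load_kw=it_load_kw,
        facility_load_kw=facility_load_kw,
        overhead_kw=facility_load_kw - it_load_kw,
        annual_energy_kwh=annual_energy_kwh,
        annual_co2_kg=annual_energy_kwh * kg_co2_per_kwh,
    )


# Hypothetical project: 40 servers averaging 350 W each, in a site with a
# PUE of 1.8, on a grid emitting 0.5 kg of CO2 per kWh.
estimate = estimate_project(
    server_count=40,
    avg_watts_per_server=350.0,
    site_pue=1.8,
    kg_co2_per_kwh=0.5,
)
```

In practice the per-server figure would come from measured power data rather than nameplate ratings, and the PUE and emission factor would be supplied by the facilities team and the energy provider.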


So in this post we have discussed the concept of Green Capacity Planning, its background and its evolution. That lays the foundation for the various other aspects of the topic. In the next post we will discuss the various authorities working towards energy-efficient data centers.
