Is your infrastructure ready for the holiday crunch?


September 2011
October is critical for Ops leaders. During the next six weeks, you need to take a hard look at your infrastructure and make sure you’re ready for the holiday season.
Nothing puts pressure on your infrastructure like peak periods. And the holidays can create wild surges in demand. Global shipping giant UPS, for instance, sees its workload spike by 67 percent during the holiday season, with daily package volume climbing from more than 15 million to a peak of 25 million.
Failing to prepare for peak demand can earn your business a spot in the web crash hall of shame. Victoria’s Secret is still infamous for its 1999 crash, when its Super Bowl ad drove millions of viewers to its underprepared website. More recently, Verizon underestimated the demand for web pre-orders of the iPhone on the first day consumers had a choice other than AT&T. Even Amazon, with all the power of EC2 at its disposal, was brought low when it offered Lady Gaga’s “Born This Way” for $0.99.
Obviously, it doesn’t take a holiday-season surge to crash your service. But for many companies, holidays are when stakes are highest. According to estimates by the National Retail Federation (NRF), the world's largest retail trade association, the holiday season is responsible for 25 percent to 40 percent of annual sales for some retailers.
Over the next six weeks, you’ve got a window to prepare for peak demand and test your systems while there’s still time to fix issues. Here’s how.
Making tough decisions
Whatever you did last holiday season, you can expect this year to be different. Business and infrastructure changes during the year mean that you’re not the same organization you were 10 or 12 months ago. You’ve got to assess where you are right now to put your holiday plan in place.
Many companies institute a change freeze for the holiday season, during which noncritical work is put on hold. Generally, this is not a good time to apply software patches or update network routers.
But in reality the business is likely to force some unforeseen change to your environment. Perhaps it’s a change in your storefront content or an update to an application that’s critical to holiday season success. Whatever the reason, lines of business can trigger a move even at the height of the rush.
Other changes may come from forces outside the company. Your vendors may introduce changes in hardware or software in response to problems they’ve encountered.
In these cases, the key to minimizing risk is making sure that you have an incident management process in place, ideally one that’s backed by a change management process and integrated with it in feature and function.
Key elements are:
- An incident monitoring system, covering both applications and infrastructure, that enables proactive monitoring of your environment
- A game plan for incident response
- A testing plan that features regression or known-state testing
- The ability to track the state of your environment and detect any unauthorized changes
Monitor and measure
On some level, issues are inevitable. So how do you catch them before your customers do? Incident monitoring that measures the state of your environment from the customer’s perspective is one of the keys to making it through the holiday crunch. Use a solution that allows end-user monitoring, which emulates a customer’s behavior and allows you to catch deviations from the expected experience. This gives you the ability to monitor applications and infrastructure and understand whether the business application is functioning correctly. If it is not, you can focus on the area, logical or physical, that may be at fault.
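A minimal sketch of such an end-user check, in Python. Everything named here is an illustrative assumption rather than part of any particular monitoring product: the step names, the 2-second budget, and the idea that each step is a callable returning the page text. A real setup would drive an actual browser or HTTP client in place of the callables.

```python
import time
from dataclasses import dataclass


@dataclass
class StepResult:
    name: str
    ok: bool
    seconds: float
    detail: str = ""


def run_synthetic_check(steps, max_seconds=2.0, clock=time.perf_counter):
    """Walk through (name, action, expected_text) steps like a customer would.

    `action` is a hypothetical callable that performs the step and returns
    the page text. A step fails if it raises, returns content without the
    expected text, or exceeds the (assumed) time budget.
    """
    results = []
    for name, action, expected_text in steps:
        start = clock()
        try:
            body = action()
            elapsed = clock() - start
        except Exception as exc:
            results.append(StepResult(name, False, clock() - start, f"error: {exc}"))
            continue
        if expected_text not in body:
            results.append(StepResult(name, False, elapsed, "unexpected content"))
        elif elapsed > max_seconds:
            results.append(StepResult(name, False, elapsed, "too slow"))
        else:
            results.append(StepResult(name, True, elapsed))
    return results
```

Run on a schedule, the first failing step points at the part of the journey (browse, cart, checkout) that deserves attention.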
A common mistake is to make infrastructure changes or roll out new servers and then forget to tie them back to the monitoring system. As a result, key parts of your infrastructure are hidden.
Cloud computing introduces another wrinkle. Just because you can expand elastically into the cloud to add capacity doesn’t mean you’ve got visibility into how those servers are behaving. Nor does it mean you’re protected from catastrophic failure: Just ask anyone affected by the April 2011 crash of Amazon Web Services. In some cases, AWS customers were down for days. Now imagine this happening a week before Christmas.
Your goals are to have clear visibility into your systems and tools in place to measure performance. During this time, make sure you are measuring KPIs, looking at percentages for:
- reopened incidents
- escalated incidents
- urgent changes
- outages due to changes
- unauthorized implemented changes
- SLA coverage
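As a rough illustration, these percentages can be computed from a handful of raw counts. The field names and the choice of denominator for each KPI are assumptions made for this sketch (for example, outages due to changes are taken as a share of all outages); your own definitions may differ.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    # Hypothetical raw counts for one reporting period.
    incidents: int
    reopened_incidents: int
    escalated_incidents: int
    changes: int
    urgent_changes: int
    unauthorized_changes: int
    outages: int
    outages_due_to_changes: int
    services: int
    services_with_sla: int


def pct(part, whole):
    """Percentage rounded to one decimal; 0.0 when there is nothing to divide by."""
    return round(100.0 * part / whole, 1) if whole else 0.0


def kpi_report(c):
    # Denominators below are assumptions, not a standard definition.
    return {
        "reopened incidents": pct(c.reopened_incidents, c.incidents),
        "escalated incidents": pct(c.escalated_incidents, c.incidents),
        "urgent changes": pct(c.urgent_changes, c.changes),
        "outages due to changes": pct(c.outages_due_to_changes, c.outages),
        "unauthorized implemented changes": pct(c.unauthorized_changes, c.changes),
        "SLA coverage": pct(c.services_with_sla, c.services),
    }
```

Tracking the same report week over week matters more than any single value: a rising share of urgent or unauthorized changes is an early warning before peak traffic arrives.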
Respond
Your monitoring system is feeding you information about your infrastructure. Now you need to set up a crisp response sequence and have clear escalation paths for when something does go wrong.
Maybe someone at a remote office rolls out an unauthorized change. Thanks to your incident monitoring, which is tracking the authorized state versus any unauthorized state, you’re aware of it. But are you able to roll it back, and how quickly can you return to the previous state?
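One way to picture tracking the authorized state against the observed state is to fingerprint each host's configuration and compare it with a stored baseline. This is only a sketch under stated assumptions: configurations are represented as plain dictionaries, and the host names and helper functions are hypothetical.

```python
import hashlib
import json


def fingerprint(config):
    """Stable SHA-256 digest of a configuration dictionary."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_drift(authorized, observed):
    """Compare observed host configs with the authorized baseline.

    Both arguments map host name -> config dict (an assumed representation).
    Returns host name -> reason for every host that deviates.
    """
    drift = {}
    for host in sorted(set(authorized) | set(observed)):
        if host not in authorized:
            drift[host] = "unknown host"
        elif host not in observed:
            drift[host] = "missing from scan"
        elif fingerprint(authorized[host]) != fingerprint(observed[host]):
            drift[host] = "changed"
    return drift


def rollback_plan(authorized, drift):
    """Authorized configs to restore on hosts whose state changed."""
    return {h: authorized[h] for h, reason in drift.items() if reason == "changed"}
```

Keeping the baseline itself under version control is what makes the rollback question answerable: the previous state is always one lookup away.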
Especially during the holidays, it may be hard to track down key personnel. It’s essential to have a response plan with at least these elements:
- A list of subject matter experts (SMEs) for incident response and handling
- A "follow the sun" model for handing off work (i.e., 24-hour capability)
- Embedded knowledge in the incident management system for handling common problems
Think through a variety of worst-case scenarios. For instance, if your SME is in Seattle and the incident is offshore or on another coast, how do you engage this person in minutes?
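The "follow the sun" handoff can be sketched as a simple roster lookup keyed on the UTC hour, plus a check that the roster leaves no hour uncovered. The sites, shift boundaries, and contact addresses below are made up for illustration.

```python
from datetime import datetime, timezone

# Hypothetical roster: (start_hour_utc, end_hour_utc, site, contact).
ROSTER = [
    (0, 8, "Singapore", "sme-apac@example.com"),
    (8, 16, "London", "sme-emea@example.com"),
    (16, 24, "Seattle", "sme-amer@example.com"),
]


def on_call(now, roster=ROSTER):
    """Return (site, contact) covering the given timezone-aware datetime."""
    hour = now.astimezone(timezone.utc).hour
    for start, end, site, contact in roster:
        if start <= hour < end:
            return site, contact
    raise LookupError(f"no coverage at {hour:02d}:00 UTC")


def coverage_gaps(roster=ROSTER):
    """UTC hours of the day that no shift covers."""
    covered = set()
    for start, end, _site, _contact in roster:
        covered.update(range(start, end))
    return sorted(set(range(24)) - covered)
```

Running `coverage_gaps()` whenever the holiday schedule changes is a cheap way to find the hours when nobody would answer the page.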
Test
The final part of your holiday preparation should be testing. To minimize risk, it’s essential to test aggressively to see how a new capability behaves. Testing a change against the known state of your environment is critical to making sure you’re covered. Just because a change works on the surface doesn’t mean it hasn’t introduced an effect somewhere else in your environment. Regression testing processes and tools can help you deal with an unanticipated move you may be forced to make.
One common mistake is failing to push the envelope far enough when testing. Make sure to test so that you’re covered for an unpredictable market response. A good rule of thumb is to test for an order of magnitude more than what is expected. So if the business expectation is 1,000 orders, test for 10,000. Test for peak traffic and ensure quality of service during the rush by monitoring the entire service from apps to infrastructure.
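The rule of thumb translates directly into a load target. The sketch below multiplies the expected volume by ten and fires that many concurrent calls at an order-placing function. `place_order` is a hypothetical stand-in for whatever submits an order to your test environment, and the worker count is an arbitrary assumption.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def load_target(expected_orders, safety_factor=10):
    """Order-of-magnitude rule: test for ten times the expected volume."""
    return expected_orders * safety_factor


def run_load_test(place_order, total_orders, workers=50):
    """Call place_order(order_id) concurrently and summarize the outcome."""

    def one(order_id):
        start = time.perf_counter()
        try:
            place_order(order_id)
            return True, time.perf_counter() - start
        except Exception:
            return False, time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=workers) as pool:
        outcomes = list(pool.map(one, range(total_orders)))

    failures = sum(1 for ok, _ in outcomes if not ok)
    slowest = max((seconds for _, seconds in outcomes), default=0.0)
    return {"orders": total_orders, "failures": failures, "slowest_seconds": slowest}
```

For a business expectation of 1,000 orders, `run_load_test(place_order, load_target(1000))` exercises the system with 10,000. Purpose-built load testing tools model traffic far more realistically, but even a crude run like this can surface failures that only appear under concurrency.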
Life is certainly easier when you can simply freeze your infrastructure for the holidays. But business needs don’t take time off. When you’ve got plans for monitoring, responding and testing, you’re able to meet those needs, minimize risk and react with agility.
For more information about applications and infrastructure monitoring and how you can ensure the performance and availability of virtualized and cloud services, find out about HP Business Service Management.