Super Bowl: The Big Game and Checklists

Every once in a while there’s an event of such importance and magnitude that it requires laser-like focus and flawless execution to ensure a positive outcome. For Go Daddy, the Super Bowl is one such event.

So what does it take to ensure Go Daddy is ready when the proverbial curtain goes up and the show begins? How do we ensure that everything goes perfectly from a technology and process point of view, and that we can absorb a huge surge of traffic and activity on GoDaddy.com?

The simple answer is planning, and lots of it. At Go Daddy, we’ve had the good fortune of organizing a number of Super Bowl-level events. Building on our past experiences and lessons learned, we have the process down to a near science. I would say our most important tool in this case is not technology but the humblest of organizational tools: the checklist.

Pre-Game Checklist

During the events leading up to the Big Game, we go through a massive checklist to ensure that we are ready on a process and technological front. Below is an example of a type of checklist we use in preparing for any big event.

We run these pre-flight checklists continually, right up to the minute the Super Bowl starts, to ensure that our infrastructure is as solid as possible and that we are not forgetting even the smallest detail.

Network Infrastructure Preparation

First and foremost, we make sure that sufficient network infrastructure is in place to support the expected traffic loads. We have aired commercials during eight previous Super Bowls and have enhanced our infrastructure each year as needed, so the major components and expectations have become standard practice for us.

Usually, about four months before the Super Bowl, the network team begins verifying the existing network infrastructure and looking at any major enhancements planned for the next year. We work with other business teams, such as our marketing department and IT teams (server, storage, and database teams), to understand which web servers and other systems will be needed to support the upcoming advertising campaign.

The network team starts by identifying the network paths associated with the systems that are involved, and either confirms that they match known paths from previous years or determines that they represent additional network infrastructure. We then update our project documentation accordingly.

Check the Checklist!

As you can see from our sample above, we follow checklists that identify the known network elements that need to be reviewed. With the equipment identified in the checklists, we review all of our routers, switches, and the links that connect them within the relevant network infrastructure for capacity and redundancy. We verify that all of the network links are up and running as expected and without errors, and we review the utilization of those links to ensure that we have sufficient available capacity. We also check other network infrastructure, such as firewalls and load balancers, to verify that they are configured as expected and that no errors are seen. Armed with information about any new infrastructure needs, capacity issues, or hardware errors, we proceed to upgrade and correct whatever needs attention.
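The link review described above can be sketched as a small script. This is an illustration only and not a description of the tooling used at Go Daddy: the `Link` record, the 70% utilization threshold, and the sample figures are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """One network link on the checklist (hypothetical record layout)."""
    name: str
    is_up: bool
    error_count: int
    utilization: float  # fraction of capacity in use, 0.0 to 1.0


def review_link(link: Link, max_utilization: float = 0.70) -> list[str]:
    """Return the checklist items this link fails; an empty list means it passes."""
    problems = []
    if not link.is_up:
        problems.append("link is down")
    if link.error_count > 0:
        problems.append(f"{link.error_count} errors seen")
    if link.utilization > max_utilization:
        problems.append(
            f"utilization {link.utilization:.0%} exceeds {max_utilization:.0%}"
        )
    return problems


def review_all(links: list[Link]) -> dict[str, list[str]]:
    """Map each failing link's name to its list of problems."""
    report = {}
    for link in links:
        problems = review_link(link)
        if problems:
            report[link.name] = problems
    return report


if __name__ == "__main__":
    # Made-up sample data, purely for illustration.
    sample = [
        Link("core-1 to edge-1", is_up=True, error_count=0, utilization=0.40),
        Link("core-1 to edge-2", is_up=True, error_count=12, utilization=0.85),
    ]
    for name, problems in review_all(sample).items():
        print(f"{name}: {', '.join(problems)}")
```

Returning a list of failed items for each link, rather than a single pass/fail flag, keeps the output close to a checklist: every entry in the report is something to upgrade or correct before the event.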

As we begin planning for the final network infrastructure optimizations that are to be deployed just a few days prior to the event, we also prepare configurations that allow us to rapidly address many possible failure scenarios. We make our final configuration changes to the network and leverage multiple Internet transit links dedicated to supporting our website traffic, as well as efficient global Internet peering connectivity, to maximize our available capacity. Our contingency plans have us ready to recover failed links, route around possible hardware failures, and even shift the entire website infrastructure over to a redundant data center with all of the same capabilities. We check and double-check all items on the checklist.

The day of the ‘Big Game’

We take enormous pride in our operational execution on game day. While the world is watching the Super Bowl and our entertaining commercials, our IT operations staff is focused on checklists, operational KPIs, and fault management telemetry. We collect and analyze over 4.5 million data points per hour, check the health and well-being of the GoDaddy.com site over 180 times an hour, and respond to any trouble with our infrastructure within minutes.
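To put those hourly figures in more familiar terms, here is a quick conversion. It uses only the two numbers quoted above; the function names are made up for the example, and because both figures are stated as "over", the results are lower bounds on the real rates.

```python
SECONDS_PER_HOUR = 3600


def seconds_between(events_per_hour: float) -> float:
    """Average gap, in seconds, between events that occur at a steady hourly rate."""
    return SECONDS_PER_HOUR / events_per_hour


def per_second(events_per_hour: float) -> float:
    """Convert an hourly rate into a per-second rate."""
    return events_per_hour / SECONDS_PER_HOUR


if __name__ == "__main__":
    # 180 site health checks an hour is one check every 20 seconds.
    print(seconds_between(180))      # 20.0
    # 4.5 million data points an hour is 1,250 data points every second.
    print(per_second(4_500_000))     # 1250.0
```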

As a commercial airs, there is not a single person in our IT Operations Center who is not focused on a KPI or technical health indicator. We monitor the traffic levels and connection limits of all the key infrastructure items. After the campaign, we roll back the temporary changes to the network infrastructure, review and report on the traffic levels seen by our network, and then follow up with a review of lessons learned.

After the Game

It’s been several years since any of our IT Operations Center staff has actually watched the football game. For us, the Big Game is about making sure that everything goes according to the checklist and that we are prepared for any contingency. Our goal is to ensure that we perform up to our high expectations and the world gets a chance to watch our entertaining commercials and perhaps check out the awesome services we offer. We get a heck of a rush from everything going by the numbers and celebrate our success after the game is over with a few beers and high-fives. After all, we need to have a bit of fun!

More on Checklists?

Do the checklists stop after the game? No way! We run these checklists all the time to maintain operational readiness and to make sure we are not forgetting the important details day to day.

Want to know more about how checklists can improve your operations? I would highly recommend that anyone who is an operational leader read “The Checklist Manifesto” by Atul Gawande for insights into how checklists can help perfect your processes.

Many operational leaders at Go Daddy are huge proponents of using checklists to ensure we have thought-out processes and procedures and that the simple things are not forgotten. Sometimes, the magic is as simple as remembering to do the right things at the right time.

*Michael Racki and Matt Hubbard, contributing authors.

Since joining Go Daddy in 2008 as a Senior Network Engineer, Michael has helped build and support a number of IT Service Management initiatives. Michael leads the Go Daddy Information Technology Operations Center. His teams are accountable for Event, Incident, and Problem Management activities that exist to support the availability of Go Daddy’s 30,000+ systems.
