In our most recent Super Bowl commercial, we featured a QR Code® in the corner of the screen—a first for any Super Bowl advertiser. If you happened to scan that code, you may have noticed that you were taken to x.co/gdcloud.
X.CO is Go Daddy’s URL shortener. It’s a service that I developed and currently manage—I’m as proud of it as can be. When I found out that it was going to be in our Super Bowl ad, I was simultaneously elated and nervous! The Big Game brings big traffic.
Luckily, we had thoroughly prepared X.CO for the previous year’s Super Bowl, for an initiative that didn’t come to pass. That meant extensive load testing, performance tweaking, dramatic simplification, and cluster expansion. In the interim, though, we’d also added a lot of features, so it was worth revisiting the load tests.
Days before the big event, we hammered the production application at a sustained rate of approximately 1,800 requests per second. Although the barrage really put the servers through their paces, everything held up fine and there was no customer impact.
How did we do this? First, X.CO runs ASP.NET 4.0 on Windows Server 2008, backed by a SQL Server database, with Memcached as its caching layer. Originally, the whole application was served from a single Web site: redirection requests were handled by the same code that did shortening and offered the API. This meant that we couldn’t optimize the site for any one of those functions.
By separating the redirection functionality from the others and putting each on its own cluster, we could greatly simplify each cluster’s architecture for its specific purpose. The redirection servers now had only two jobs to worry about: look up the requested short URL in Memcached and put the request into a message queue for later processing.
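To make that two-step redirection path concrete, here is a minimal sketch in Python rather than the ASP.NET code the site actually runs. Everything in it is an assumption for illustration: a dict stands in for Memcached, an in-process queue stands in for the message queue, and the function name, the example target URL, the fields recorded per click, and the 301 status code are hypothetical.

```python
import queue
import time

# Hypothetical stand-ins: a dict plays the role of Memcached and an
# in-process queue plays the role of the message queue.
cache = {"gdcloud": "https://www.example.com/cloud"}
click_queue = queue.Queue()


def handle_redirect(short_code, user_agent):
    """Return (status, location) for a request to a short code."""
    target = cache.get(short_code)
    if target is None:
        # Unknown code: nothing to redirect to and nothing to log.
        return 404, None
    # Record the hit for later processing; no database work happens
    # on the request path.
    click_queue.put({"code": short_code, "agent": user_agent, "at": time.time()})
    return 301, target
```

The point of the shape is that the request path touches only the cache and the queue, so the expensive logging work can happen elsewhere, whenever capacity allows.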
This separation also meant that we had a lot of spare computing power on the redirection cluster. We used .NET 4.0’s Task Parallel Library (TPL) to make the message-processing Windows service take advantage of that idle capacity. The service pulls messages off the MSMQ queue in variable-sized batches to log clicks and scans.
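The batching idea can be sketched as follows. This is not the TPL-based service itself: it uses Python’s thread pool as a stand-in, and the sizing rule (half the backlog, clamped between a floor and a ceiling), the worker count, and all function names are assumptions made up for the example.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


def take_batch(q, max_size):
    """Pull up to max_size messages without blocking; may return fewer."""
    batch = []
    while len(batch) < max_size:
        try:
            batch.append(q.get_nowait())
        except queue.Empty:
            break
    return batch


def next_batch_size(backlog, floor=10, ceiling=500):
    """Grow the batch with the backlog, clamped to [floor, ceiling]."""
    return max(floor, min(ceiling, backlog // 2))


def drain(q, log_click, workers=4):
    """Process the queue in variable-sized batches; return the count handled."""
    handled = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while True:
            batch = take_batch(q, next_batch_size(q.qsize()))
            if not batch:
                return handled
            # list() waits for the whole batch and re-raises worker errors.
            list(pool.map(log_click, batch))
            handled += len(batch)
```

With a sizing rule like this, a large backlog is cleared in big parallel batches, while a nearly empty queue is handled in small ones.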
The real wins came from optimizing Memcached. In our load test lab, designed to simulate production as closely as possible, we ran a series of tests in which we tweaked one Memcached setting at a time and then analyzed the results.
The biggest surprise came from switching the .NET client library. We were using BeIT Memcached, and it did pretty well up to a certain point. Once it hit that point, it would just go kaput. We found that BeIT Memcached’s socket pooling had a problem under load: collisions started flying, causing serious exceptions, until it all came to a screeching halt. Turning off socket pooling entirely helped matters, but that configuration eventually fell down too. You can see it clearly in the load test below.
On a whim, we swapped out BeIT Memcached for the other major .NET Memcached client library, Enyim Memcached. Immediately the test ran through to the scheduled end without error or incident. Its socket pooling is much more stable and reliable.
It was definitely the single best change we made. However, there were others. Using the -t argument, we changed the number of threads to 32 from the default of 4. While this goes against the recommendation of the maintainers, during load testing we found that 4 threads per CPU core increased our throughput substantially. Another optimization was the use of the binary protocol over the text default. This reduces the network I/O, which is important at load.
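The thread-count arithmetic is simple enough to write down. The sketch below builds a Memcached command line from a core count; the helper name is hypothetical, and the 8-core figure in the usage line is only inferred from 32 threads at 4 per core, not stated anywhere above.

```python
def memcached_args(cpu_cores, threads_per_core=4):
    """Build a memcached command line using the -t (threads) option."""
    threads = cpu_cores * threads_per_core
    return ["memcached", "-t", str(threads)]


# Assuming 8 cores, 4 threads per core gives the 32 threads used here.
print(" ".join(memcached_args(8)))
```

Whether 4 threads per core helps on other hardware is something only a load test on that hardware can show.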
These small tweaks yielded great results, as you can see:
Memcached is a great way to speed up a Web application by significantly reducing the number of database requests it has to make. The proof is in the Super Bowl pudding!

