I have written variants of this article before… I find it is such a recurring topic that maybe it is worth a revisit once again.
Back in the day, bumming instructions out of your assembly code was the thing to do to gain a few more CPU cycles here and there. It was very time-consuming work, but computers were expensive and very, very slow, so the performance gains were worth it. It was great fun, but it hasn't been cost-effective for a couple of decades now.
In the '90s, software performance became a forgotten art for the most part. With the MHz (later, GHz) wars in full swing, it was a given that CPUs would be twice as fast by the time you released the code, so why bother with performance optimization at all? As long as the code was adequate on your development box, it would be plenty fast later.
In the 2000s the CPU frequency race was slowing down and Internet scale was speeding up. Up to a point you could buy bigger servers to keep up, but that was quite expensive and only got you so far. No matter how much budget you had, at some point faster servers were not going to cut it anymore, so you had to scale sideways instead. And thus the obvious conclusion was to skip the expensive server part altogether and scale horizontally on cheaper hardware from day one. Remember the buzz in the earlier part of the decade about Google having 10K servers? (Seems like such a small number now!)
It became a point of pride to have as many servers as possible and, once again, improving code performance was not seen as a good use of time when you can always throw another cheap box (or another hundred, who’s counting?) at the problem to compensate.
There's no arguing with the basic premises of these trends. It was true that CPUs were getting faster all the time, and it is true that scaling horizontally on commodity hardware is the way to go. And it is also very true that intensive code optimization is hardly ever worth the effort and the opportunity cost of not doing something else.
(Back at Sun, on the Web Server team, we did spend a fair amount of time on such intense optimization work, looking for a few percent here and a few percent there. The goal was to be able to post world-record SPECweb numbers (one example here). While fun, it was an exercise driven by marketing more than by the needs of data centers. For most platform vendors, such an effort is not worth the cost. For companies offering services as opposed to products, it's basically never worth the cost.)
The end result of all this, however, is that the concept of writing faster code and architecting for performance seems to have been lost! I’ve been seeing this for years now and if anything the trend is becoming more prevalent. The idea of scaling with more boxes from day one is so ingrained that I rarely see teams doing some basic performance sanity checking first.
More servers do cost more money. Not just to buy, but particularly to run, cool, and house in a rack. If you can get by with a few fewer servers, that's not a bad thing. If you can get by with a lot fewer servers, all those operating costs go straight to your profit margin. Not a bad deal!
If you read the popular book The Art of Scalability from a few years back, you'll be familiar with its three axes of scalability (more boxes horizontally, split by service, shard by customer). I note with amusement that they forgot the easiest and cheapest axis of scalability, which is to write more performant code in the first place…
The usual argument goes that efficient design and code is not worth it because the gains, while real, are small enough that they are lost in the noise and you’ll still need about the same number of boxes anyway so why bother? That’s usually true IF you’re starting from a reasonably optimized design and implementation. However, if the development team has not been running realistic load testing and performance analysis all along, I can pretty much guarantee there are gains to be had that’ll save you quite a bit in operating costs.
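What does that kind of performance analysis look like in practice? One cheap habit is to track latency percentiles, not just means, when load testing, since an average by itself can't tell you whether all requests are uniformly slow or a few outliers dominate. A minimal sketch (the sample numbers below are invented for illustration, not measurements from any real service):

```java
import java.util.Arrays;

public class LatencyStats {
    // Nearest-rank percentile over a set of latency samples (in ms).
    public static double percentile(double[] samplesMs, double p) {
        double[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        int idx = (int) Math.ceil(p / 100.0 * sorted.length) - 1;
        return sorted[Math.max(0, idx)];
    }

    public static void main(String[] args) {
        // Made-up sample: mostly fast responses with one slow outlier.
        double[] samples = {8, 9, 10, 8, 9, 11, 10, 9, 250, 9};
        double mean = Arrays.stream(samples).average().orElse(0);
        System.out.printf("mean=%.1f ms  p90=%.1f ms%n",
                mean, percentile(samples, 90));
    }
}
```

In this toy sample a single 250 ms outlier quadruples the mean even though 90% of requests finish in 11 ms or less; looking only at the mean, you would chase the wrong problem.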
Enough philosophizing, how about a real world example…
When I started at my current position I took over one of the core production REST services. It was (and is) a very standard setup… REST APIs, Java Servlets, JAX-RS, MySQL. The usual. Response times were plenty adequate, although not stellar. About a year ago, response times started climbing as our user base kept growing every month. While the service was still doing fine, comparing the usage growth curves to the response time curves made it clear it was time to order some more hardware soon to spread the load a bit before it slowed down enough that customers would notice.
Meanwhile, though, I had been working on sanitizing the performance. Long story short, I never did order more hardware. Just the opposite… after I upgraded the code, about 50% of the hardware dedicated to this service became available to reassign to other things; it simply wasn't needed anymore given the increased capacity of the new code.
The new code can handle just about 40 times more throughput per server (not 40 percent, 40 times!). When fully loaded (at max capacity) it now maintains mean service times in the 8ms to 10ms range. The previous version had mean response times in the 100ms to 150ms range even though it was handling less than 1/40th of the load!
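As a back-of-the-envelope sanity check on numbers like these (my own arithmetic, with assumed values, not figures from the measurements above), Little's Law relates throughput, concurrency, and latency: throughput = in-flight requests / mean service time. A sketch:

```java
public class LittlesLaw {
    // Little's Law: throughput (req/s) = concurrent in-flight requests
    //                                    / mean service time (seconds)
    static double throughput(double concurrency, double meanSeconds) {
        return concurrency / meanSeconds;
    }

    public static void main(String[] args) {
        // Assumed for illustration: the same in-flight request count per box,
        // with latencies taken from the midpoints of the two ranges.
        double concurrency = 50;
        double oldTput = throughput(concurrency, 0.125); // 125 ms mean
        double newTput = throughput(concurrency, 0.009); // 9 ms mean
        System.out.printf("old ~%.0f req/s, new ~%.0f req/s (%.1fx)%n",
                oldTput, newTput, newTput / oldTput);
    }
}
```

At equal concurrency, the latency drop alone accounts for roughly a 14x throughput gain; the rest of a 40x improvement has to come from each box sustaining more concurrent requests (less per-request CPU, less contention, and so on).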
These may be commodity boxes, but buying 40 of them still takes some cash. And the monthly operating cost of 40 boxes is real money as well. Think about it: it means that roughly a rack full of 1U servers can be downsized to a single server…
I'd love to be able to boast about having done some extreme performance magic to get these scalability benefits, but the reality is all I did was apply some basic design and implementation optimizations across the board, grabbing the low-hanging performance gains here and there. Such gains compound, and by the time I was done the system could handle 40 times as much traffic (customer requests handled per second).
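For a sense of how low-hanging gains here and there can reach 40x: independent optimizations on the same hot path multiply rather than add. A toy illustration (the per-fix factor and count are made up, not measurements):

```java
public class CompoundGains {
    public static void main(String[] args) {
        double speedup = 1.0;
        // Suppose ten separate fixes each make the hot path 1.45x faster
        // (hypothetical numbers chosen only to show the multiplicative effect).
        for (int i = 0; i < 10; i++) {
            speedup *= 1.45;
        }
        System.out.printf("Combined speedup: %.1fx%n", speedup);
    }
}
```

Ten fixes of roughly 45% each multiply out to about 41x overall, which is why a series of individually unremarkable optimizations can end up looking like magic.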
Why wouldn’t you do this level of performance sanity checking? It doesn’t take that much extra work to design and implement for scalability and the end result is a competitive advantage.