Latency – Going Beyond Throughput
posted on January 17, 2016 by Amit Golander
“We do these things not because they are easy, but because they are hard.”
John F. Kennedy
Storage is moving to flash, and flash is getting faster, so people keep asking me why I keep talking about latency as if there is a problem. Isn’t faster flash going to just make everything faster? Won’t “the rising tide lift all boats”?
Flash as a storage medium is indeed “faster” than the spinning hard disks we’ve all been using for decades. But when it is used to emulate a hard disk, as is the case with SSD products, there are software layers which prevent it from reaching its full potential. That explanation always gets heads nodding, because it is obvious. But what about when the flash media is not simply packaged into an “SSD” and connected over SATA or SAS, but instead can be addressed via NVMe over PCIe? Doesn’t that make the problem of hard disk drive emulation go away?
Not entirely. For one thing, in some cases the SSD abstraction is maintained even though the device is connected via PCIe. That is faster than SAS or SATA, but the software speed bumps are still there, keeping data from driving too fast through the storage parking lot.
But let’s suppose that you have a flash card designed specifically for NVMe, one that allows more sophisticated memory-addressing mechanisms in software to unleash its greater potential. And what about the SCM (Storage Class Memory) products coming to market, which blur the distinction between DRAM and the non-volatile media previously relegated to the storage layer? Hasn’t hardware solved the performance problem?
I wish it were so. But it comes back to the subtle distinction between latency and throughput, and what it means to the software inside the kernel.
At a high level, it’s easy to think of latency as the inverse of throughput (and vice versa). For a simple, single-threaded series of operations, that should be literally true. But it is more complicated when you have many operations occurring in parallel.
Consider the analogy of a highway. On a one-lane road, latency is the time it takes one vehicle to make a roundtrip between two end points. Let’s imagine a 4-passenger car traveling on a single-lane road for 50 kilometers across a desert between two depots. Each roundtrip is an event and its completion time is its latency. If we want to improve its latency we could make the car faster.
If we want to improve throughput instead, we could make the vehicle bigger by replacing the car with a huge, slow bus. The net result would be worse latency for each roundtrip, but more passengers moved per unit time, and so greater throughput. We could further increase throughput by building more lanes on the highway and running more vehicles. Optimally, we would make all the vehicles faster, whether they became even larger trains of trailers pulled by a truck or swarms of speedy motorcycles spreading out over the ever-multiplying number of new lanes.
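To put rough numbers on the car-versus-bus trade-off, here is a minimal sketch in Python. Only the 50-kilometer distance and the 4-passenger car come from the analogy; the speeds, the bus capacity, and the names `Vehicle`, `latency_hours`, and `throughput_per_hour` are assumptions made up for illustration.

```python
from dataclasses import dataclass

ROUND_TRIP_KM = 2 * 50  # 50 km each way between the two depots


@dataclass
class Vehicle:
    name: str
    passengers: int   # seats per roundtrip
    speed_kmh: float  # assumed cruising speed, not from the article


def latency_hours(v: Vehicle) -> float:
    """Time for one vehicle to complete one roundtrip."""
    return ROUND_TRIP_KM / v.speed_kmh


def throughput_per_hour(v: Vehicle, lanes: int = 1) -> float:
    """Passengers moved per hour with one vehicle running per lane."""
    return lanes * v.passengers / latency_hours(v)


car = Vehicle("car", passengers=4, speed_kmh=100.0)   # speed is assumed
bus = Vehicle("bus", passengers=40, speed_kmh=50.0)   # capacity and speed are assumed

for v in (car, bus):
    print(f"{v.name}: latency {latency_hours(v):.1f} h, "
          f"throughput {throughput_per_hour(v):.1f} passengers/h")
```

With these assumed figures the bus doubles the roundtrip latency (2 hours instead of 1) while moving five times as many passengers per hour (20 instead of 4). Passing `lanes=4` to `throughput_per_hour` multiplies throughput by four and leaves latency exactly where it was, which is the "more lanes" option in the analogy.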
The point here is that we really have three performance dimensions to consider:
- How quick is the roundtrip for any one passenger? That is latency.
- How many passengers in aggregate per unit time? That is throughput.
- How many independent events (vehicles)? That is accesses.
Enabling more accesses will be a natural consequence of lower latency: the number of lanes is fixed (in the analogy) and the number of queues is finite (applying the analogy to software), so the sooner each trip completes, the sooner its lane or queue slot is free for the next one.
To optimize the use of emerging hardware technology we shouldn’t merely rely on building more lanes in the highway (increasing throughput potential via flash media capability). We should also be making the vehicles faster (improving latency). Enhancing the hardware is in the hands of Intel, Micron, Samsung, and all the rest of the players in that space.
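One way to see why a fixed number of queues ties accesses to latency is Little's law from queueing theory, which is my framing here rather than anything specific to a particular kernel or device. Rearranged, it says the completion rate equals the number of operations in flight divided by the latency of each one. The sketch below assumes every queue slot is kept busy; the slot count of 32 and the latency values are hypothetical numbers chosen only to show the shape of the relationship.

```python
def accesses_per_second(slots: int, latency_us: int) -> float:
    """Little's law rearranged: rate = operations in flight / latency.

    Assumes all `slots` are kept busy and every access takes `latency_us`
    microseconds from submission to completion.
    """
    return slots * 1_000_000 / latency_us


QUEUE_SLOTS = 32  # hypothetical fixed number of outstanding operations

for latency_us in (100, 50, 10):  # hypothetical per-access latencies
    rate = accesses_per_second(QUEUE_SLOTS, latency_us)
    print(f"{latency_us:>4} us per access -> {rate:,.0f} accesses/s")
```

With a single slot the function reduces to the "latency is the inverse of throughput" case: one access at a time, each taking 100 microseconds, gives 10,000 accesses per second. With the slot count held fixed, the only way to push the rate higher is to shrink the latency, so cutting it tenfold yields ten times the accesses.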
Simply making use of the bigger/faster/cheaper nonvolatile “flash” hardware components coming to market in wave after wave of impressive innovation is straightforward. Everybody in the storage industry is doing it. And adding more of it is like adding lanes to the highway to get more throughput.
But doing something meaningful about latency is not easy. It’s hard. It means rethinking the fundamentals, changing the innards of the kernel, ripping out cruft with both hands and designing new streamlined code to handle storage I/O for the 21st century. We’re doing it at Plexistor.