Re: Latency spikes with 'thread' option

Jens Axboe <axboe@xxxxxxxxx> · Thu, 13 Dec 2012 14:19:06 +0100

On 2012-12-12 21:11, Sam Bradshaw (sbradshaw) wrote:
> Hi All,
> 
> We're running queue depth sweeps with a 4k random read workload (sample config 
> below) against a high performance PCIe SSD - the Micron p320h.  We're seeing
> latency spikes to 1 sec when the 'thread' option is used.  Instrumenting the
> driver, we see max latencies from driver entry point to block layer completion 
> callback of <20 ms at high queue depths.  If 'thread' is not used, the max 
> latencies reported by fio align almost exactly with that seen by the driver.
> There are typically only one or two of these latency outliers during a 40 sec
> run, for example, but they represent a significant enough excursion to pull
> our std. dev. very high.
> 
> Has anyone witnessed this sort of behavior?  We see it with all the versions
> of fio that we have used (2.0.5+) with a variety of kernels.  It's also very
> suspicious that the max latency is either almost exactly 1 sec or aligns with
> our hardware incurred latency for the given queue depth.

I've seen that happen before as well, but I never got to the bottom of
it. I just tried, and I can trigger it fairly easily on that Dell box. If
I beat on two devices, it doesn't happen easily. Add a third, and it
hits almost immediately after starting up the threads.

For fio, the only difference between a thread and process is how they
are kicked off. So it would seem unlikely to be something in fio.
Perhaps it's a scheduling bug? But then it seems odd that nobody else
has seen this. I see the same latencies you report, almost exactly 1s.
That is indeed very odd.
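
To illustrate the thread versus process point, here is a minimal Python
sketch. It is not fio's code (fio is written in C), and every name in it
(job, start_as_thread, start_as_process, run) is hypothetical. It only
shows the shape of the argument: the same job body runs either way, and
only the launch mechanism differs. It assumes a POSIX system where
os.fork() is available.

```python
import os
import threading


def job(worker_id, iterations=1000):
    """Stand-in for the per-worker I/O loop; identical for both launch modes."""
    total = 0
    for i in range(iterations):
        total += i
    return worker_id, total


def start_as_thread(worker_id, results):
    # Thread: shares the parent's address space, so the result is stored directly.
    def body():
        results[worker_id] = job(worker_id)[1]

    t = threading.Thread(target=body)
    t.start()
    return t


def start_as_process(worker_id):
    # Process: fork a child that runs the same job and reports back over a pipe.
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: run the job, write the result, exit without returning to the caller.
        os.close(read_fd)
        _, total = job(worker_id)
        os.write(write_fd, str(total).encode())
        os._exit(0)
    # Parent: keep only the read end.
    os.close(write_fd)
    return pid, read_fd


def run(num_workers, use_threads):
    """Launch num_workers copies of job() and collect their results by worker id."""
    results = {}
    if use_threads:
        threads = [start_as_thread(i, results) for i in range(num_workers)]
        for t in threads:
            t.join()
    else:
        children = [start_as_process(i) for i in range(num_workers)]
        for i, (pid, fd) in enumerate(children):
            with os.fdopen(fd) as f:
                results[i] = int(f.read())
            os.waitpid(pid, 0)
    return results
```

Calling run(3, True) and run(3, False) yields the same results, since the
work done is the same in both modes. Any difference in observed latency
would have to come from how the workers are scheduled or share state, not
from the job body.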

I'll try and poke at this a bit.

-- 
Jens Axboe
