RE: Latency spikes with 'thread' option

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> -----Original Message-----
> From: Jens Axboe [mailto:axboe@xxxxxxxxx]
> Sent: Thursday, December 13, 2012 5:19 AM
> To: Sam Bradshaw (sbradshaw)
> Cc: fio@xxxxxxxxxxxxxxx
> Subject: Re: Latency spikes with 'thread' option
> 
> On 2012-12-12 21:11, Sam Bradshaw (sbradshaw) wrote:
> > Hi All,
> >
> > We're running queue depth sweeps with a 4k random read workload (sample config
> > below) against a high performance PCIe SSD - the Micron p320h.  We're seeing
> > latency spikes to 1 sec when the 'thread' option is used.  Instrumenting the
> > driver, we see max latencies from driver entry point to block layer completion
> > callback of <20 ms at high queue depths.  If 'thread' is not used, the max
> > latencies reported by fio align almost exactly with that seen by the driver.
> > There are typically only one or two of these latency outliers during a 40 sec
> > run, for example, but they represent a significant enough excursion to pull
> > our std. dev. very high.
> >
> > Has anyone witnessed this sort of behavior?  We see it with all the versions
> > of fio that we have used (2.0.5+) with a variety of kernels.  It's also very
> > suspicious that the max latency is either almost exactly 1 sec or aligns with
> > our hardware incurred latency for the given queue depth.
> 
> I've seen that happen before as well, but I never got to the bottom of
> it. I just tried, and I can trigger it fairly easily that dell box. If I
> beat on two devices, it doesn't happen easily. Add the third, and it
> hits almost immediately after starting up the threads.
> 
> For fio, the only difference between a thread and process is how they
> are kicked off. So it would seem unlikely to be something in fio.
> Perhaps it's a scheduling bug? But then it seems odd that nobody else
> has seen this. I see exactly the same latencies you report, very close
> to precisely 1s latencies. That is indeed very odd.

I've spent some time poking around and I have some more data points and a
suggestion.

First, the 1 second latency events come in batches and those batches occur 
suspiciously close to a usec wrap (0.999999 us -> 1.000000 us).  Second, if you
subtract exactly 1 second from these outlier latencies, the remaining amount is
very close to what our instrumented low level driver records for the IO latency
and consistent with the expected latencies of our SSD.  Similarly, the tv_usec
portion of the timeval structs shows increasing values.  See snippet below.

Format is like    start: <start_time.tv_sec>.<start_time.tv_usec>

latency: 1004657 us, lba: 1111289192, start: 1355776806.995294 issue: 1355776806.995312 complete: 1355776807.999969
latency: 1000494 us, lba: 203093568, start: 1355776806.999456 issue: 1355776806.999475 complete: 1355776807.999969
latency: 1000404 us, lba: 551350992, start: 1355776806.999546 issue: 1355776806.999565 complete: 1355776807.999969
latency: 1000477 us, lba: 449672928, start: 1355776806.999484 issue: 1355776806.999492 complete: 1355776807.999969

All this pointed to the time collection code being buggy.  Reviewing the 
code, I spotted this in fio_gettime():

	/*
	 * If Linux is using the tsc clock on non-synced processors,
	 * sometimes time can appear to drift backwards. Fix that up.
	 */
	if (last_tv_valid) {
		if (tp->tv_sec < last_tv.tv_sec)
			tp->tv_sec = last_tv.tv_sec;
		else if (last_tv.tv_sec == tp->tv_sec &&
			 tp->tv_usec < last_tv.tv_usec)
			tp->tv_usec = last_tv.tv_usec;
	}
	last_tv_valid = 1;
	memcpy(&last_tv, tp, sizeof(*tp));

This does not appear to be multi-thread safe.  Pre-emption can occur between 
either comparison and the subsequent update.  Commenting it out makes the
problem go away (at the expense of being subject to drift).  How about
making last_tv & last_tv_valid thread-local?

-Sam


--
To unsubscribe from this list: send the line "unsubscribe fio" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel]     [Linux SCSI]     [Linux IDE]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux SCSI]

  Powered by Linux