Re: Latency spikes with 'thread' option

Jens Axboe <axboe@xxxxxxxxx> · Wed, 19 Dec 2012 08:00:38 +0100

On 2012-12-18 22:16, Sam Bradshaw (sbradshaw) wrote:
> 
>> -----Original Message-----
>> From: Jens Axboe [mailto:axboe@xxxxxxxxx]
>> Sent: Tuesday, December 18, 2012 12:29 AM
>> To: Sam Bradshaw (sbradshaw)
>> Cc: fio@xxxxxxxxxxxxxxx; Mike Berhan (mberhan)
>> Subject: Re: Latency spikes with 'thread' option
>>
>> On 2012-12-18 08:21, Jens Axboe wrote:
>>> Good analysis and I believe you are correct. It's not safely shared and
>>> should be thread local. A quick test here with the below seems to
>>> indicate that that is indeed the issue, I don't see any time weirdness
>>> with that applied.
>>
>> Bah, so that didn't work on all the supported platforms (notably OSX
>> does not have support for __thread). I've committed a patch that should
>> work across platforms, please give the current fio git a try (5d879392
>> or newer).
> 
> Works great.  No abnormal latency spikes, nor do our measurements of peak
> IOPS differ from prior releases.

Good!
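
For anyone following along, the portability problem with __thread can be
worked around with POSIX thread-specific data. The following is only a
minimal sketch of that general technique, not the committed patch: the
struct tv_state type, its fields, and the get_tv_state() helper are
hypothetical names made up for illustration.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-thread clock state; not fio's actual structure. */
struct tv_state {
	uint64_t last_cycles;
	uint64_t last_tv_sec;
};

static pthread_key_t tv_key;
static pthread_once_t tv_once = PTHREAD_ONCE_INIT;

static void tv_key_init(void)
{
	/* free() is called on a thread's non-NULL value when it exits. */
	pthread_key_create(&tv_key, free);
}

/* Return the calling thread's private state, allocating it on first use. */
static struct tv_state *get_tv_state(void)
{
	struct tv_state *s;

	pthread_once(&tv_once, tv_key_init);
	s = pthread_getspecific(tv_key);
	if (!s) {
		s = calloc(1, sizeof(*s));
		if (s)
			pthread_setspecific(tv_key, s);
	}
	return s;
}
```

Each thread gets its own zeroed struct the first time it calls
get_tv_state(), so no thread ever reads another thread's last-seen values.
The cost is a pthread_getspecific() call per lookup, which is typically
slower than a __thread variable but works on platforms without it.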

> One tidbit though: fio_gettime() shows a much larger % of execution time
> with the default clocksource than the gettimeofday() clocksource.  In a
> 512b random read workload with gtod_reduce disabled, for example, up to 15% 
> of the userspace execution time was spent in fio_gettime() with the default
> clocksource vs. 5% with gettimeofday().  The latency is mostly incurred
> at the divisions that convert nsecs? to secs for populating the timeval
> struct.  I don't have a suggestion to replace the
> 
> usecs = t / cycles_per_usec 
> 
> division but the 
> 
> tp->tv_sec = usecs / 1000000
> 
> could be reduced by instead comparing "t" to "tv->last_cycles" to see if
> they differ by more than 1000000.  If so, do the division and capture
> secs in tv->tv_sec.  If not, just copy the last recorded tv->tv_sec into
> tp->tv_sec.  That way, the division is incurred at most once per second.

That would work, though I'm surprised the two divisions are that costly
for fixed point math. It would add one branch, but since that branch would
go the same way most of the time, the branch predictor should handle it
easily.
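
To make the idea concrete, here is a rough sketch of the cached-seconds
approach. It is an illustration under assumptions, not fio code: struct
sec_cache, struct simple_tv and usecs_to_tv() are hypothetical names, and
the comparison is done on the microsecond value rather than on raw cycles.
The cache would have to be per-thread, for the same reason as the earlier
fix.

```c
#include <stdint.h>

#define USEC_PER_SEC 1000000ULL

/* Hypothetical cache of the last computed second; must be per-thread. */
struct sec_cache {
	uint64_t sec;        /* last computed whole seconds */
	uint64_t sec_start;  /* usec value at which that second began */
};

/* Simplified stand-in for struct timeval. */
struct simple_tv {
	uint64_t tv_sec;
	uint64_t tv_usec;
};

static void usecs_to_tv(struct sec_cache *c, uint64_t usecs,
			struct simple_tv *tp)
{
	uint64_t delta = usecs - c->sec_start;

	/*
	 * Slow path: we crossed a second boundary. If time went backwards,
	 * the unsigned subtraction wraps to a huge value and also lands here.
	 */
	if (delta >= USEC_PER_SEC) {
		c->sec = usecs / USEC_PER_SEC;
		c->sec_start = c->sec * USEC_PER_SEC;
		delta = usecs - c->sec_start;
	}

	/* Fast path: no division, the remainder is just the offset. */
	tp->tv_sec = c->sec;
	tp->tv_usec = delta;
}
```

With a zero-initialized cache, the division runs once per elapsed second
and every other call costs a subtraction and a well-predicted branch. The
usecs = t / cycles_per_usec division is untouched by this.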

There's also the option of disabling some of the time keeping. Are you
interested only in completion latency? Then you could disable submission
latency measuring. That would reduce the time calls per IOP by 25%.
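
As an example, a job file along these lines would turn off submission
latency accounting while keeping completion latency. The job name, device
path, and I/O engine are placeholders I picked for illustration; adjust
them to your setup.

```ini
; Sketch only: the filename and ioengine below are assumed, not from your setup.
[randread-512b]
filename=/dev/sdX
ioengine=libaio
direct=1
rw=randread
bs=512
thread
; skip submission latency timestamps, keep completion latency
disable_slat=1
```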

> I could put together a patch if you like; I just have limited ability
> to test the change on the multitude of platforms supported by fio.

For this particular case, the risk of platform breakage is small. So go
for it!

-- 
Jens Axboe
