On 2011-08-30 11:47, Daniel Ehrenberg wrote:
> On Tuesday, August 30, 2011, Jens Axboe <axboe@xxxxxxxxx> wrote:
>> On 2011-08-29 18:29, Dan Ehrenberg wrote:
>>> When a single thread is reading from a libaio io_context_t object
>>> in a non-blocking polling manner (that is, with the minimum number
>>> of events to return being 0), it is possible to safely read
>>> events directly from user space, taking advantage of the fact that
>>> the io_context_t object is a pointer to memory with a known layout.
>>> This patch adds an option, userspace_libaio_reap, which allows
>>> reading events in this manner when the libaio engine is used.
>>>
>>> You can observe its effect by setting iodepth_batch_complete=0
>>> and watching how the distribution of system/user time changes
>>> depending on whether the new flag is set. If userspace_libaio_reap=1,
>>> busy polling takes place in user space and usr CPU time dominates.
>>> If userspace_libaio_reap=0 (the default), the polling happens in the
>>> kernel and sys CPU time dominates.
>>>
>>> Polling the queue in this manner is several times faster. In my
>>> testing, a polling operation in user space took less than an eighth
>>> of the time it takes with the io_getevents syscall.
>>
>> Good stuff! The libaio side looks good, but I think we should add
>> engine-specific options under the specific engine. With all the
>> commands/options that fio has, it quickly becomes a bit unwieldy. So
>> the idea would be to have:
>>
>> ioengine=libaio:userspace_reap
>
> Good idea. I was looking around for engine-specific options but didn't
> see any examples.

I like this convention. Optimally, we should be able to nest options
under the engine option. But a quicker hack should suffice; it can
always be extended if need be.

>>
>> I'll look into that.
>>
>> One question on the code:
>>
>>> +static int user_io_getevents(io_context_t aio_ctx, unsigned int max,
>>> +			struct io_event *events)
>>> +{
>>> +	long i = 0;
>>> +	unsigned head;
>>> +	struct aio_ring *ring = (struct aio_ring*)aio_ctx;
>>> +
>>> +	while (i < max) {
>>> +		head = ring->head;
>>> +
>>> +		if (head == ring->tail) {
>>> +			/* There are no more completions */
>>> +			break;
>>> +		} else {
>>> +			/* There is another completion to reap */
>>> +			events[i] = ring->events[head];
>>> +			ring->head = (head + 1) % ring->nr;
>>> +			i++;
>>> +		}
>>> +	}
>>
>> Don't we need a read barrier here before reading the head/tail?
>>
> Of course; how did I forget that?
>
> I can write a barrier that works on my x64 machines, but it would be
> much better not to introduce an architectural dependency. Is there any
> kind of free library for this? Google has one (used in V8), but it's
> C++ and probably doesn't cover enough architectures. And of course the
> Linux kernel has one, but it would be a small project to extract it
> for use in user space -- or has someone already done this work?

Fio already includes read and write barriers; they are called
read_barrier() and write_barrier().

FWIW, I agree with Jeff that this would be best handled in the libaio
library code. But if we can make it work reliably with the generic
kernel code (and I think we should), then I want to carry it in fio.
For patches that aren't even merged yet, the road to a setup that
already has this included by default is very long.

--
Jens Axboe
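
For reference, a job file along these lines would exercise the proposed
flag, assuming the ioengine=libaio:userspace_reap syntax suggested above
is what eventually gets merged. The device name, block size and runtime
below are placeholders, not values from the patch or this thread:

    [global]
    ioengine=libaio:userspace_reap
    direct=1
    iodepth=32
    ; reap in a non-blocking, polling manner, as described in the patch
    iodepth_batch_complete=0

    [randread-test]
    rw=randread
    bs=4k
    filename=/dev/sdb
    runtime=30
    time_based

With iodepth_batch_complete=0 the reap path is polled, so the usr/sys
CPU split reported by fio should shift toward usr when the user-space
reap is in effect.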
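
For anyone following along, here is a rough sketch of the reaping path
with a barrier added, using fio's read_barrier() helper mentioned above.
The aio_ring layout mirrors the kernel's internal definition and is not
a documented libaio ABI, so treat both the struct and the barrier
placement as assumptions about the approach, not the final merged code:

#include <libaio.h>	/* io_context_t, struct io_event */

/* read_barrier() is fio's own helper from its arch headers; it is not
 * part of libaio. */

/* Assumed layout of the ring that an io_context_t points at; this
 * follows the kernel's internal struct aio_ring. */
struct aio_ring {
	unsigned id;		/* kernel internal index number */
	unsigned nr;		/* number of io_events in the ring */
	unsigned head;
	unsigned tail;
	unsigned magic;
	unsigned compat_features;
	unsigned incompat_features;
	unsigned header_length;	/* size of this header */
	struct io_event events[0];
};

static int user_io_getevents(io_context_t aio_ctx, unsigned int max,
			     struct io_event *events)
{
	long i = 0;
	unsigned head;
	struct aio_ring *ring = (struct aio_ring *) aio_ctx;

	while (i < max) {
		head = ring->head;

		if (head == ring->tail) {
			/* There are no more completions */
			break;
		}
		/* There is another completion to reap */
		events[i] = ring->events[head];
		/* Make sure the event payload has been read before the
		 * head update makes the slot reusable; one plausible
		 * placement for the barrier discussed above. */
		read_barrier();
		ring->head = (head + 1) % ring->nr;
		i++;
	}

	return i;
}

This only works for the single-reader, min_nr=0 polling case described
in the patch; anything else still has to go through io_getevents().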