Re: [PATCH] Adding userspace_libaio_reap option

Looks like I accidentally sent a reply just to Jeff rather than to the
list, and we've had a little exchange this way. For the record, here's
what we discussed. Jeff, please reply to this email instead of the one
I just sent you.

---------- Forwarded message ----------
From: Daniel Ehrenberg <dehrenberg@xxxxxxxxxx>
Date: Wed, Aug 31, 2011 at 3:55 PM
Subject: Re: [PATCH] Adding userspace_libaio_reap option
To: Jeff Moyer <jmoyer@xxxxxxxxxx>


On Wed, Aug 31, 2011 at 10:08 AM, Jeff Moyer <jmoyer@xxxxxxxxxx> wrote:
> Daniel Ehrenberg <dehrenberg@xxxxxxxxxx> writes:
>
>> On Tue, Aug 30, 2011 at 2:14 PM, Jeff Moyer <jmoyer@xxxxxxxxxx> wrote:
>>> Daniel Ehrenberg <dehrenberg@xxxxxxxxxx> writes:
>>>
>>>> On Tuesday, August 30, 2011, Jeff Moyer <jmoyer@xxxxxxxxxx> wrote:
>>>>> Dan Ehrenberg <dehrenberg@xxxxxxxxxx> writes:
>>>>>
>>>>>> When a single thread is reading from a libaio io_context_t object
>>>>>> in a non-blocking polling manner (that is, with the minimum number
>>>>>> of events to return being 0), then it is possible to safely read
>>>>>> events directly from user-space, taking advantage of the fact that
>>>>>> the io_context_t object is a pointer to memory with a certain layout.
>>>>>> This patch adds an option, userspace_libaio_reap, which allows
>>>>>> reading events in this manner when the libaio engine is used.
>>>>>
>>>>> I haven't yet tried to poke holes in your code, but I'm pretty sure I
>>>>> can find some.  I have patches for the kernel and libaio which allow
>>>>> user-space reaping of events.  Why don't I dust those off and post them,
>>>>> and then fio won't have to change at all?  That seems like the proper
>>>>> approach to solving the problem.
>>>>>
>>>>> Cheers,
>>>>> Jeff
>>>>>
>>>>
>>>> Ken Chen posted some patches which accomplish this in 2007. However, I was a
>>>> little concerned about his lock-free queue structure--it seems like an
>>>> integer overflow might cause events to be lost; on the other hand this is
>>>> very unlikely. Are you talking about Ken's patchset or another one?
>>>
>>> I'm talking about another one, since I completely forgot about Ken's
>>> patches.  Thanks for reminding me!
>>>
>>> I'm not sure your approach will work on all architectures without kernel
>>> modification.  I think at least a flush_dcache_page may be required.
>>> More importantly, though, I'd like to discourage peeking into the
>>> internals outside of the libaio library.  Doing this makes it really
>>> hard to change things moving forward.
>>
>> Are you saying flush_dcache_page from the kernel or from user-space?
>> What kind of architecture will have problems? What kind of failures
>> could result?
>
> From the kernel.  Arm, ppc, mips, sparc64, etc would have issues.
> Basically everything that #defines ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1.
> Specifically, updates to the tail by the kernel would not be seen in
> userspace.

If updates to the tail are not seen in userspace immediately, then it
will just take longer for the reaping to occur, right? Eventually
something else will flush the cache, and then userspace can see that
there's a new event. A delay in observing an event is a performance
bug, not a correctness bug. Or am I understanding this wrong?
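
To make the picture concrete, the userspace peek I have in mind is
along these lines. This is only a sketch, not the exact patch: the
struct mirrors my reading of the kernel's aio_ring layout in fs/aio.c,
and read_barrier() is just __sync_synchronize() standing in for
whatever barrier the target architecture actually needs:

#include <libaio.h>

/* Sketch only: assumes io_context_t points at the kernel's mmap'd
 * aio_ring, laid out as in fs/aio.c at the time of writing. */
struct aio_ring {
    unsigned id;                /* kernel internal index number */
    unsigned nr;                /* number of io_events in the ring */
    unsigned head;              /* written by userspace when reaping */
    unsigned tail;              /* written by the kernel on completion */
    unsigned magic;
    unsigned compat_features;
    unsigned incompat_features;
    unsigned header_length;     /* size of aio_ring */
    struct io_event events[0];
};

#define AIO_RING_MAGIC  0xa10a10a1

/* Stand-in barrier; a real version would be per-arch. */
#define read_barrier()  __sync_synchronize()

static int user_io_getevents(io_context_t aio_ctx, unsigned int max,
                             struct io_event *events)
{
    struct aio_ring *ring = (struct aio_ring *) aio_ctx;
    unsigned int i = 0;

    /* If the layout ever changes, fall back to the syscall. */
    if (ring->magic != AIO_RING_MAGIC)
        return io_getevents(aio_ctx, 0, max, events, NULL);

    while (i < max) {
        unsigned head = ring->head;

        if (head == ring->tail)
            break;              /* nothing completed yet */

        /* Copy the event out before publishing the new head. */
        events[i++] = ring->events[head];
        read_barrier();
        ring->head = (head + 1) % ring->nr;
    }

    return i;
}

The single reader only loads ring->tail and stores ring->head, so the
worst a delayed tail update can do here is make the loop return 0 for
longer than necessary, which is the performance-not-correctness
argument above.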
>
>> About ABI dependence: I agree, it would be better to have things in
>> libaio rather than here. But to do reaping like this, with certain
>> restrictions about what context it's used in, I think we'd have to
>> make another function rather than just changing io_getevents. At
>> first, I was looking into changing io_getevents, but then I realized
>> that the application I'm working on optimizing, like FIO, is only
>> calling io_getevents in a certain pattern, making all of the
>> synchronization unnecessary. I don't think these two things are the
>> only users of io_getevents in this pattern. But maybe proper
>> synchronization can be made cheap enough that there's not a big
>> penalty for doing it properly all the time.
>
> I'm not sure what penalty you think exists.  It's a matter of switching
> a spinlock in the kernel to an atomic cmpxchg.  Userspace would then
> also need atomic ops.  I'm pretty sure it'll be a net win over a system
> call.
>
> Cheers,
> Jeff
>

I was talking about the penalty of doing an atomic cmpxchg in
userspace rather than plain non-atomic operations. I agree that the
atomic operations in userspace will be a win over the system call,
since I've measured this. The atomic operations themselves aren't
that expensive, but they're not completely free; I don't have a good
idea of how much they cost relative to everything else that goes on
in a real workload, but probably not much.
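
For concreteness, the kind of atomic variant I have in mind is the
same loop with the head update done through a cmpxchg, so a second
reaper (or a racing io_getevents) can't advance head twice past the
same slot. Again just a sketch, reusing the aio_ring layout and
read_barrier() from the sketch above; __sync_bool_compare_and_swap is
the GCC builtin:

static int user_io_getevents_atomic(io_context_t aio_ctx, unsigned int max,
                                    struct io_event *events)
{
    struct aio_ring *ring = (struct aio_ring *) aio_ctx;
    unsigned int i = 0;

    while (i < max) {
        unsigned head = ring->head;

        if (head == ring->tail)
            break;              /* nothing completed yet */

        events[i] = ring->events[head];
        read_barrier();

        /* Only claim the slot if nobody else reaped it first;
         * on failure, retry with the new head and overwrite our copy. */
        if (__sync_bool_compare_and_swap(&ring->head, head,
                                         (head + 1) % ring->nr))
            i++;
    }

    return i;
}

The only per-event difference is that one locked compare-and-swap in
place of a plain store, which is the cost I was trying to gauge
against the rest of a real workload.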

Dan
--
To unsubscribe from this list: send the line "unsubscribe fio" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

