At 6/19/2012 11:17 AM, Christof Meerwald wrote:
Hi Paton,
On Mon, Jun 18, 2012 at 04:24:35PM -0700, Paton J. Lewis wrote:
> We believe that EPOLLONESHOT is required in order to make any
> sensible use of calling epoll_wait on a single epoll set
> concurrently in multiple threads.
I guess we have to disagree here - though it might be more difficult.
> >On Mon, 11 Jun 2012 15:34:49 -0700, Paton Lewis wrote:
> >> This patch introduces the new epoll_ctl command EPOLL_CTL_DISABLE, which
> >> disables the associated epoll item and returns -EBUSY if the
> >epoll item is not
> >> currently in the epoll ready queue. This allows multiple
threads to use a
> >> mutex to determine when it is safe to delete an epoll item and
> >its associated
> >> resources. This allows epoll items to be deleted and closed
efficiently and
> >> without error.
Maybe I am missing something here (as I am not really familiar with
the kernel internals), but I don't really understand the logic behind
your patch. Isn't the "expected" case that the item is not on the
ready list and no I/O is being processed for that item?
Consider the case where we want to have a set of threads waiting for
'write' events on a set of pipes or sockets. We have no control over
when code on the other side of a pipe or socket might write into it,
and so have no control over when one of the threads calling
epoll_wait will receive events relative to the timing of the thread
that is attempting to cancel the pending read operation.
I believe there is no "expected" case, because the probability for an
item to be on the ready list is a complex function of the number of
file descriptors being monitored, the frequency at which those
descriptors receive events, the number of threads calling epoll_wait,
and the complexity of the code responding to events.
Therefore for some clients of epoll it will be the case that items
will often be on the ready list, and for others it will not.
So I think instead of checking for the item being on the ready list,
checking for the event mask would make more sense for me, e.g.
if (!(epi->event.events & ~EP_PRIVATE_BITS))
I think that would be fine. However, the inlined function
ep_is_linked boils down to a call to the inline function list_empty,
which is just a single comparison. I haven't compared the
disassembly, but I would expect the two methods to be roughly
equivalent in terms of performance. Given that the operation of
deleting an epoll item is likely to be an exceptional circumstance
and therefore not performance-critical, would it be better to test
against ep_is_linked for clarity's sake in the code?
However, I believe these discussions are rendered moot by your
suggestion below:
But, taking one step back - wouldn't an alternative approach be to add
some mechanism to allow a thread to post a user-event for an fd? So in
delete_epoll_item you would post a user event (e.g. EPOLLUSER) for the
fd which you can then handle in your epoll_wait processing thread -
with no additional synchronisation necessary.
I think this is an excellent suggestion, and in fact your proposal is
more similar to what Windows provides when solving this problem. I'll
test this idea out with our code and get back to you. Is there an
existing kernel technique that you would recommend for posting a user
event for an fd, or should I explore using epoll_ctl with EPOLL_CTL_MOD?
I apologize in advance for any delay in responding to you; our office
is closed next week.
Thank you,
Pat
However, this would still require EPOLLONESHOT to be useful for memory
management.
Christof
--
http://cmeerw.org sip:cmeerw at cmeerw.org
mailto:cmeerw at cmeerw.org xmpp:cmeerw at cmeerw.org
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html