Re: epoll with ONESHOT possibly fails to deliver events

Andreas Voellmy <andreas.voellmy@xxxxxxxx> wrote:
> Using strace, I checked that my program is using the epoll API as I
> described. Here is a fragment of the strace output that demonstrates
> my use:
> 
> recvfrom(161, "GET / HTTP/1.1\r\nHost: 10.12.0.1:"..., 90, 0, NULL, NULL) = 90
> sendto(161, "HTTP/1.1 200 OK\r\nDate: Tue, 09 O"..., 323, 0, NULL, 0) = 323
> write(6, "\1\0\0\0\0\0\0\0", 8)         = 8
> recvfrom(161, 0x7f05ef6c3070, 90, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> epoll_ctl(7, EPOLL_CTL_MOD, 161, {EPOLLIN|EPOLLONESHOT|EPOLLET, {u32=161, u64=4294967457}}) = 0
> epoll_wait(7, {{EPOLLIN, {u32=161, u64=4294967457}}, {EPOLLIN, {u32=160, u64=16673999036704882848}}, {EPOLLIN, {u32=162, u64=22028646743015586}}}, 64, 0) = 3
> 
> I.e., we do the following: (1) receive until EAGAIN, (2) register the
> socket with epoll_ctl. In addition, epoll_wait is called repeatedly,
> often following (2), as in the fragment above.
> 
> Is this considered a correct usage of the epoll API? If not, what is
> wrong with this usage?

It looks right to me.
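
For reference, the sequence in the strace fragment (receive until EAGAIN,
then re-arm with EPOLL_CTL_MOD) can be sketched roughly as below. This is
only an illustration, not the GHC RTS code: the helper name
drain_and_rearm, the buffer size, and the return convention are
assumptions, and fd is assumed to be non-blocking and already registered
with epfd.

```c
#include <errno.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: read from a non-blocking socket until recv()
 * reports EAGAIN, then re-arm its one-shot registration.
 * Returns the number of bytes drained, or -1 on error or EOF. */
ssize_t drain_and_rearm(int epfd, int fd)
{
    char buf[4096];              /* size chosen arbitrarily */
    ssize_t total = 0;

    for (;;) {
        ssize_t n = recv(fd, buf, sizeof buf, 0);
        if (n > 0) {
            total += n;          /* a real server would process buf here */
            continue;
        }
        if (n == 0)
            return -1;           /* peer closed the connection */
        if (errno == EINTR)
            continue;            /* interrupted, retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            break;               /* step (1) done: socket is drained */
        return -1;               /* real error */
    }

    /* Step (2): re-enable the registration that EPOLLONESHOT disabled. */
    struct epoll_event ev = { .events = EPOLLIN | EPOLLONESHOT | EPOLLET };
    ev.data.fd = fd;
    if (epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev) == -1)
        return -1;

    return total;
}
```

Draining before re-arming matters with EPOLLET: stopping before EAGAIN
could leave unread data that produces no further edge.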

> On Dec 11, 2012, at 5:23 PM, Andreas Voellmy <andreas.voellmy@xxxxxxxx> wrote:
> > I am using epoll for the Linux (version 3.4.0) implementation of the
> > event notification subsystem of GHC's (Glasgow Haskell Compiler) RTS
> > (runtime system). I am running into a bug that has only popped up
> > using many cores (> 16) and under a particular kind of load. I've been
> > debugging for a couple of days now, and I can't find the error in
> > the way that I am using epoll. I'm starting to wonder whether I am
> > either misunderstanding the semantics of epoll and TCP sockets
> > (likely) or there may be a bug in epoll itself (less likely). 

Everything you describe with your epoll usage seems valid and lines up
with my use of it.

> > Another thread, distinct from all of the threads serving particular
> > sockets, is performing epoll_wait calls. When sockets are returned as
> > being ready from an epoll_wait call, the thread signals to the
> > condition variable for the socket.

Perhaps there is a bug in the way your epoll_wait thread
uses the condition variable to notify other threads?
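
One common way such a bug arises is a lost wakeup: the signal fires
before the waiter has started waiting. A sketch of the usual guard,
assuming POSIX threads (the GHC RTS may use a different primitive) and a
hypothetical per-socket struct sock_waiter with a "ready" flag:

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical per-socket state; all names here are illustrative. */
struct sock_waiter {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            ready;   /* predicate protected by lock */
};

void waiter_init(struct sock_waiter *w)
{
    pthread_mutex_init(&w->lock, NULL);
    pthread_cond_init(&w->cond, NULL);
    w->ready = false;
}

/* Called by the epoll_wait thread when the socket is reported ready. */
void waiter_notify(struct sock_waiter *w)
{
    pthread_mutex_lock(&w->lock);
    w->ready = true;                 /* record the event first... */
    pthread_cond_signal(&w->cond);   /* ...then wake a waiter, if any */
    pthread_mutex_unlock(&w->lock);
}

/* Called by the thread serving the socket. */
void waiter_wait(struct sock_waiter *w)
{
    pthread_mutex_lock(&w->lock);
    while (!w->ready)                /* loop also handles spurious wakeups */
        pthread_cond_wait(&w->cond, &w->lock);
    w->ready = false;                /* consume the notification */
    pthread_mutex_unlock(&w->lock);
}
```

Because the flag is set under the mutex, a notification that arrives
before waiter_wait() is called is still seen and not dropped.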

Fwiw, I just use epoll_wait(maxevents=1) in my normal threads (right
after I call epoll_ctl()).  This means I can avoid both the condition
variable and also avoid using a dedicated thread calling epoll_wait().

> > Since I am using EPOLLONESHOT, I assume that there is no need to
> > also perform epoll_ctl with EPOLL_CTL_DEL here. 

Correct.

> > The problem I am encountering is that sometimes a thread will block
> > waiting for the readiness signal and will never get notified, even
> > though there is data to be read. This behavior seems to go away when
> > I remove EPOLLONESHOT flag when registering the event. 

Is the blocked thread waiting on the condition variable or in epoll_wait?
In your situation (stream I/O via multiple threads, single epoll
descriptor), I think EPOLLONESHOT is the /only/ sane thing to do.