Re: [PATCH] poll: prevent missed events if _qproc is NULL

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Eric Dumazet <eric.dumazet@xxxxxxxxx> wrote:
> On Mon, 2012-12-31 at 13:21 +0000, Eric Wong wrote:
> > This patch seems to fix my issue with ppoll() being stuck on my
> > SMP machine: http://article.gmane.org/gmane.linux.file-systems/70414
> > 
> > The change to sock_poll_wait() in
> > commit 626cf236608505d376e4799adb4f7eb00a8594af
> >   (poll: add poll_requested_events() and poll_does_not_wait() functions)
> > seems to have allowed additional cases where the SMP memory barrier
> > is not issued before checking for readiness.
> > 
> > In my case, this affects the select()-family of functions
> > which register descriptors once and set _qproc to NULL before
> > checking events again (after poll_schedule_timeout() returns).
> > The set_mb() barrier in poll_schedule_timeout() appears to be
> > insufficient on my SMP x86-64 machine (as it's only an xchg()).
> > 
> > This may also be related to the epoll issue described by
> > Andreas Voellmy in http://thread.gmane.org/gmane.linux.kernel/1408782/
> 
> Hmm, the change seems not very logical to me.

My original description was not complete and I'm still bisecting
my problem (ppoll + send stuck).  However, my patch does solve the
issue Andreas encountered and I now understand why.

> If it helps, I would like to understand the real issue.
>
> commit 626cf236608505d376e4799adb4f7eb00a8594af should not have this
> side effect, at least for poll()/select() functions. The epoll() changes
> I am not yet very confident.

I have a better explanation of the epoll problem below.

An alternate version (limited to epoll) would be:

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index cd96649..ca5f3d0 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1299,6 +1299,7 @@ static int ep_modify(struct eventpoll *ep, struct epitem *epi, struct epoll_even
 	 * Get current event bits. We can safely use the file* here because
 	 * its usage count has been increased by the caller of this function.
 	 */
+	smp_mb();
 	revents = epi->ffd.file->f_op->poll(epi->ffd.file, &pt);
 
 	/*

> I suspect a race already existed before this commit, it would be nice to
> track it properly.

I don't believe this race existed before that change.

Updated commit message below:

>From 87bca82bc39a941d9b8d5b8bc08b39a071a9884f Mon Sep 17 00:00:00 2001
From: Eric Wong <normalperson@xxxxxxxx>
Date: Mon, 31 Dec 2012 13:20:23 +0000
Subject: [PATCH] epoll: prevent missed events on EPOLL_CTL_MOD

ep_modify() works on files that are already registered with a wait queue
(and thus should not reregister).  For sockets, this means sk_sleep()
will return a non-NULL wait address.

ep_modify() must check for events that were received and ignored
_before_ ep_modify() was called.  So it must call f_op->poll() to
fish for events _after_ changing epi->event.events.

When f_op->poll() calls tcp_poll() (and thus sock_poll_wait()),
wait_address is non-NULL because the socket was already registered by
epoll.  Thus, ep_modify() passes a NULL pt to prevent re-registration.

When ep_modify() is called, sock_poll_wait() will see a wait_address,
but a NULL pt, and this caused the memory barrier to get skipped and
events to be missed (this memory barrier is described in the
documentation for wq_has_sleeper).

This regression appeared with the change to sock_poll_wait() in
commit 626cf236608505d376e4799adb4f7eb00a8594af
  (poll: add poll_requested_events() and poll_does_not_wait() functions)

This issue was encountered by Andreas Voellmy and Junchang(Jason) Wang:
http://thread.gmane.org/gmane.linux.kernel/1408782/

Signed-off-by: Eric Wong <normalperson@xxxxxxxx>
Cc: Hans Verkuil <hans.verkuil@xxxxxxxxx>
Cc: Jiri Olsa <jolsa@xxxxxxxxxx>
Cc: Jonathan Corbet <corbet@xxxxxxx>
Cc: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
Cc: Davide Libenzi <davidel@xxxxxxxxxxxxxxx>
Cc: Hans de Goede <hdegoede@xxxxxxxxxx>
Cc: Mauro Carvalho Chehab <mchehab@xxxxxxxxxxxxx>
Cc: David Miller <davem@xxxxxxxxxxxxx>
Cc: Eric Dumazet <eric.dumazet@xxxxxxxxx>
Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Tested-by: Andreas Voellmy <andreas.voellmy@xxxxxxxx>
Tested-by: "Junchang(Jason) Wang" <junchang.wang@xxxxxxxx>
Cc: netdev@xxxxxxxxxxxxxxx
Cc: linux-fsdevel@xxxxxxxxxxxxxxx
---
 include/net/sock.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index c945fba..1923e48 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1925,8 +1925,9 @@ static inline bool wq_has_sleeper(struct socket_wq *wq)
 static inline void sock_poll_wait(struct file *filp,
 		wait_queue_head_t *wait_address, poll_table *p)
 {
-	if (!poll_does_not_wait(p) && wait_address) {
-		poll_wait(filp, wait_address, p);
+	if (wait_address) {
+		if (!poll_does_not_wait(p))
+			poll_wait(filp, wait_address, p);
 		/* We need to be sure we are in sync with the
 		 * socket flags modification.
 		 *
-- 
Eric Wong
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux