On Tue, Aug 12, 2014 at 5:29 PM, Bruce Fields <bfields@xxxxxxxxxxxx> wrote: > On Tue, Aug 12, 2014 at 04:54:53PM -0400, Trond Myklebust wrote: >> On Tue, Aug 12, 2014 at 4:23 PM, Bruce Fields <bfields@xxxxxxxxxxxx> wrote: >> > On Tue, Aug 12, 2014 at 04:09:52PM -0400, Trond Myklebust wrote: >> >> On Tue, Aug 12, 2014 at 3:53 PM, Bruce Fields <bfields@xxxxxxxxxxxx> wrote: >> >> > On Sun, Aug 03, 2014 at 01:03:10PM -0400, Trond Myklebust wrote: >> >> >> We're always _only_ waking up tasks from within the sp_threads list, so >> >> >> we know that they are enqueued and alive. The rq_wait waitqueue is just >> >> >> a distraction with extra atomic semantics. >> >> >> >> >> >> Signed-off-by: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> >> >> >> --- >> >> >> include/linux/sunrpc/svc.h | 1 - >> >> >> net/sunrpc/svc.c | 2 -- >> >> >> net/sunrpc/svc_xprt.c | 32 +++++++++++++++++--------------- >> >> >> 3 files changed, 17 insertions(+), 18 deletions(-) >> >> >> >> >> >> diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h >> >> >> index 1bc7cd05b22e..3ec769b65c7f 100644 >> >> >> --- a/include/linux/sunrpc/svc.h >> >> >> +++ b/include/linux/sunrpc/svc.h >> >> >> @@ -280,7 +280,6 @@ struct svc_rqst { >> >> >> int rq_splice_ok; /* turned off in gss privacy >> >> >> * to prevent encrypting page >> >> >> * cache pages */ >> >> >> - wait_queue_head_t rq_wait; /* synchronization */ >> >> >> struct task_struct *rq_task; /* service thread */ >> >> >> }; >> >> >> >> >> >> diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c >> >> >> index 5de6801cd924..dfb78c4f3031 100644 >> >> >> --- a/net/sunrpc/svc.c >> >> >> +++ b/net/sunrpc/svc.c >> >> >> @@ -612,8 +612,6 @@ svc_prepare_thread(struct svc_serv *serv, struct svc_pool *pool, int node) >> >> >> if (!rqstp) >> >> >> goto out_enomem; >> >> >> >> >> >> - init_waitqueue_head(&rqstp->rq_wait); >> >> >> - >> >> >> serv->sv_nrthreads++; >> >> >> spin_lock_bh(&pool->sp_lock); >> >> >> pool->sp_nrthreads++; >> >> >> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c >> >> >> index 2c30193c7a13..438e91c12851 100644 >> >> >> --- a/net/sunrpc/svc_xprt.c >> >> >> +++ b/net/sunrpc/svc_xprt.c >> >> >> @@ -348,8 +348,6 @@ static void svc_xprt_do_enqueue(struct svc_xprt *xprt) >> >> >> >> >> >> cpu = get_cpu(); >> >> >> pool = svc_pool_for_cpu(xprt->xpt_server, cpu); >> >> >> - put_cpu(); >> >> >> - >> >> >> spin_lock_bh(&pool->sp_lock); >> >> >> >> >> >> if (!list_empty(&pool->sp_threads) && >> >> >> @@ -382,10 +380,15 @@ static void svc_xprt_do_enqueue(struct svc_xprt *xprt) >> >> >> printk(KERN_ERR >> >> >> "svc_xprt_enqueue: server %p, rq_xprt=%p!\n", >> >> >> rqstp, rqstp->rq_xprt); >> >> >> - rqstp->rq_xprt = xprt; >> >> >> + /* Note the order of the following 3 lines: >> >> >> + * We want to assign xprt to rqstp->rq_xprt only _after_ >> >> >> + * we've woken up the process, so that we don't race with >> >> >> + * the lockless check in svc_get_next_xprt(). >> >> > >> >> > Sorry, I'm not following this: what exactly is the race? >> >> >> >> There are 2 potential races, both due to the lockless check for rqstp->rq_xprt. >> >> >> >> 1) You need to ensure that the reference count in xprt is bumped >> >> before assigning to rqstp->rq_xprt, because svc_get_next_xprt() does a >> >> lockless check for rqstp->rq_xprt != NULL, so there is no lock to >> >> ensure that the refcount bump occurs before whoever called >> >> svc_get_next_xprt() calls svc_xprt_put()... >> >> >> >> 2) You want to ensure that you don't call wake_up_process() after >> >> exiting svc_get_next_xprt() since you would no longer be guaranteed >> >> that the task still exists. By calling wake_up_process() before >> >> setting rqstp->rq_xprt, you ensure that if the task wakes up before >> >> calling wake_up_process(), then the test for rqstp->rq_xprt == NULL >> >> forces you into the spin_lock_bh() protected region, where it is safe >> >> to test rqstp->rq_xprt again and decide whether or not the task is >> >> still queued on &pool->sp_threads. >> > >> > Got it. So how about making that comment: >> > >> > /* >> > * Once we assign rq_xprt, a concurrent task in >> > * svc_get_next_xprt() could put the xprt, or could exit. >> > * Therefore, the get and the wake_up need to come first. >> > */ >> > >> > Is that close enough? >> > >> >> It's close: it's not so much any concurrent task, it's specifically >> the one that we're trying to wake up that may find itself waking up >> prematurely due to the schedule_timeout(), and then racing due to the >> lockless check. > > > Got it. I'll make it: > > /* > * The task rq_task could be concurrently executing the lockless > * check of rq_xprt in svc_get_next_xprt(). Once we set > * rq_xprt, that task could return from svc_get_next_xprt(), and > * then could put the last reference to the xprt, or could exit. > * Therefore, our get and wake_up need to come first: > */ > Looks good. -- Trond Myklebust Linux NFS client maintainer, PrimaryData trond.myklebust@xxxxxxxxxxxxxxx -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html