On Thu, Jan 03, 2019 at 11:40:21PM +0000, Trond Myklebust wrote:
> On Thu, 2019-01-03 at 17:45 -0500, J Bruce Fields wrote:
> > On Thu, Jan 03, 2019 at 09:17:12AM -0500, Trond Myklebust wrote:
> > > Use READ_ONCE() to tell the compiler to not optimise away the
> > > read of xprt->xpt_flags in svc_xprt_release_slot().
> >
> > What exactly is the possible race here?  And why is a READ_ONCE()
> > sufficient, as opposed to some memory barriers?
> >
> > I may need to shut myself in a room with memory-barriers.txt, I'm
> > pretty hazy on these things.
>
> It's not about fixing any races. It is about ensuring that the
> compiler does not optimise away the read if the function is ever
> called from inside a loop. Not an important fix, since I'm not aware
> of any cases where this has happened. However, strictly speaking, we
> should use READ_ONCE() here because that variable is volatile; it
> can be changed by a background action.

I wonder if there's a race here independent of that change:
svc_xprt_enqueue() callers all do something like:

        1. change some condition
        2. call svc_xprt_enqueue() to check whether the xprt should
           now be enqueued.

where the conditions are settings of the xpt_flags, or socket wspace,
or xpt_nr_rqsts.

In theory if we miss some concurrent change we're OK, because
whoever's making that change will then also call svc_xprt_enqueue().
But that's not enough; e.g.:

        task 1                          task 2
        ------                          ------
        set XPT_DATA                    atomic_dec(xpt_nr_rqsts)

        check XPT_DATA &&               check XPT_DATA &&
          check xpt_nr_rqsts              check xpt_nr_rqsts

If the tasks only see their own local changes, then neither sees both
conditions true, so the socket doesn't get enqueued.  (And a request
that was ready to be processed will sit around until someone else
comes along and calls svc_xprt_enqueue() on that xprt.)

The code's more complicated than that, and maybe there's some reason
this can't happen.

--b.
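P.S.  For anyone else who's hazy on the READ_ONCE() part: here's a
minimal sketch of the compiler hazard Trond describes.  The two wait
helpers are made up for illustration (nothing like them exists in the
tree); only the struct svc_xprt field and flag names are real:

        #include <linux/bitops.h>            /* BIT() */
        #include <linux/compiler.h>          /* READ_ONCE() */
        #include <linux/sunrpc/svc_xprt.h>   /* struct svc_xprt, XPT_DATA */
        #include <asm/processor.h>           /* cpu_relax() */

        /* Nothing in this loop writes xpt_flags, so the compiler is
         * free to load it once and spin on the cached value forever. */
        static void wait_for_data_broken(struct svc_xprt *xprt)
        {
                while (!(xprt->xpt_flags & BIT(XPT_DATA)))
                        cpu_relax();
        }

        /* READ_ONCE() forces a fresh load of xpt_flags on every
         * iteration, which is all Trond's patch is claiming. */
        static void wait_for_data_fixed(struct svc_xprt *xprt)
        {
                while (!(READ_ONCE(xprt->xpt_flags) & BIT(XPT_DATA)))
                        cpu_relax();
        }

Note that READ_ONCE() only constrains the compiler; it is not a CPU
memory barrier, which is why it can't address the interleaving above.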
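P.P.S.  The interleaving above is the store-buffering pattern from
memory-barriers.txt: each task stores to one location and then loads
the other.  If that race is real, the textbook cure is a full barrier
between each task's store and its subsequent checks, so that at least
one task is guaranteed to see the other's update.  This is a rough
sketch only; "limit" is made up and the real condition checks are
more involved:

        /* Task 1: data arrived. */
        set_bit(XPT_DATA, &xprt->xpt_flags);    /* store A */
        smp_mb__after_atomic();                 /* order store A before load B */
        if (test_bit(XPT_DATA, &xprt->xpt_flags) &&
            atomic_read(&xprt->xpt_nr_rqsts) < limit)   /* load B */
                svc_xprt_enqueue(xprt);

        /* Task 2: a request slot was released. */
        atomic_dec(&xprt->xpt_nr_rqsts);        /* store B */
        smp_mb__after_atomic();                 /* order store B before load A */
        if (test_bit(XPT_DATA, &xprt->xpt_flags) &&     /* load A */
            atomic_read(&xprt->xpt_nr_rqsts) < limit)
                svc_xprt_enqueue(xprt);

set_bit() and atomic_dec() don't return a value, so by themselves they
imply no ordering; smp_mb__after_atomic() is what upgrades them to
full barriers here.  With both barriers in place, the "neither task
sees both conditions" outcome is forbidden.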