Re: [PATCH] svcrpc: modifying positive sunrpc cache entries is racy

"J. Bruce Fields" <bfields@xxxxxxxxxxxx> · Tue, 4 Jan 2011 14:23:51 -0500

On Tue, Jan 04, 2011 at 10:22:31AM -0500, J. Bruce Fields wrote:
> On Tue, Jan 04, 2011 at 04:01:52PM +1100, NeilBrown wrote:
> > On Mon, 3 Jan 2011 15:55:14 -0500 "J. Bruce Fields" <bfields@xxxxxxxxxxxx>
> > wrote:
> > > @@ -213,6 +214,33 @@ static inline int cache_is_valid(struct cache_detail *detail, struct cache_head
> > >  	}
> > >  }
> > >  
> > > +static int cache_is_valid(struct cache_detail *detail, struct cache_head *h)
> > > +{
> > > +	int rv;
> > > +
> > > +	read_lock(&detail->hash_lock);
> > > +	rv = __cache_is_valid(detail, h);
> > > +	read_unlock(&detail->hash_lock);
> > > +	return rv;
> > > +}
> > 
> > I don't think there is anything in __cache_is_valid that needs to be
> > protected.
> > The compiler will almost certainly produce code which loads f->flags once and
> > then performs 1 or 2 bit tests against the value in the register and produces
> > one of 3 possible return values based on the result.
> > There is absolutely no value in putting locking around that, especially as
> > CACHE_VALID is never cleared.
> > 
> > Maybe you imagine a re-ordering of setting CACHE_NEGATIVE and CACHE_VALID,
> > but as they are in the same cache line (and in fact in the same byte) they
> > cannot be re-ordered.  We always set CACHE_NEGATIVE before CACHE_VALID and
> > there is no way those two could get to memory in the wrong order.
> 
> The risk would be a reordering of CACHE_VALID with setting of the actual
> contents in the !NEGATIVE case, so:
> 
> 	task doing lookup		task doing update
> 	^^^^^^^^^^^^^^^^^		^^^^^^^^^^^^^^^^^
> 
> 	1. test for CACHE_VALID		3. set item->contents
> 	   & !CACHE_NEGATIVE
> 
> 	2. dereference			4. set CACHE_VALID, clear
> 	   item->contents->...		   CACHE_NEGATIVE.
> 
> As I understand it, if we want to gaurantee that item->contents is good
> at step 2, then we need a write barrier between 3 and 4, together with a
> read barrier between 1 and 2.  Taking the spinlock is overkill, but
> should accomplish the same thing, as it forces 1 to occur before 3 or
> after 4, and adds any necessary memory barriers.

So, that being the real problem, perhaps it's clearer to use explicit
memory barriers instead of more locking, and add some comments.  Also
split this into a separate patch, as in the following.

--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html