Re: [PATCH] sunrpc.ko: RPC cache fix races

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 20 Feb 2013 04:09:00 +0100 neilb@xxxxxxx wrote:

> On 19 Feb 2013 18:08:40 +0100 bstroesser@xxxxxxxxxxxxxx wrote:
> 
> > Second attempt using the correct FROM. Sorry for the noise.
> > 
> > 
> > Hi,
> > 
> > I found a problem in sunrpc.ko on a SLES11 SP1 (2.6.32.59-0,7.1) and 
> > fixed it (see first patch ifor 2.6.32.60 below).
> > For us the patch works fine (tested on 2.6.32.59-0.7.1).
> > 
> > AFAICS from the code, the problem seems to persist in current kernel 
> > versions also. Thus, I added the second patch for 3.7.9.
> > As the setup to reproduce the problem is quite complex, I couldn't 
> > test the second patch yet. So consider this one as a RFC.
> > 
> > Best regards,
> > Bodo
> > 
> > Please CC me, I'm not on the list.
> > 
> > =========================================
> > From: Bodo Stroesser <bstroesser@xxxxxxxxxxxxxx>
> > Date: Fri, 08 Feb 2013
> > Subject: [PATCH] net: sunrpc: fix races in RPC cache
> > 
> > We found the problem and tested the patch on a SLES11 SP1 
> > 2.6.32.59-0.7.1
> > 
> > This patch applies to linux-2.6.32.60 (changed monotonic_seconds -->
> > get_seconds())
> > 
> > Sporadically NFS3 RPC requests to the nfs server are dropped due to
> > cache_check() (net/sunrpc/cache.c) returning -ETIMEDOUT for an entry 
> > of the "auth.unix.gid" cache.
> > In this case, no NFS reply is sent to the client.
> > 
> > The reason for the dropped requests are races in cache_check() when
> > cache_make_upcall() returns -EINVAL (because it is called for a cache 
> > without readers) and cache_check() therefore refreshes the cache entry 
> > (rv == -EAGAIN).
> > 
> > There are three details that need to be changed:
> >  1) cache_revisit_request() must not be called before cache_fresh_locked()
> >     has updated the cache entry, as cache_revisit_request() wakes up
> >     threads waiting for the cache entry to be updated.
> 
> This certainly seems correct.  It is wrong to call cache_revisit_request() so early.
> 
> >     The explicit call to cache_revisit_request() is not needed, as
> >     cache_fresh_unlocked() calls it anyway.
> 
> But cache_fresh_unlocked is only called if "rv == -EAGAIN", however we also need to call it in the case where "age > refresh_age/2" - it must always be called after clearing CACHE_PENDING.
> 
> Also, cache_fresh_unlocked() only calls cache_revisit_request() if CACHE_PENDING is set, but we have just cleared it!  Some definitely something wrong here.
> (Note that I'm looking at the SLES 2.6.32 code at the moment, mainline is a bit different).
> 
> 
> >     (But in case of not updating the cache entry, cache_revisit_request()
> >     must be called. Thus, we use a fall through in the "case").
> 
> Hmm... I don't like case fallthroughs unless they have nice big comments:
>     /* FALLTHROUGH */
> or similar. :-)
> 
> >  2) CACHE_PENDING must not be cleared before cache_fresh_locked() has
> >     updated the cache entry, as cache_defer_req() can return without really
> >     sleeping if it detects CACHE_PENDING unset.
> 
> Agreed.  So we should leave the clearing of CACHE_PENDING to cache_fresh_unlocked().
> 
> 
> >     (In case of not updating the cache entry again we use the fall 
> > through)
> >  3) Imagine a thread that calls cache_check() and gets rv = -ENOENT from
> >     cache_is_valid(). Then it sets CACHE_PENDINGs and calls
> >     cache_make_upcall().
> >     We assume that meanwhile get_seconds() advances to the next
> >     sec. and a second thread also calls cache_check(). It gets -EAGAIN from
> >     cache_is_valid() for the same cache entry. As CACHE_PENDING still is
> >     set, it calls cache_defer_req() immediately and waits for a wakeup from
> >     the first thread.
> >     After cache_make_upcall() returned -EINVAL, the first thread does not
> >     update the cache entry as it had got rv = -ENOENT, but wakes up the
> >     second thread by calling cache_revisit_request().
> >     Thread two wakes up, calls cache_is_valid() and again gets -EAGAIN.
> >     Thus, the result of the second cache_check() is -ETIMEDOUT and the
> >     NFS request is dropped.
> 
> Yep, that's not so good....
> 
> 
> >     To solve this, the cache entry now is updated not only if rv == -EAGAIN
> >     but if rv == -ENOENT also. This is a sufficient workaround, as the
> >     first thread would have to stay in cache_check() between its call to
> >     cache_is_valid() and clearing CACHE_PENDING for more than 60 seconds
> >     to break the workaround.
> 
> Still, it isn't nice to just have a work-around.  It would be best to have a fix.
> The key problem here is that cache_is_valid() is time-sensitive.  This have been address in mainline - cache_is_valid doesn't depend on the current time there.

So, the solution would be a backport of the current mainline code ...

Anyway, I think for SLES11 SP1 and 2.6.32.60 the work-around would be sufficient.
BTW: it has the positive side effect, that - while a cache entry is in its second
half of life - no longer each cache_check() tries to do a cache_make_upcall().


> 
> 
> >     
> > Signed-off-by: Bodo Stroesser <bstroesser@xxxxxxxxxxxxxx>
> > ---
> > 
> > --- a/net/sunrpc/cache.c	2012-08-08 21:35:09.000000000 +0200
> > +++ b/net/sunrpc/cache.c	2013-02-08 14:29:41.000000000 +0100
> > @@ -233,15 +233,14 @@ int cache_check(struct cache_detail *det
> >  		if (!test_and_set_bit(CACHE_PENDING, &h->flags)) {
> >  			switch (cache_make_upcall(detail, h)) {
> >  			case -EINVAL:
> > -				clear_bit(CACHE_PENDING, &h->flags);
> > -				cache_revisit_request(h);
> > -				if (rv == -EAGAIN) {
> > +				if (rv == -EAGAIN || rv == -ENOENT) {
> >  					set_bit(CACHE_NEGATIVE, &h->flags);
> >  					cache_fresh_locked(h, get_seconds()+CACHE_NEW_EXPIRY);
> > +					clear_bit(CACHE_PENDING, &h->flags);
> >  					cache_fresh_unlocked(h, detail);
> >  					rv = -ENOENT;
> > +					break;
> >  				}
> > -				break;
> >  
> >  			case -EAGAIN:
> >  				clear_bit(CACHE_PENDING, &h->flags);
> 
> I agree with some of this....
> Maybe:
> 
>   switch(cache_make_upcall(detail, h)) {
>   case -EINVAL:
>         if (rv) {
> 		set_bit(CACHE_NEGATIVE, &h->flags);
> 		cache_fresh_locked(h, get_seconds() + CACHE_NEW_EXPIRY);
> 		rv = -ENOENT;
> 	}
> 	/* FALLTHROUGH */
>   case -EAGAIN:
> 	cache_fresh_unlocked(h, detail);
>   }

I agree, your patch is obviously better than the mine.
But let me suggest one little change: I would like to substitute
cache_fresh_unlocked() by clear_bit() and cache_revisit_request(),
as the call to cache_dequeue() in cache_fresh_unlocked() seems to
be obsolete here:

   switch(cache_make_upcall(detail, h)) {
   case -EINVAL:
	if (rv) {
 		set_bit(CACHE_NEGATIVE, &h->flags);
 		cache_fresh_locked(h, get_seconds() + CACHE_NEW_EXPIRY);
		rv = -ENOENT;
 	}
	/* FALLTHROUGH */
   case -EAGAIN:
	clear_bit(CACHE_PENDING, &h->flags);
	cache_revisit_request(h, detail);
   }

> 
> Though it isn't good that cache_fresh_locked() is being called without holding a lock!  Maybe we should import try_to_negate_entry() from mainline.

Or simply add write_lock() and write_unlock() around the set_bit() and cache_fresh_locked()


> 
> > =========================================
> > From: Bodo Stroesser <bstroesser@xxxxxxxxxxxxxx>
> > Date: Fri, 08 Feb 2013
> > Subject: [PATCH] net: sunrpc: fix races in RPC cache
> > 
> > This patch applies to SLES 11 SP2 linux-3.0.51-0.7.9 and also to 
> > vanilla linux-3.7.9
> > 
> > It is untested and is only based on a code review after we analyzed 
> > the reason for NFS requests being dropped on a
> > SLES11 SP1 (linux-2.6.32.59-0.7.1)
> > 
> > Sporadically NFS3 RPC requests to the nfs server are dropped due to
> > cache_check() (net/sunrpc/cache.c) returning -ETIMEDOUT for an entry 
> > of the "auth.unix.gid" cache.
> > In this case, no NFS reply is sent to the client.
> > 
> > The reason for the dropped requests are races in cache_check() when
> > cache_make_upcall() returns -EINVAL (because it is called for a cache 
> > without readers) and cache_check() therefore refreshes the cache entry 
> > (rv == -EAGAIN).
> > 
> > There are two details that need to be changed:
> >  1) cache_revisit_request() must not be called before cache_fresh_locked()
> >     has updated the cache entry, as cache_revisit_request() wakes up
> >     threads waiting for the cache entry to be updated.
> >     The explicit call to cache_revisit_request() is not needed, as
> >     cache_fresh_unlocked() calls it anyway.
> >     (But in case of not updating the cache entry, cache_revisit_request()
> >     must be called).
> >  2) CACHE_PENDING must not be cleared before cache_fresh_locked() has
> >     updated the cache entry, as cache_wait_req() called by
> >     cache_defer_req() can return without really sleeping if it detects
> >     CACHE_PENDING unset.
> > 
> > Signed-off-by: Bodo Stroesser <bstroesser@xxxxxxxxxxxxxx>
> > ---
> > 
> > --- a/net/sunrpc/cache.c	2013-02-08 15:56:07.000000000 +0100
> > +++ b/net/sunrpc/cache.c	2013-02-08 16:04:32.000000000 +0100
> > @@ -230,11 +230,14 @@ static int try_to_negate_entry(struct ca
> >  	rv = cache_is_valid(detail, h);
> >  	if (rv != -EAGAIN) {
> >  		write_unlock(&detail->hash_lock);
> > +		clear_bit(CACHE_PENDING, &h->flags);
> > +		cache_revisit_request(h);
> 
> This should just be cache_fresh_unlocked(), as below.
> >  		return rv;
> >  	}
> >  	set_bit(CACHE_NEGATIVE, &h->flags);
> >  	cache_fresh_locked(h, seconds_since_boot()+CACHE_NEW_EXPIRY);
> >  	write_unlock(&detail->hash_lock);
> > +	clear_bit(CACHE_PENDING, &h->flags);
> Clearing this bit is wrong  - cache_frsh_unlocked will do that.

Yes. My fault. But instead of stripping the clear_bit(), I'd better like
to change the following cache_fresh_unlocked() to cache_revisit_request()

> 
> >  	cache_fresh_unlocked(h, detail);
> >  	return -ENOENT;
> >  }
> So maybe:
> 
> static int try_to_negate_entry(....)
> {
> 	int rv;
> 
> 	write_lock(&detail->hash_lock);
> 	rv = cache_is_valid(detail, h);
> 	if (rv == -EAGAIN) {
> 		set_bit(CACHE_NEGATIVE, &h->flags);
> 		cache_fresh_locked(h, ....);
> 		rv = -ENOENT;
> 	}
> 	write_unlock(&detail->hash_lock);
> 	cache_fresh_unlocked(h, detail);

For the same reason as written above, I'd better like
clear_bit() and cache_revisit_request()

> 	return rv;
> }
> ????
> 
> 
> > @@ -275,8 +278,6 @@ int cache_check(struct cache_detail *det
> >  		if (!test_and_set_bit(CACHE_PENDING, &h->flags)) {
> >  			switch (cache_make_upcall(detail, h)) {
> >  			case -EINVAL:
> > -				clear_bit(CACHE_PENDING, &h->flags);
> > -				cache_revisit_request(h);
> >  				rv = try_to_negate_entry(detail, h);
> >  				break;
> >  			case -EAGAIN:
> 
> Yes, those lines should definitely be removed.
> 
> So maybe this against mainline:
> 
> diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c index 9afa439..7296644 100644
> --- a/net/sunrpc/cache.c
> +++ b/net/sunrpc/cache.c
> @@ -228,12 +228,11 @@ static int try_to_negate_entry(struct cache_detail *detail, struct cache_head *h
>  
>  	write_lock(&detail->hash_lock);
>  	rv = cache_is_valid(detail, h);
> -	if (rv != -EAGAIN) {
> -		write_unlock(&detail->hash_lock);
> -		return rv;
> +	if (rv == -EAGAIN) {
> +		set_bit(CACHE_NEGATIVE, &h->flags);
> +		cache_fresh_locked(h, seconds_since_boot()+CACHE_NEW_EXPIRY);
> +		rv = -ENOENT;
>  	}
> -	set_bit(CACHE_NEGATIVE, &h->flags);
> -	cache_fresh_locked(h, seconds_since_boot()+CACHE_NEW_EXPIRY);
>  	write_unlock(&detail->hash_lock);
>  	cache_fresh_unlocked(h, detail);
>  	return -ENOENT;
> @@ -275,13 +274,10 @@ int cache_check(struct cache_detail *detail,
>  		if (!test_and_set_bit(CACHE_PENDING, &h->flags)) {
>  			switch (cache_make_upcall(detail, h)) {
>  			case -EINVAL:
> -				clear_bit(CACHE_PENDING, &h->flags);
> -				cache_revisit_request(h);
>  				rv = try_to_negate_entry(detail, h);
>  				break;
>  			case -EAGAIN:
> -				clear_bit(CACHE_PENDING, &h->flags);
> -				cache_revisit_request(h);
> +				cache_fresh_unlocked(h, detail);

And again: I don't like it ;-)

>  				break;
>  			}
>  		}
> 
> 
> Is that convincing?

Of course.

Do you think it would be a good idea to use the same code structure as suggested for 2.6.32.60 ?
If so, this would result in the following patch:

--- a/net/sunrpc/cache.c	2013-02-20 13:56:30.000000000 +0100
+++ b/net/sunrpc/cache.c	2013-02-20 14:05:08.000000000 +0100
@@ -222,23 +222,6 @@
 	}
 }

-static int try_to_negate_entry(struct cache_detail *detail, struct cache_head *h)
-{
-	int rv;
-
-	write_lock(&detail->hash_lock);
-	rv = cache_is_valid(detail, h);
-	if (rv != -EAGAIN) {
-		write_unlock(&detail->hash_lock);
-		return rv;
-	}
-	set_bit(CACHE_NEGATIVE, &h->flags);
-	cache_fresh_locked(h, seconds_since_boot()+CACHE_NEW_EXPIRY);
-	write_unlock(&detail->hash_lock);
-	cache_fresh_unlocked(h, detail);
-	return -ENOENT;
-}
-
 /*
  * This is the generic cache management routine for all
  * the authentication caches.
@@ -275,10 +258,15 @@
 		if (!test_and_set_bit(CACHE_PENDING, &h->flags)) {
 			switch (cache_make_upcall(detail, h)) {
 			case -EINVAL:
-				clear_bit(CACHE_PENDING, &h->flags);
-				cache_revisit_request(h);
-				rv = try_to_negate_entry(detail, h);
-				break;
+				write_lock(&detail->hash_lock);
+				rv = cache_is_valid(detail, h);
+				if (rv == -EAGAIN) {
+					set_bit(CACHE_NEGATIVE, &h->flags);
+					cache_fresh_locked(h, seconds_since_boot()+CACHE_NEW_EXPIRY);
+					rv = -ENOENT;
+				}
+				write_unlock(&detail->hash_lock);
+				/* FALLTHROUGH */
 			case -EAGAIN:
 				clear_bit(CACHE_PENDING, &h->flags);
 				cache_revisit_request(h);



> 
> Thanks a lot for your very thorough analysis!
> 
> NeilBrown
> 

Thank you for the quick response!

Best regards,
Bodo
ÿôèº{.nÇ+?·?®?­?+%?Ëÿ±éݶ¥?wÿº{.nÇ+?·¥?{±þwìþ)í?æèw*jg¬±¨¶????Ý¢jÿ¾«þG«?éÿ¢¸¢·¦j:+v?¨?wèjØm¶?ÿþø¯ù®w¥þ?àþf£¢·h??â?úÿ?Ù¥



[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux