Re: [PATCH] sunrpc.ko: RPC cache fix races

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Feb 19, 2013 at 06:08:40PM +0100, bstroesser@xxxxxxxxxxxxxx wrote:
> Second attempt using the correct FROM. Sorry for the noise.
> 
> 
> Hi,
> 
> I found a problem in sunrpc.ko on a SLES11 SP1 (2.6.32.59-0,7.1)
> and fixed it (see first patch ifor 2.6.32.60 below).
> For us the patch works fine (tested on 2.6.32.59-0.7.1).
> 
> AFAICS from the code, the problem seems to persist in current
> kernel versions also. Thus, I added the second patch for 3.7.9.
> As the setup to reproduce the problem is quite complex, I couldn't
> test the second patch yet. So consider this one as a RFC.
> 
> Best regards,
> Bodo
> 
> Please CC me, I'm not on the list.
> 
> =========================================
> From: Bodo Stroesser <bstroesser@xxxxxxxxxxxxxx>
> Date: Fri, 08 Feb 2013
> Subject: [PATCH] net: sunrpc: fix races in RPC cache
> 
> We found the problem and tested the patch on a SLES11 SP1 2.6.32.59-0.7.1
> 
> This patch applies to linux-2.6.32.60 (changed monotonic_seconds -->
> get_seconds())

This patch may well be correct, but for stable I can't normally take
patches that aren't backports of specific upstream fixes.  That means:
first, we need to fix whatever remains to be fixed on the latest
upstream kernel.  Then, we need to identify the other upstream patches
are needed and aren't yet in 2.6.32.60.

...
> =========================================
> From: Bodo Stroesser <bstroesser@xxxxxxxxxxxxxx>
> Date: Fri, 08 Feb 2013
> Subject: [PATCH] net: sunrpc: fix races in RPC cache
> 
> This patch applies to SLES 11 SP2 linux-3.0.51-0.7.9 and also to
> vanilla linux-3.7.9
> 
> It is untested and is only based on a code review after we
> analyzed the reason for NFS requests being dropped on a
> SLES11 SP1 (linux-2.6.32.59-0.7.1)
> 
> Sporadically NFS3 RPC requests to the nfs server are dropped due to
> cache_check() (net/sunrpc/cache.c) returning -ETIMEDOUT for an entry
> of the "auth.unix.gid" cache.
> In this case, no NFS reply is sent to the client.
> 
> The reason for the dropped requests are races in cache_check() when
> cache_make_upcall() returns -EINVAL (because it is called for a cache
> without readers) and cache_check() therefore refreshes the cache entry 
> (rv == -EAGAIN).
> 
> There are two details that need to be changed:
>  1) cache_revisit_request() must not be called before cache_fresh_locked()
>     has updated the cache entry, as cache_revisit_request() wakes up
>     threads waiting for the cache entry to be updated.
>     The explicit call to cache_revisit_request() is not needed, as
>     cache_fresh_unlocked() calls it anyway.
>     (But in case of not updating the cache entry, cache_revisit_request()
>     must be called).
>  2) CACHE_PENDING must not be cleared before cache_fresh_locked() has
>     updated the cache entry, as cache_wait_req() called by
>     cache_defer_req() can return without really sleeping if it detects
>     CACHE_PENDING unset.

I think that makes sense.  Thanks for the explanation.  This is all a
bit subtle, so any additional review or testing would be appreciated....

--b.

> 
> Signed-off-by: Bodo Stroesser <bstroesser@xxxxxxxxxxxxxx>
> ---
> 
> --- a/net/sunrpc/cache.c	2013-02-08 15:56:07.000000000 +0100
> +++ b/net/sunrpc/cache.c	2013-02-08 16:04:32.000000000 +0100
> @@ -230,11 +230,14 @@ static int try_to_negate_entry(struct ca
>  	rv = cache_is_valid(detail, h);
>  	if (rv != -EAGAIN) {
>  		write_unlock(&detail->hash_lock);
> +		clear_bit(CACHE_PENDING, &h->flags);
> +		cache_revisit_request(h);
>  		return rv;
>  	}
>  	set_bit(CACHE_NEGATIVE, &h->flags);
>  	cache_fresh_locked(h, seconds_since_boot()+CACHE_NEW_EXPIRY);
>  	write_unlock(&detail->hash_lock);
> +	clear_bit(CACHE_PENDING, &h->flags);
>  	cache_fresh_unlocked(h, detail);
>  	return -ENOENT;
>  }
> @@ -275,8 +278,6 @@ int cache_check(struct cache_detail *det
>  		if (!test_and_set_bit(CACHE_PENDING, &h->flags)) {
>  			switch (cache_make_upcall(detail, h)) {
>  			case -EINVAL:
> -				clear_bit(CACHE_PENDING, &h->flags);
> -				cache_revisit_request(h);
>  				rv = try_to_negate_entry(detail, h);
>  				break;
>  			case -EAGAIN:
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux