On 19 Mar 2013 04:24:00 +0100 NeilBrown <neilb@xxxxxxx> wrote: > On 14 Mar 2013 18:31:38 +0100 Bodo Stroesser <bstroesser@xxxxxxxxxxxxxx> > wrote: > > > On 13 Mar 2013 07:55:00 +0100 NeilBrown <neilb@xxxxxxx> wrote: > > > > > On 11 Mar 2013 17:13:41 +0100 Bodo Stroesser > > > <bstroesser@xxxxxxxxxxxxxx> > > > wrote: > > > > > > > Hi, > > > > > > > > AFAICS, there is one more race in RPC cache. > > > > > > > > The problem showed up for the "auth.unix.ip" cache, that has a > > > > reader. > > > > > > > > If a server thread tries to find a cache entry, it first does a > > > > lookup (or calls ip_map_cached_get() in this specific case). Then, > > > > it calls cache_check() for this entry. > > > > > > > > If the reader updates the cache entry just between the thread's > > > > lookup and cache_check() execution, cache_check() will do an > > > > upcall for this cache entry. This is, because > > > > sunrpc_cache_update() calls cache_fresh_locked(old, 0), which sets > > > > expiry_time to 0. > > > > > > > > Unfortunately, the reply to the upcall will not dequeue the queued > > > > cache_request, as the reply will be assigned to the cache entry > > > > newly created by the above cache update. > > > > > > > > The result is a growing queue of orphaned cache_request structures > > > > --> memory leak. > > > > > > > > I found this on a SLES11 SP1 with a backport of the latest patches > > > > that fix the other RPC races. On this old kernel, the problem also > > > > leads to svc_drop() being called for the affected RPC request > > > > (after svc_defer()). > > > > > > > > Best Regards > > > > Bodo > > > > > > I don't think this is a real problem. > > > The periodic call to "cache_clean" should find these orphaned > > > requests and purge them. So you could get a short term memory leak, > > > but it should correct itself. > > > Do you agree? > > > > > > Thanks, > > > NeilBrown > > > > Meanwhile I found the missing part of the race! > > > > It's just what I wrote above but additionally, immediately after the > > reader updated the cache, cache_clean() unhashes the old cache entry. > > In that case: > > 1) the thread will queue a request for a cache entry, that isn't > > in the hash lists. > > 2) cache_clean() never will access this old cache entry again > > 3) every further cache update will call cache_dequeue() for a newer > > cache entry, the request is never dequeued > > > > --> memory leak. > > Yes, I see that now. Thanks. > > I suspect that bug was introduced by commit 3af4974eb2c7867d6e16. > Prior to then, cache_clean would never remove anything with an active reference. If something was about to get CACHE_PENDING, it would have a reference so cache_clean would leave it alone. > > I wanted to fix this by getting the last 'put' to call cache_dequeue() if CACHE_PENDING was still set. However I couldn't get access to the CACHE_PENDING flag and the cache_detail at the same place - so I gave up on that. > > The patch I have included below sets a flag when an cache item is removed from the cache (by cache_clean) and refuses to lodge an upcall if the flag is set. That will ensure there is never a pending upcall when the item is finally freed. > > You patch only addresses the particular situation that you had found. I think it is possible that there might be other situations that also lead to memory leaks, so I wanted a solution that would guarantee that there couldn't be a leak. > I think, your patch is safe (And I've tried to think of a remaining race, but couldn't find ;-) I've inserted it into SLES11 SP1 instead of my patch and started the test. > > > > I verified this inserting some debug instructions. In a testrun that > > triggered 6 times, and the queue printed by crash after the run had 6 > > orphaned entries. > > > > As I wrote yesterday, I have a patch that solves the problem by > > retesting the state of the cache entry after setting CACHE_PENDING in > > cache_check(). I can send it if you like. > > > > Bodo > > > > P.S.: > > Maybe I'm wrong, but AFAICS, there are two further minor problems > > regarding (nearly) expired cache entries: > > a) ip_map_cached_get() in some situations can return an already > > outdated (it uses cache_valid that not fully checks the state) > > cache entry. If cache_check() is caclled for that entry, it will > > fail, I think > > b) Generally, if a cache entry is returned by sunrpc_cache_lookup() > > and the entry is in the last second of it's expiry_time, the > > clock might move to the next second before cache_check is called. > > If this happens, cache_check will fail, I think. > > Do you agree? > > Yes, but I'm not sure how important this is. > Normally cache entries should be refreshed well before they expire, so we should normally find an entry with more than half of its lifetime left. > > In the rare case where the scenario you describe happens, cache_check() will return -ETIMEDOUT. > In mainline, that will cause the request to be dropped and the connection closed so the client will try again and won't hit any race and so should get a correct result. > In SLES11SP1, we retry the lookup and will hopefully get a better result without having to close the connection. > > So this should be rare and should fail-safe. > > Does that agree with your understanding? Only partly, because I was wrong. On SLES11 SP1, as you say, cache_is_valid() examines expiry_time, but the retry will avoid to make a RPC request fail. On a mainline kernel, cache_is_valid() does not evaluate the expiry_time. So, an old entry might start an upcall, but as the result of cache_is_valid still is 0, nothing will fail on mainline. The only little issue remaining is the fact, that ip_map_cached_get() will accept outdated cache entries as long as they are not yet cleaned. In mainline this entry will even be accepted by cache_check(). Do you think it would be a good idea to replace cache_valid() by cache_is_valid()? Thank you Bodo > > > Thanks, > NeilBrown > > > From e76e6583405a3b5ff9c8d2ae1184704efb209ef0 Mon Sep 17 00:00:00 2001 > From: NeilBrown <neilb@xxxxxxx> > Date: Tue, 19 Mar 2013 13:58:58 +1100 > Subject: [PATCH] sunrpc/cache: ensure items removed from cache do not have pending upcalls. > > It is possible for a race to set CACHE_PENDING after cache_clean() has removed a cache entry from the cache. > If CACHE_PENDING is still set when the entry is finally 'put', the cache_dequeue() will never happen and we can leak memory. > > So set a new flag 'CACHE_CLEANED' when we remove something from the cache, and don't queue and upcall if it is set. > > If CACHE_PENDING is set before CACHE_CLEANED, the call that > cache_clean() makes to cache_fresh_unlocked() will free memory as needed. If CACHE_PENDING is set after CACHE_CLEANED, the test in sunrpc_cache_pipe_upcall will ensure that the memory is not allocated. > > Reported-by: <bstroesser@xxxxxxxxxxxxxx> > Signed-off-by: NeilBrown <neilb@xxxxxxx> > > diff --git a/include/linux/sunrpc/cache.h b/include/linux/sunrpc/cache.h index 303399b..8419f7d 100644 > --- a/include/linux/sunrpc/cache.h > +++ b/include/linux/sunrpc/cache.h > @@ -57,6 +57,7 @@ struct cache_head { > #define CACHE_VALID 0 /* Entry contains valid data */ > #define CACHE_NEGATIVE 1 /* Negative entry - there is no match for the key */ > #define CACHE_PENDING 2 /* An upcall has been sent but no reply received yet*/ > +#define CACHE_CLEANED 3 /* Entry has been cleaned from cache */ > > #define CACHE_NEW_EXPIRY 120 /* keep new things pending confirmation for 120 seconds */ > > diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c index 1677cfe..8e7ccbb 100644 > --- a/net/sunrpc/cache.c > +++ b/net/sunrpc/cache.c > @@ -306,7 +306,7 @@ EXPORT_SYMBOL_GPL(cache_check); > * a current pointer into that list and into the table > * for that entry. > * > - * Each time clean_cache is called it finds the next non-empty entry > + * Each time cache_clean is called it finds the next non-empty entry > * in the current table and walks the list in that entry > * looking for entries that can be removed. > * > @@ -453,6 +453,7 @@ static int cache_clean(void) > current_index ++; > spin_unlock(&cache_list_lock); > if (ch) { > + set_bit(CACHE_CLEANED, &ch->flags); > cache_fresh_unlocked(ch, d); > cache_put(ch, d); > } > @@ -1176,6 +1177,9 @@ int sunrpc_cache_pipe_upcall(struct cache_detail *detail, struct cache_head *h) > warn_no_listener(detail); > return -EINVAL; > } > + if (test_bit(CACHE_CLEANED, &h->flags)) > + /* Too late to make an upcall */ > + return -EAGAIN; > > buf = kmalloc(PAGE_SIZE, GFP_KERNEL); > if (!buf) > ÿôèº{.nÇ+?·?®??+%?Ëÿ±éݶ¥?wÿº{.nÇ+?·¥?{±þwìþ)í?æèw*jg¬±¨¶????Ý¢jÿ¾«þG«?éÿ¢¸¢·¦j:+v?¨?wèjØm¶?ÿþø¯ù®w¥þ?àþf£¢·h??â?úÿ?Ù¥