On Tue, Oct 30, 2018 at 10:39:01AM -0700, Darrick J. Wong wrote: > On Tue, Oct 30, 2018 at 10:20:39PM +1100, Dave Chinner wrote: > > From: Dave Chinner <dchinner@xxxxxxxxxx> > > > > When a sudden buffer cache growth demand occurs from multiple > > concurrent sources, they can all fail shaking the cache at the same > > time and expand the cache. This results in the cache doubling in > > size multiple times when only a single expansion was really > > necessary. The resultant massive cache can cause repair to OOM as > > the cache will grow to much larger than memory can actually hold. > > > > Prevent this sudden and unnecessary expansion by rate limiting cache > > growth to one size increase every tens seconds. This is sufficient > > to prevent racing expansion calls from doing the wrong thing, but > > not too long to prevent necesary cache growth when it is undersized. > > > > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx> > > --- > > include/cache.h | 1 + > > libxfs/cache.c | 17 ++++++++++++++--- > > 2 files changed, 15 insertions(+), 3 deletions(-) > > > > diff --git a/include/cache.h b/include/cache.h > > index 552b92489e46..1b774619f7a0 100644 > > --- a/include/cache.h > > +++ b/include/cache.h > > @@ -114,6 +114,7 @@ struct cache { > > unsigned long long c_misses; /* cache misses */ > > unsigned long long c_hits; /* cache hits */ > > unsigned int c_max; /* max nodes ever used */ > > + time_t expand_time; /* time of last expansion */ > > }; > > > > struct cache *cache_init(int, unsigned int, struct cache_operations *); > > diff --git a/libxfs/cache.c b/libxfs/cache.c > > index 139c7c1b9e71..e10df395e60e 100644 > > --- a/libxfs/cache.c > > +++ b/libxfs/cache.c > > @@ -62,6 +62,7 @@ cache_init( > > cache->bulkrelse = cache_operations->bulkrelse ? > > cache_operations->bulkrelse : cache_generic_bulkrelse; > > pthread_mutex_init(&cache->c_mutex, NULL); > > + time(&cache->expand_time); > > > > for (i = 0; i < hashsize; i++) { > > list_head_init(&cache->c_hash[i].ch_list); > > @@ -77,15 +78,25 @@ cache_init( > > return cache; > > } > > > > +/* > > + * rate limit expansion so several concurrent shakes don't instantly all > > + * expand the cache multiple times and blow repair to OOM death. > > + */ > > static void > > cache_expand( > > - struct cache * cache) > > + struct cache *cache) > > { > > + time_t now; > > + > > pthread_mutex_lock(&cache->c_mutex); > > + time(&now); > > + if (now - cache->expand_time > 10) { > > At first I wondered to myself about what happens if we fill the cache > fast enough that we run out of space in less than ten seconds, but > (afaict) the cache_expand caller will keep trying shake/expand until it > gets something, right? Yes. Expansion occurs when the shaker fails to make progress because the cache is full and nothing can be reclaimed after many attempts. Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx