Re: [PATCH v2] fs/mbcache: make sure mb_cache_count() not return negative value.

Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> · Mon, 8 Jan 2018 16:13:04 -0800

On Tue, 9 Jan 2018 07:38:11 +0800 Jiang Biao <jiang.biao2@xxxxxxxxxx> wrote:

> When running the LTP stress test for 7*24 hours, vmscan occasionally emits
> the following warning continuously:
> 
> mb_cache_scan+0x0/0x3f0 negative objects to delete
> nr=-9232265467809300450
> ....
> 
> Trace info shows that freeable (the value mb_cache_count() returns) is -1,
> which causes total_scan to keep accumulating and eventually overflow.
> 
> This patch makes sure that mb_cache_count() does not return a negative
> value, which makes the mbcache shrinker more robust.
> 
> ...
>
> --- a/fs/mbcache.c
> +++ b/fs/mbcache.c
> @@ -238,7 +238,11 @@ void mb_cache_entry_delete(struct mb_cache *cache, u32 key, u64 value)
>  			spin_lock(&cache->c_list_lock);
>  			if (!list_empty(&entry->e_list)) {
>  				list_del_init(&entry->e_list);
> -				cache->c_entry_count--;
> +				if (cache->c_entry_count > 0)
> +					cache->c_entry_count--;
> +				else
> +					WARN_ONCE(1, "mbcache: Entry count "
> +                          "going negative!\n");
>  				atomic_dec(&entry->e_refcnt);
>  			}
>  			spin_unlock(&cache->c_list_lock);

I agree with Jan's comment.  We need to figure out how ->c_entry_count
went negative.  mb_cache_count() says this state is "Unlikely, but not
impossible", but from a quick read I can't see how this happens - it
appears that coherency between ->c_list and ->c_entry_count is always
maintained under ->c_list_lock?
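
To make the invariant concrete, here is a rough user-space model of what
I'd expect to hold. It is only a sketch and not the fs/mbcache.c code: the
model_* names are made up for illustration, a pthread mutex stands in for
->c_list_lock, and the counter type is just what is convenient for the
model. If every add and delete follows this pattern, the counter can never
be decremented more often than it was incremented.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel structures. */
struct model_entry {
	struct model_entry *prev, *next;	/* self-linked when not on a list */
};

struct model_cache {
	pthread_mutex_t lock;		/* plays the role of c_list_lock */
	struct model_entry head;	/* plays the role of c_list */
	unsigned long entry_count;	/* plays the role of c_entry_count */
};

static void model_cache_init(struct model_cache *c)
{
	pthread_mutex_init(&c->lock, NULL);
	c->head.prev = c->head.next = &c->head;
	c->entry_count = 0;
}

static void model_entry_init(struct model_entry *e)
{
	e->prev = e->next = e;
}

/* Link the entry at the tail and bump the counter under the same lock. */
static void model_cache_add(struct model_cache *c, struct model_entry *e)
{
	pthread_mutex_lock(&c->lock);
	e->prev = c->head.prev;
	e->next = &c->head;
	c->head.prev->next = e;
	c->head.prev = e;
	c->entry_count++;
	pthread_mutex_unlock(&c->lock);
}

/*
 * Unlink and decrement only if the entry is still linked, so a second
 * delete of the same entry is a no-op and cannot underflow the counter.
 */
static void model_cache_del(struct model_cache *c, struct model_entry *e)
{
	pthread_mutex_lock(&c->lock);
	if (e->next != e) {
		e->prev->next = e->next;
		e->next->prev = e->prev;
		e->prev = e->next = e;
		c->entry_count--;
	}
	pthread_mutex_unlock(&c->lock);
}

/* Walk the list under the lock; return 1 if its length matches the counter. */
static int model_cache_coherent(struct model_cache *c)
{
	struct model_entry *e;
	unsigned long n = 0;
	int ok;

	pthread_mutex_lock(&c->lock);
	for (e = c->head.next; e != &c->head; e = e->next)
		n++;
	ok = (n == c->entry_count);
	pthread_mutex_unlock(&c->lock);
	return ok;
}
```

A check along the lines of model_cache_coherent(), run in the real code
under the real lock while debugging, would show whether the list and the
counter have already diverged before the shrinker sees the bad value.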