On 2020/3/25 7:07 AM, NeilBrown wrote:
>>
>> @@ -541,6 +543,7 @@ void cache_purge(struct cache_detail *detail)
>>  		}
>>  	}
>>  	spin_unlock(&detail->hash_lock);
>> +	spin_unlock(&cache_list_lock);
>>  }
>>  EXPORT_SYMBOL_GPL(cache_purge);
>>
>> --
>> 2.20.1.2432.ga663e714
>
> I wonder if this is the best solution.
> This code:
>
> 	hlist_for_each_entry_safe(ch, tmp, head, cache_list) {
> 		sunrpc_begin_cache_remove_entry(ch, detail);
> 		spin_unlock(&detail->hash_lock);
> 		sunrpc_end_cache_remove_entry(ch, detail);
> 		spin_lock(&detail->hash_lock);
> 	}
>
> looks wrong.
> Dropping a lock while walking a list is only safe if you hold a
> reference to the place-holder - 'tmp' in this case - but we don't.
> As this is trying to remove everything in the list, it would be safer
> to do something like
>
> 	while (!hlist_empty(head)) {
> 		ch = hlist_entry(head->first, struct cache_head, cache_list);
> 		sunrpc_begin_cache_remove_entry(ch, detail);
> 		spin_unlock(&detail->hash_lock);
> 		sunrpc_end_cache_remove_entry(ch, detail);
> 		spin_lock(&detail->hash_lock);
> 	}
>
> I'm guessing that would fix the problem in a more focused way.
> But I'm not 100% sure, because there is no analysis given of the cause.
> What line is
> 	cache_purge+0xce/0x160
> ./scripts/faddr2line can tell you.
> I suspect it is the hlist_for_each_entry_safe() line.
>
> Can you test the change I suggest and see if it helps?
>
> Thanks,
> NeilBrown

Sorry for the delay. It took me some time to reproduce the bug stably so
that I could verify the correctness of the fix.

You definitely pointed out the root cause, and your solution is more
elegant. After applying it, the bug no longer reproduces. There is no
race condition: hash_lock is designed to protect cache_detail at fine
granularity, and it already does its job. And yes,
hlist_for_each_entry_safe() is where the bug is - it may walk to a
deleted entry (tmp). My v1 solution was a regression in that respect, so
I will change the patch title in v2 as well.
BTW, I checked the faddr2line output; it says cache_purge+0xce/0x160 is
cache_put. That makes sense too, and doesn't go against your theory.

Here's my reproducer script:

	systemctl enable rpcbind
	systemctl enable nfs
	systemctl start rpcbind
	systemctl start nfs

	mkdir /tmp/x /tmp/y

	# Create some collisions in the hash table
	for i in `seq 256`; do
		mkdir /tmp/x/$i
		mkdir /tmp/y/$i
		exportfs localhost:/tmp/x/$i
	done
	for i in `seq 256`; do
		mount localhost:/tmp/x/$i /tmp/y/$i
	done

	END=$(cat /proc/self/net/rpc/nfsd.export/flush)
	NOW=$(date +%s)
	sleep $((END - NOW))

	# Trigger cache_purge
	systemctl stop nfs &
	usleep 20000

	# Trigger cache_clean
	echo > /proc/self/net/rpc/nfsd.export/flush

To speed up reproduction, I also added an mdelay(500) between acquiring
and releasing hash_lock in cache_purge.

Thank you so much!

Yihao Wu