On 2020/3/25 7:07 AM, NeilBrown wrote:
>>
>> @@ -541,6 +543,7 @@ void cache_purge(struct cache_detail *detail)
>>  		}
>>  	}
>>  	spin_unlock(&detail->hash_lock);
>> +	spin_unlock(&cache_list_lock);
>>  }
>>  EXPORT_SYMBOL_GPL(cache_purge);
>>
>> --
>> 2.20.1.2432.ga663e714
>
> I wonder if this is the best solution.
> This code:
>
> 	hlist_for_each_entry_safe(ch, tmp, head, cache_list) {
> 		sunrpc_begin_cache_remove_entry(ch, detail);
> 		spin_unlock(&detail->hash_lock);
> 		sunrpc_end_cache_remove_entry(ch, detail);
> 		spin_lock(&detail->hash_lock);
> 	}
>
> looks wrong.
> Dropping a lock while walking a list is only safe if you hold a
> reference to the place-holder - 'tmp' in this case - but we don't.
> As this is trying to remove everything in the list, it would be safer
> to do something like
>
> 	while (!hlist_empty(head)) {
> 		ch = hlist_entry(head->first, struct cache_head, cache_list);
> 		sunrpc_begin_cache_remove_entry(ch, detail);
> 		spin_unlock(&detail->hash_lock);
> 		sunrpc_end_cache_remove_entry(ch, detail);
> 		spin_lock(&detail->hash_lock);
> 	}
>
> I'm guessing that would fix the problem in a more focused way.
> But I'm not 100% sure, because there is no analysis given of the cause.
> What line is
> 	cache_purge+0xce/0x160
> ./scripts/faddr2line can tell you.
> I suspect it is the hlist_for_each_entry_safe() line.
>
> Can you test the change I suggest and see if it helps?
>
> Thanks,
> NeilBrown

Sorry for the delay. It took me some time to reproduce the bug stably so
that I could verify the correctness of the fix.

You definitely pointed out the root cause, and your solution is more
elegant. After applying it, the bug no longer reproduces. There is no
race condition: hash_lock is designed to protect cache_detail at fine
granularity, and it already does its job. And yes,
hlist_for_each_entry_safe() is where the bug is - it may walk to a
deleted entry (tmp). My v1 solution was a regression in that respect, so
I will change the patch title in v2 as well.
BTW, I checked the faddr2line output; it says cache_purge+0xce/0x160 is
cache_put. That makes sense too, and doesn't go against your theory.

Here's my reproducer script:

	systemctl enable rpcbind
	systemctl enable nfs
	systemctl start rpcbind
	systemctl start nfs

	mkdir /tmp/x /tmp/y

	# Create some collisions in the hash table
	for i in `seq 256`; do
		mkdir /tmp/x/$i
		mkdir /tmp/y/$i
		exportfs localhost:/tmp/x/$i
	done
	for i in `seq 256`; do
		mount localhost:/tmp/x/$i /tmp/y/$i
	done

	END=$(cat /proc/self/net/rpc/nfsd.export/flush)
	NOW=$(date +%s)
	sleep $((END - NOW))

	# Trigger cache_purge
	systemctl stop nfs &
	usleep 20000

	# Trigger cache_clean
	echo > /proc/self/net/rpc/nfsd.export/flush

To speed up reproduction, I also added an mdelay(500) between acquiring
and releasing hash_lock in cache_purge.

Thank you so much!

Yihao Wu