On Tue, Dec 05, 2023 at 07:38:33PM +0800, alexjlzheng@xxxxxxxxx wrote:
> Hi, all
>
> I would like to ask if the conflict between xfs inode recycle and vfs rcu-walk
> which can lead to null pointer references has been resolved?
>
> I browsed through emails about the following patches and their discussions:
> - https://lore.kernel.org/linux-xfs/20220217172518.3842951-2-bfoster@xxxxxxxxxx/
> - https://lore.kernel.org/linux-xfs/20220121142454.1994916-1-bfoster@xxxxxxxxxx/
> - https://lore.kernel.org/linux-xfs/164180589176.86426.501271559065590169.stgit@xxxxxxxxxxxxxxxxx/
>
> And then came to the conclusion that this problem has not been solved, am I
> right? Did I miss some patch that could solve this problem?

We fixed the known problems this caused by turning off the VFS
functionality that the rcu pathwalks kept tripping over. See commit
7b7820b83f23 ("xfs: don't expose internal symlink metadata buffers to
the vfs").

Apart from that issue, I'm not aware of any other issues that the XFS
inode recycling directly exposes.

> According to my understanding, the essence of this problem is that XFS reuses
> the inode evicted by VFS, but VFS rcu-walk assumes that this will not happen.

It assumes that the inode will not change identity during the RCU
grace period after the inode has been evicted from cache. We can
safely reinstantiate an evicted inode without waiting for an RCU grace
period as long as it is the same inode with the same content and same
state.

Problems *may* arise when we unlink the inode, then evict it, then a
new file is created and the old slab cache memory address is used for
the new inode. I describe the issue here:

https://lore.kernel.org/linux-xfs/20220118232547.GD59729@xxxxxxxxxxxxxxxxxxx/

That said, we have exactly zero evidence that this is actually a
problem in production systems. We did get systems tripping over the
symlink issue, but there's no evidence that the
unlink->close->open(O_CREAT) issues are manifesting in the wild and
hence there hasn't been any particular urgency to address it.

> Are there any recommended workarounds until an elegant and efficient solution
> can be proposed? After all, causing a crash is extremely unacceptable in a
> production environment.

What crashes are you seeing in your production environment?

-Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx
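
A minimal userspace sketch of the unlink -> close -> open(O_CREAT)
sequence described above, with a second thread doing path lookups on
the same name. The /mnt/scratch path, file name and thread layout are
invented for illustration; this only exercises the sequence of
operations in question and is not known to reproduce the theoretical
race.

/*
 * Sketch only: churn one file name with unlink -> close -> open(O_CREAT)
 * while another thread repeatedly walks the same path.
 */
#include <fcntl.h>
#include <pthread.h>
#include <sys/stat.h>
#include <unistd.h>

#define VICTIM "/mnt/scratch/victim"	/* assumed XFS test mount */

static void *churn(void *arg)
{
	for (;;) {
		int fd = open(VICTIM, O_CREAT | O_WRONLY, 0644);

		if (fd < 0)
			continue;
		unlink(VICTIM);		/* unlink while still open */
		close(fd);		/* drop the last reference */
		/*
		 * The next iteration's open(O_CREAT) creates a new
		 * inode that may reuse the just-freed in-core inode
		 * memory, which is the scenario described above.
		 */
	}
	return NULL;
}

static void *walk(void *arg)
{
	struct stat st;

	for (;;)
		stat(VICTIM, &st);	/* repeated path lookups */
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, churn, NULL);
	pthread_create(&b, NULL, walk, NULL);
	pthread_join(a, NULL);
	return 0;
}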