On Fri, 8 Dec 2023 11:14:32 +1100, david@xxxxxxxxxxxxx wrote: > On Tue, Dec 05, 2023 at 07:38:33PM +0800, alexjlzheng@xxxxxxxxx wrote: > > Hi, all > > > > I would like to ask if the conflict between xfs inode recycle and vfs rcu-walk > > which can lead to null pointer references has been resolved? > > > > I browsed through emails about the following patches and their discussions: > > - https://lore.kernel.org/linux-xfs/20220217172518.3842951-2-bfoster@xxxxxxxxxx/ > > - https://lore.kernel.org/linux-xfs/20220121142454.1994916-1-bfoster@xxxxxxxxxx/ > > - https://lore.kernel.org/linux-xfs/164180589176.86426.501271559065590169.stgit@xxxxxxxxxxxxxxxxx/ > > > > And then came to the conclusion that this problem has not been solved, am I > > right? Did I miss some patch that could solve this problem? > > We fixed the known problems this caused by turning off the VFS > functionality that the rcu pathwalks kept tripping over. See commit > 7b7820b83f23 ("xfs: don't expose internal symlink metadata buffers to > the vfs"). Sorry for the delay. The problem I encountered in the production environment was that during the rcu walk process the ->get_link() pointer was NULL, which caused a crash. As far as I know, commit 7b7820b83f23 ("xfs: don't expose internal symlink metadata buffers to the vfs") first appeared in: - https://lore.kernel.org/linux-fsdevel/YZvvP9RFXi3%2FjX0q@bfoster/ Does this commit solve the problem of NULL ->get_link()? And how? > > Apart from that issue, I'm not aware of any other issues that the > XFS inode recycling directly exposes. > > > According to my understanding, the essence of this problem is that XFS reuses > > the inode evicted by VFS, but VFS rcu-walk assumes that this will not happen. > > It assumes that the inode will not change identity during the RCU > grace period after the inode has been evicted from cache. We can > safely reinstantiate an evicted inode without waiting for an RCU > grace period as long as it is the same inode with the same content > and same state. > > Problems *may* arise when we unlink the inode, then evict it, then a > new file is created and the old slab cache memory address is used > for the new inode. I describe the issue here: > > https://lore.kernel.org/linux-xfs/20220118232547.GD59729@xxxxxxxxxxxxxxxxxxx/ And judging from the relevant emails, the main reason why ->get_link() is set to NULL should be the lack of synchronize_rcu() before xfs_reinit_inode() when the inode is chosen to be reused. However, perhaps due to performance reasons, this solution has not been merged for a long time. How is it now? Maybe I am missing something in the threads of mail? Thank you very much. :) Jinliang Zheng > > That said, we have exactly zero evidence that this is actually a > problem in production systems. We did get systems tripping over the > symlink issue, but there's no evidence that the > unlink->close->open(O_CREAT) issues are manifesting in the wild and > hence there hasn't been any particular urgency to address it. > > > Are there any recommended workarounds until an elegant and efficient solution > > can be proposed? After all, causing a crash is extremely unacceptable in a > > production environment. > > What crashes are you seeing in your production environment? > > -Dave. > -- > Dave Chinner > david@xxxxxxxxxxxxx