On Mon, Jan 17, 2022 at 04:28:52PM +0000, Al Viro wrote: > On Mon, Jan 17, 2022 at 09:35:58AM -0500, Brian Foster wrote: > > > To Al's question, at the end of the day there is no rcu delay involved > > with inode reuse in XFS. We do use call_rcu() for eventual freeing of > > inodes (see __xfs_inode_free()), but inode reuse occurs for inodes that > > have been put into a "reclaim" state before getting to the point of > > freeing the struct inode memory. This lead to the long discussion [1] > > Ian references around ways to potentially deal with that. I think the > > TLDR of that thread is there are various potential options for > > improvement, such as to rcu wait on inode creation/reuse (either > > explicitly or via more open coded grace period cookie tracking), to rcu > > wait somewhere in the destroy sequence before inodes become reuse > > candidates, etc., but none of them seemingly agreeable for varying > > reasons (IIRC mostly stemming from either performance or compexity) [2]. > > > > The change that has been made so far in XFS is to turn rcuwalk for > > symlinks off once again, which looks like landed in Linus' tree as > > commit 7b7820b83f23 ("xfs: don't expose internal symlink metadata > > buffers to the vfs"). The hope is that between that patch and this > > prospective vfs tweak, we can have a couple incremental fixes that at > > least address the practical problem users have been running into (which > > is a crash due to a NULL ->get_link() callback pointer due to inode > > reuse). The inode reuse vs. rcu thing might still be a broader problem, > > but AFAIA that mechanism has been in place in XFS on Linux pretty much > > forever. > > My problem with that is that pathname resolution very much relies upon > the assumption that any inode it observes will *not* change its nature > until the final rcu_read_unlock(). Papering over ->i_op->get_link reads > in symlink case might be sufficient at the moment (I'm still not certain > about that, though), but that's rather brittle. E.g. if some XFS change > down the road adds ->permission() on some inodes, you'll get the same > problem in do_inode_permission(). We also have places where we rely upon > sample ->d_seq > fetch ->d_flags > fetch ->d_inode > validate ->d_seq > ... > assume that inode type matches the information in flags > > How painful would it be to make xfs_destroy_inode() a ->free_inode() instance? > IOW, how far is xfs_inode_mark_reclaimable() from being callable in RCU > callback context? AIUI, not very close at all, I'm pretty sure we can't put it under RCU callback context at all because xfs_fs_destroy_inode() can take sleeping locks, perform transactions, do IO, run rcu_read_lock() critical sections, etc. This means that needs to run an a full task context and so can't run from RCU callback context at all. > IOW, ->free_inode() is RCU-delayed part of ->destroy_inode(). If both > are present, ->destroy_inode() will be called synchronously, followed > by ->free_inode() from RCU callback, so you can have both - moving just > the "finally mark for reuse" part into ->free_inode() would be OK. > Any blocking stuff (if any) can be left in ->destroy_inode()... Yup, that's pretty much what we already do, except that we run the RCU-delayed part of freeing the inode once XFS has finished with the inode internally and the background inode GC reclaims it. Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx