On Tue, Jan 18, 2022 at 01:25:15PM -0500, Brian Foster wrote: > If I go back to the inactive -> reclaimable grace period variant and > also insert a start_poll_synchronize_rcu() and > poll_state_synchronize_rcu() pair across the inactive processing > sequence, I start seeing numbers closer to ~36k cycles. IOW, the > xfs_inodegc_inactivate() helper looks something like this: > > if (poll_state_synchronize_rcu(ip->i_destroy_gp)) > xfs_inodegc_set_reclaimable(ip); > else > call_rcu(&VFS_I(ip)->i_rcu, xfs_inodegc_set_reclaimable_callback); > > ... to skip the rcu grace period if one had already passed while the > inode sat on the inactivation queue and was processed. > > However my box went haywire shortly after with rcu stall reports or > something if I let that continue to run longer, so I'll probably have to > look into that lockdep splat (complaining about &pag->pag_ici_lock in > rcu context, perhaps needs to become irq safe?) or see if something else > is busted.. Umm... Could you (or Dave) describe where does the mainline do the RCU delay mentioned several times in these threads, in case of * unlink() * overwriting rename() * open() + unlink() + close() (that one, obviously, for regular files) The thing is, if we already do have an RCU delay in there, there might be a better solution - making sure it happens downstream of d_drop() (in case when dentry has other references) or dentry going negative (in case we are holding the sole reference to it). If we can do that, RCU'd dcache lookups won't run into inode reuse at all. Sure, right now d_delete() comes last *and* we want the target inode to stay around past the call of ->unlink(). But if you look at do_unlinkat() you'll see an interesting-looking wart with ihold/iput around vfs_unlink(). Not sure if you remember the story on that one; basically, it's "we'd rather have possible IO on inode freeing to happen outside of the lock on parent". nfsd and mqueue do the same thing; ksmbd does not. OTOH, ksmbd appears to force the "make victim go unhashed, sole reference or not". ecryptfs definitely does that forcing (deliberately so). That area could use some rethinking, and if we can deal with the inode reuse delay while we are at it... Overwriting rename() is also interesting in that respect, of course. I can go and try to RTFS my way through xfs iget-related code, but I'd obviously prefer to do so with at least some overview of that thing from the folks familiar with it. Seeing that it's a lockless search structure, missing something subtle there is all too easy, especially with the lookup-by-fhandle stuff in the mix...