Re: [RFC][PATCHSET] sorting out RCU-delayed stuff in ->destroy_inode()

Al Viro <viro@xxxxxxxxxxxxxxxxxx> · Tue, 30 Apr 2019 05:00:43 +0100

On Mon, Apr 29, 2019 at 08:37:29PM -0700, Linus Torvalds wrote:
> On Mon, Apr 29, 2019, 20:09 Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> 
> >
> > ... except that this callback can (and always could) get executed after
> > freeing struct super_block.
> >
> 
> Ugh.
> 
> That food looks nasty. Shouldn't the super block freeing wait for the
> filesystem to be all done instead? Do a rcu synchronization or something?
> 
> Adding that pointer looks really wrong to me. I'd much rather delay the sb
> freeing. Is there some reason that can't be done that I'm missing?

Where would you put that synchronize_rcu()?  Doing that before ->put_super()
is too early - inode references might be dropped in there.  OTOH, doing
that after that point means that while struct super_block itself will be
there, any number of data structures hanging from it might not be.

So we are still very limited in what we can do inside a ->free_inode()
instance *and* we get a bunch of synchronize_rcu() calls for no good reason.

Note that for normal lockless accesses (lockless ->d_revalidate(), ->d_hash(),
etc.) we are just fine with having struct super_block freeing RCU-delayed
(along with any data structures we might need) - the superblock had
been seen at some point after we'd taken rcu_read_lock(), so its
freeing won't happen until we do rcu_read_unlock().  So we don't need
synchronize_rcu() for that.

Here the problem is that we are dealing with another RCU callback;
synchronize_rcu() would be needed for it, but it will only protect that
intermediate dereference of ->i_sb; any rcu-delayed stuff scheduled
from inside ->put_super() would not be ordered wrt ->free_inode().
And if we are doing that just for the sake of that one dereference,
we might as well do it before scheduling i_callback().
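
To illustrate that last point, here is a rough userspace toy model (not
kernel code) of fetching the method pointer before the callback is queued.
Everything named toy_* is made up for the illustration; the linked list
merely stands in for callbacks queued with call_rcu(), and nothing here
models a real grace period.

```c
/* Userspace toy model, NOT kernel code.  All toy_* names are hypothetical. */
#include <stddef.h>
#include <stdlib.h>

struct toy_inode;

struct toy_sb {
	void (*free_inode)(struct toy_inode *);
};

struct toy_inode {
	struct toy_sb *i_sb;
	void (*free_inode)(struct toy_inode *);	/* copied before queueing */
	struct toy_inode *next;			/* link in the pending list */
};

static struct toy_inode *pending;	/* stand-in for queued RCU callbacks */
static int freed_inodes;

static void toy_free_inode(struct toy_inode *inode)
{
	free(inode);
	freed_inodes++;
}

/* Models the destroy path: dereference ->i_sb while it is still alive. */
static void toy_destroy_inode(struct toy_inode *inode)
{
	inode->free_inode = inode->i_sb->free_inode;
	inode->next = pending;
	pending = inode;
}

/* Models the delayed callbacks: they never look at inode->i_sb. */
static void toy_run_callbacks(void)
{
	while (pending) {
		struct toy_inode *inode = pending;

		pending = inode->next;
		inode->free_inode(inode);
	}
}

/* Returns the number of inodes freed, or -1 on allocation failure. */
int toy_demo(void)
{
	struct toy_sb *sb = malloc(sizeof(*sb));
	struct toy_inode *inode = malloc(sizeof(*inode));

	if (!sb || !inode) {
		free(sb);
		free(inode);
		return -1;
	}
	sb->free_inode = toy_free_inode;
	inode->i_sb = sb;
	inode->free_inode = NULL;
	inode->next = NULL;
	pending = NULL;
	freed_inodes = 0;

	toy_destroy_inode(inode);
	free(sb);		/* "superblock" is gone before the callback runs */
	toy_run_callbacks();	/* still fine: only the copied pointer is used */
	return freed_inodes;
}
```

Calling toy_demo() from a main() exercises the whole sequence.  The copy
costs one pointer per inode, but the callback no longer cares whether the
superblock has outlived it.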

PS: we *are* guaranteed that the module will still be there
(unregister_filesystem() does synchronize_rcu() and rcu_barrier() is done
before kmem_cache_destroy() in assorted exit_foo_fs()).