On Mon, Apr 29, 2019 at 08:37:29PM -0700, Linus Torvalds wrote:
> On Mon, Apr 29, 2019, 20:09 Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
>
> > ... except that this callback can (and always could) get executed after
> > freeing struct super_block.
> >
> > Ugh.
>
> That food looks nasty. Shouldn't the super block freeing wait for the
> filesystem to be all done instead? Do a rcu synchronization or something?
>
> Adding that pointer looks really wrong to me. I'd much rather delay the sb
> freeing. Is there some reason that can't be done that I'm missing?

Where would you put that synchronize_rcu()? Doing that before
->put_super() is too early - inode references might be dropped in
there. OTOH, doing that after that point means that while struct
super_block itself will be there, any number of data structures
hanging from it might not be.

So we are still very limited in what we can do inside a ->free_inode()
instance *and* we get a bunch of synchronize_rcu() calls for no good
reason.

Note that for normal lockless accesses (lockless ->d_revalidate(),
->d_hash(), etc.) we are just fine with having struct super_block
freeing RCU-delayed (along with any data structures we might need) -
the superblock had been seen at some point after we'd taken
rcu_read_lock(), so its freeing won't happen until we drop it. So we
don't need synchronize_rcu() for that.

Here the problem is that we are dealing with another RCU callback;
synchronize_rcu() would be needed for it, but it will only protect
that intermediate dereference of ->i_sb; any RCU-delayed stuff
scheduled from inside ->put_super() would not be ordered wrt
->free_inode(). And if we are doing that just for the sake of that
one dereference, we might as well do it before scheduling i_callback().

PS: we *are* guaranteed that the module will still be there
(unregister_filesystem() does synchronize_rcu(), and rcu_barrier() is
done before kmem_cache_destroy() in assorted exit_foo_fs()).
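
To make the last point concrete ("we might as well do it before scheduling i_callback()"), here is a rough userspace sketch of the ordering, not kernel code. Everything named toy_* is made up for illustration: a plain linked list stands in for the queue of pending RCU callbacks, and toy_run_callbacks() stands in for the end of a grace period. The only idea it is meant to show is that the ->free_inode method is copied into the inode while ->i_sb is still valid, so the deferred callback never has to dereference the superblock.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-ins; none of these are the kernel's real definitions. */
struct toy_inode;
typedef void (*free_inode_fn)(struct toy_inode *);

struct toy_sb {
    free_inode_fn free_inode;   /* plays the role of sb->s_op->free_inode */
};

struct toy_inode {
    struct toy_sb *i_sb;        /* may dangle by the time the callback runs */
    free_inode_fn free_inode;   /* copied while i_sb is still valid */
    struct toy_inode *next;     /* link in the deferred-callback queue */
};

static struct toy_inode *pending;   /* stand-in for queued RCU callbacks */
static int freed_inodes;

static void toy_free_inode(struct toy_inode *inode)
{
    free(inode);
    freed_inodes++;
}

/* Stand-in for inode destruction: snapshot the method, then defer. */
static void toy_destroy_inode(struct toy_inode *inode)
{
    inode->free_inode = inode->i_sb->free_inode;  /* sb is still alive here */
    inode->next = pending;
    pending = inode;
}

/* Stand-in for the grace period ending: run every deferred callback. */
static void toy_run_callbacks(void)
{
    while (pending) {
        struct toy_inode *inode = pending;

        pending = inode->next;
        inode->free_inode(inode);   /* never touches inode->i_sb */
    }
}

/* Frees the superblock before the callback runs; returns inodes freed. */
int toy_demo(void)
{
    struct toy_sb *sb = malloc(sizeof(*sb));
    struct toy_inode *inode = malloc(sizeof(*inode));

    assert(sb && inode);
    freed_inodes = 0;
    sb->free_inode = toy_free_inode;
    inode->i_sb = sb;
    inode->free_inode = NULL;
    inode->next = NULL;

    toy_destroy_inode(inode);
    free(sb);               /* superblock goes away first */
    toy_run_callbacks();    /* callback runs late and is still safe */
    return freed_inodes;
}
```

Calling toy_demo() frees the toy superblock before the deferred callback runs. Had the callback gone through inode->i_sb at that point instead of the copied pointer, it would have been a use-after-free, which is the problem described above. The sketch says nothing about data structures hanging off the superblock; a ->free_inode() instance is still limited in what it may touch.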