On Sun, Mar 24, 2019 at 9:57 PM Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote: > > So we have 5 broken cases, all with the same kind of fix: move freeing > into the RCU-delayed part of ->destroy_inode(); for debugfs and bpf > that requires adding ->alloc_inode()/->destroy_inode(), rather than > relying upon the defaults from fs/inode.c The fact that we had five different broken filesystems does imply that we should just make this much easier at the VFS layer instead of forcing filesystems to cope with a bad interface. > > Or maybe we could add a final inode callback function for "rcu_free" > > that is called as the RCU-delayed freeing of the inode itself happens? > > And then people could hook into that for freeing the inode->i_link > > data. > > You mean, split ->destroy_inode() into immediate and RCU-delayed parts? > There are filesystems where both parts are non-empty - we can't just > switch all ->destroy_inode() work to call_rcu(). Right. Not just move the existing destroy_inode() - because as you say, people may not be able to to do that in RCU contect, but split it up, and add a "final_free_inode()" callback or something for the RCU phase. In fact, I suspect that *every* user of destroy_inode() already has to have its own RCU callback to another final freeing function anyway. Because they really shouldn't free the inode itself early. Maybe we can just make that be a generic thing? And no, it's not like the patch to the bpf filesystem looks _horrible_, but I think the fact that the low-level filesystem needed to do its own RCU-deferred thing (even if it wasn't doing any RCU on its own, although bpf obviously has that in other areas) is kind of wrong, and shows that our vfs interfaces are a bit awkward. Plus it's not like it was just bpf that got this wrong. It's also somewhat silly to add _another_ RCU callback when we already have one for the inode. Linus