On Mon, Mar 25, 2019 at 2:14 PM Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote: > > Maybe, but we really need to come up with sane documentation on the > entire drop_inode/evict_inode/destroy_inode/rcu_destroy_inode > group ;-/ Yeah. I actually think the "destroy_inode/rcu_destroy_inode" part is the simplest one to understand: it's just tearing down the inode, and the RCU part is (obviously, as shown by this thread) somewhat subtle, but at the same time not really all that hard to explain: "inodes that can be looked up through RCU lookup need to be destroyed with RCU delay". I think drop_inode->evict_inode is arguably the much more subtle stuff. That's the "we're not destroying things, we're making decisions about the state" kind of thing. And we actually don't have any documentation at all for those two. Well, there's the "porting" thing, but.. > And I want to understand the writeback-related issues > in ocfs2 and f2fs - the current kludges in those smell fishy. I'm assuming you're talking about exactly that drop_inode() kind of subtlety. At least for ocfs2, the whole "rcu_destroy_inode()" patch I sent out would simplify things a lot. *All* that the ocfs2_destroy_inode() function does is to schedule the inode freeing for RCU delay. Assuming my patch is fine (as mentioned, it was entirely untested), that patch would actually simplify that a lot. Get rid of ocfs2_destroy_inode() entirely, and just make ocfs2_rcu_destroy_inode() do that kmem_cache_free(). Mucho clean-up (we have that ocfs2_rcu_destroy_inode() currently as ocfs2_i_callback(), but with the ugly rcu-head interface). > > if (unlikely(inode_init_always(sb, inode))) { > > - if (inode->i_sb->s_op->destroy_inode) > > + if (inode->i_sb->s_op->destroy_inode) { > > inode->i_sb->s_op->destroy_inode(inode); > > - else > > + if (!inode->i_sb->s_op->rcu_destroy_inode) > > + return NULL; > > + } > > + if (!inode->i_sb->s_op->rcu_destroy_inode || > > + !inode->i_sb->s_op->rcu_destroy_inode(inode)) > > kmem_cache_free(inode_cachep, inode); > > ITYM i_callback(inode); here, possibly suitably renamed. Yeah, I guess we could just call that i_callback() directly. I wrote it that way because it was how the code was organized (it didn't call i_callback() before either), but yes, it woudl avoid some duplicate code to do it that way. And yes, at that point we'd probably want to rename it too. On the whole, looking at a few filesystems, I really think that 'rcu_destroy_inode()" would simplify several of them. That ocfs2 case I already mentioned, but looking around the exact same is trye in v9fs, vxfs, dlmfs, hostfs, bfs, coda, fatfs, isofs, minix, nfs, openprom, proc, qnx4, qnx6, sysv, befs, adfs, affs, efs, ext2, f2fs, gfs2, hfs, hfsplus, hpfs, jffs2, nilfs, reiserfs, romfs, squashfs) And in other filesystems that actually *do* have a real destroy_inode() that does other things too, they still want the rcu delayed part as well (eg btrfs, ceph, fuse, hugetlbfs, afs, ecryptfs, ext4, jfs, organgefs, ovl, ubifs). In fact, every single filesystem I looked at (and I looked at most) would be simplified by that patch. And for some it would make it easier to fix bugs in the process (ie bpf) because they don't currently have that rcu delay and they need it. So looking at all the existing users of destroy_inode(), I really do think we should have that rcu_destroy_inode() callback. Because that 3 files changed, 28 insertions(+), 6 deletions(-) will allow quite a lot of lines to be removed from low-level filesystems with all the call_rcu() and callback container_of(head, struct inode, i_rcu) noise just going away. But yes, it would be good to have more documentation fo the drop/evict phase. As mentioned, I think the destroy phase is actually the easy one, with the RCU part being the most complex one that this would actually simplify. Linus