On Mon, Mar 25, 2019 at 11:36 AM Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote: > > > > You mean, split ->destroy_inode() into immediate and RCU-delayed parts? > > There are filesystems where both parts are non-empty - we can't just > > switch all ->destroy_inode() work to call_rcu(). > > Right. Not just move the existing destroy_inode() - because as you > say, people may not be able to to do that in RCU contect, but split it > up, and add a "final_free_inode()" callback or something for the RCU > phase. Something like the attached. COMPLETELY UNTESTED. And no filesystems converted to actually use the new rcu_destroy_inode() thing. Hmm? Linus
Documentation/filesystems/vfs.txt | 6 ++++++ fs/inode.c | 27 +++++++++++++++++++++------ include/linux/fs.h | 1 + 3 files changed, 28 insertions(+), 6 deletions(-) diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt index 761c6fd24a53..60f7841a12e6 100644 --- a/Documentation/filesystems/vfs.txt +++ b/Documentation/filesystems/vfs.txt @@ -210,6 +210,7 @@ filesystem. As of kernel 2.6.22, the following members are defined: struct super_operations { struct inode *(*alloc_inode)(struct super_block *sb); void (*destroy_inode)(struct inode *); + int (*rcu_destroy_inode)(struct inode *); void (*dirty_inode) (struct inode *, int flags); int (*write_inode) (struct inode *, int); @@ -248,6 +249,11 @@ or bottom half). ->alloc_inode was defined and simply undoes anything done by ->alloc_inode. + rcu_destroy_inode: this method is called after the RCU delay by + destroy_inode() to release resources allocated for struct inode. + If it returns a non-zero value, it means that it has free'd the + inode, otherwise the inode layer will free it. + dirty_inode: this method is called by the VFS to mark an inode dirty. write_inode: this method is called when the VFS needs to write an diff --git a/fs/inode.c b/fs/inode.c index e9d97add2b36..b85457baad20 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -215,9 +215,13 @@ static struct inode *alloc_inode(struct super_block *sb) return NULL; if (unlikely(inode_init_always(sb, inode))) { - if (inode->i_sb->s_op->destroy_inode) + if (inode->i_sb->s_op->destroy_inode) { inode->i_sb->s_op->destroy_inode(inode); - else + if (!inode->i_sb->s_op->rcu_destroy_inode) + return NULL; + } + if (!inode->i_sb->s_op->rcu_destroy_inode || + !inode->i_sb->s_op->rcu_destroy_inode(inode)) kmem_cache_free(inode_cachep, inode); return NULL; } @@ -256,17 +260,28 @@ EXPORT_SYMBOL(__destroy_inode); static void i_callback(struct rcu_head *head) { struct inode *inode = container_of(head, struct inode, i_rcu); - kmem_cache_free(inode_cachep, inode); + + if (!inode->i_sb->s_op->rcu_destroy_inode || + !inode->i_sb->s_op->rcu_destroy_inode(inode)) + kmem_cache_free(inode_cachep, inode); } static void destroy_inode(struct inode *inode) { BUG_ON(!list_empty(&inode->i_lru)); __destroy_inode(inode); - if (inode->i_sb->s_op->destroy_inode) + + /* + * If we have a 'destroy_inode' but no 'rcu_destroy_inode' + * then the filesystem handles the RCU-delayed destruction + * on its own, and we don't do any RCU callbacks. + */ + if (inode->i_sb->s_op->destroy_inode) { inode->i_sb->s_op->destroy_inode(inode); - else - call_rcu(&inode->i_rcu, i_callback); + if (!inode->i_sb->s_op->rcu_destroy_inode) + return; + } + call_rcu(&inode->i_rcu, i_callback); } /** diff --git a/include/linux/fs.h b/include/linux/fs.h index 8b42df09b04c..727561ecbc23 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1900,6 +1900,7 @@ extern loff_t vfs_dedupe_file_range_one(struct file *src_file, loff_t src_pos, struct super_operations { struct inode *(*alloc_inode)(struct super_block *sb); void (*destroy_inode)(struct inode *); + int (*rcu_destroy_inode)(struct inode *); void (*dirty_inode) (struct inode *, int flags); int (*write_inode) (struct inode *, struct writeback_control *wbc);