Re: [RFC][PATCHSET] sorting out RCU-delayed stuff in ->destroy_inode()

On Tue, Apr 16, 2019 at 11:01:16AM -0700, Linus Torvalds wrote:
> On Tue, Apr 16, 2019 at 10:49 AM Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> >
> >  83 files changed, 241 insertions(+), 516 deletions(-)
> 
> I think this single line is pretty convincing on its own. Ignoring
> docs and fs/inode.c, we have
> 
>  80 files changed, 190 insertions(+), 494 deletions(-)
> 
> IOW, just over 300 lines of boiler plate code removed.
> 
> The additions are
> 
>  - Ten more lines of actual code in fs/inode.c (and that's not
> actually added complexity, it looks simpler if anything - most of it
> is the new "i_callback()" helper function)
> 
>  - 19 lines of doc updates.
> 
> So it absolutely looks fine to me.
> 
> I only skimmed through the actual filesystem (and one networking)
> patches, but they looked like trivial conversions to a better
> interface.

... except that this callback can (and always could) get executed after
struct super_block has been freed.  So we can't just dereference
->i_sb->s_op and expect to survive; the table ->s_op pointed to will still
be there, but ->i_sb itself might very well have been freed, with all its
contents overwritten.  We need to copy the callback into struct inode
itself, unfortunately.  The following incremental patch fixes it; I'm going
to fold it into the first commit of the series.

diff --git a/Documentation/filesystems/porting b/Documentation/filesystems/porting
index 9d80f9e0855e..b8d3ddd8b8db 100644
--- a/Documentation/filesystems/porting
+++ b/Documentation/filesystems/porting
@@ -655,3 +655,11 @@ in your dentry operations instead.
 		* if ->free_inode() is non-NULL, it gets scheduled by call_rcu()
 		* combination of NULL ->destroy_inode and NULL ->free_inode is
 		  treated as NULL/free_inode_nonrcu, to preserve the compatibility.
+
+	Note that the callback (be it via ->free_inode() or explicit call_rcu()
+	in ->destroy_inode()) is *NOT* ordered wrt superblock destruction;
+	as a matter of fact, the superblock and all associated structures
+	might be already gone.  The filesystem driver is guaranteed to be still
+	there, but that's it.  Freeing memory in the callback is fine; doing
+	more than that is possible, but requires a lot of care and is best
+	avoided.
diff --git a/fs/inode.c b/fs/inode.c
index fb45590d284e..855dad43b11d 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -164,6 +164,7 @@ int inode_init_always(struct super_block *sb, struct inode *inode)
 	inode->i_wb_frn_avg_time = 0;
 	inode->i_wb_frn_history = 0;
 #endif
+	inode->free_inode = sb->s_op->free_inode;
 
 	if (security_inode_alloc(inode))
 		goto out;
@@ -211,8 +212,8 @@ EXPORT_SYMBOL(free_inode_nonrcu);
 static void i_callback(struct rcu_head *head)
 {
 	struct inode *inode = container_of(head, struct inode, i_rcu);
-	if (inode->i_sb->s_op->free_inode)
-		inode->i_sb->s_op->free_inode(inode);
+	if (inode->free_inode)
+		inode->free_inode(inode);
 	else
 		free_inode_nonrcu(inode);
 }
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 2e9b9f87caca..5ed6b39e588e 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -718,6 +718,7 @@ struct inode {
 #endif
 
 	void			*i_private; /* fs or device private pointer */
+	void (*free_inode)(struct inode *);
 } __randomize_layout;
 
 static inline unsigned int i_blocksize(const struct inode *node)
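
For anyone who wants to see the lifetime problem outside the kernel, here is
a minimal user-space sketch of the same pattern. It is only a model under
stated assumptions: the structures are cut-down stand-ins for the real ones,
and myfs_free_inode(), free_inode_default(), inode_alloc(), deferred_free()
and run_demo() are hypothetical names invented for illustration. There is no
RCU here; deferred_free() merely plays the role of a callback that runs after
the superblock is gone.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Cut-down, hypothetical stand-ins for the kernel structures. */
struct inode;

struct super_operations {
	void (*free_inode)(struct inode *);
};

struct super_block {
	const struct super_operations *s_op;
};

struct inode {
	struct super_block *i_sb;		/* may dangle by callback time */
	void (*free_inode)(struct inode *);	/* copied at init time */
};

static int fs_free_calls;
static int default_free_calls;

/* Filesystem-specific destructor: only frees memory. */
static void myfs_free_inode(struct inode *inode)
{
	fs_free_calls++;
	free(inode);
}

/* Fallback used when the filesystem supplies no callback. */
static void free_inode_default(struct inode *inode)
{
	default_free_calls++;
	free(inode);
}

/* Static ops tables, like a driver's: they outlive any superblock. */
static const struct super_operations myfs_ops = {
	.free_inode = myfs_free_inode,
};
static const struct super_operations plain_ops = {
	.free_inode = NULL,
};

static struct inode *inode_alloc(struct super_block *sb)
{
	struct inode *inode = calloc(1, sizeof(*inode));

	if (!inode)
		return NULL;
	inode->i_sb = sb;
	/* Copy the callback while sb is still known to be alive. */
	inode->free_inode = sb->s_op->free_inode;
	return inode;
}

/* Models the delayed callback: it must not touch inode->i_sb. */
static void deferred_free(struct inode *inode)
{
	if (inode->free_inode)
		inode->free_inode(inode);
	else
		free_inode_default(inode);
}

/* Returns fs_free_calls * 10 + default_free_calls, or -1 on OOM. */
static int run_demo(void)
{
	struct super_block *sb1 = malloc(sizeof(*sb1));
	struct super_block *sb2 = malloc(sizeof(*sb2));
	struct inode *a, *b;

	fs_free_calls = default_free_calls = 0;
	if (!sb1 || !sb2) {
		free(sb1);
		free(sb2);
		return -1;
	}
	sb1->s_op = &myfs_ops;
	sb2->s_op = &plain_ops;
	a = inode_alloc(sb1);
	b = inode_alloc(sb2);

	/* Superblocks are torn down before the inode callbacks run. */
	memset(sb1, 0x6b, sizeof(*sb1));
	memset(sb2, 0x6b, sizeof(*sb2));
	free(sb1);
	free(sb2);
	if (!a || !b) {
		free(a);
		free(b);
		return -1;
	}

	/* Safe: only the pointer cached in the inode is used. */
	deferred_free(a);
	deferred_free(b);
	return fs_free_calls * 10 + default_free_calls;
}
```

Calling run_demo() from a main() exercises both branches once each. Had
deferred_free() gone through inode->i_sb->s_op instead, it would be reading
freed (and here deliberately poisoned) memory, which is the bug the
incremental above closes.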


