Re: [PATCH v2] vfs: shave work on failed file open

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, 27 Sept 2023 at 11:32, Mateusz Guzik <mjguzik@xxxxxxxxx> wrote:
>
> put_cred showed up in file_free_rcu in d76b0d9b2d87 ("CRED: Use creds in
> file structs"). Commit message does not claim any dependency on this
> being in an rcu callback already and it looks like it was done this way
> because this was the ony spot with kmem_cache_free(filp_cachep, f)

Yes, that looks about right. So the rcu-freeing is almost an accident.

Btw, I think we could get rid of the RCU freeing of 'struct file *' entirely.

The way to fix it is

 (a) make sure all f_count accesses are atomic ops (the one special
case is the "0 -> X" initialization, which is ok)

 (b) make filp_cachep be SLAB_TYPESAFE_BY_RCU

because then get_file_rcu() can do the atomic_long_inc_not_zero()
knowing it's still a 'struct file *' while holding the RCU read lock
even if it was just free'd.

And __fget_files_rcu() will then re-check that it's the *right*
'struct file *' and do a fput() on it and re-try if it isn't. End
result: no need for any RCU freeing.

But the difference is that a *new* 'struct file *' might see a
temporary atomic increment / decrement of the file pointer because
another CPU is going through that __fget_files_rcu() dance.

Which is why "0 -> X" is ok to do as a "atomic_long_set()", but
everything else would need to be done as "atomic_long_inc()" etc.

Which all seems to be the case already, so with the put_cred() not
needing the RCU delay, I thing we really could do this patch (note:
independent of other issues, but makes your patch require that
"atomic_long_cmpxchg()" and the WARN_ON() should probably go away,
because it can actually happen).

That should help the normal file open/close case a bit, in that it
doesn't cause that extra RCU work.

Of course, on some loads it might be advantageous to do a delayed
de-allocation in some other RCU context, so ..

What do you think?

             Linus

PS. And as always: ENTIRELY UNTESTED.
 fs/file_table.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/fs/file_table.c b/fs/file_table.c
index ee21b3da9d08..7b38ff7385cc 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -65,21 +65,21 @@ static void file_free_rcu(struct rcu_head *head)
 {
 	struct file *f = container_of(head, struct file, f_rcuhead);
 
-	put_cred(f->f_cred);
-	if (unlikely(f->f_mode & FMODE_BACKING))
-		kfree(backing_file(f));
-	else
-		kmem_cache_free(filp_cachep, f);
+	kfree(backing_file(f));
 }
 
 static inline void file_free(struct file *f)
 {
 	security_file_free(f);
-	if (unlikely(f->f_mode & FMODE_BACKING))
-		path_put(backing_file_real_path(f));
 	if (likely(!(f->f_mode & FMODE_NOACCOUNT)))
 		percpu_counter_dec(&nr_files);
-	call_rcu(&f->f_rcuhead, file_free_rcu);
+	put_cred(f->f_cred);
+	if (unlikely(f->f_mode & FMODE_BACKING)) {
+		path_put(backing_file_real_path(f));
+		call_rcu(&f->f_rcuhead, file_free_rcu);
+	} else {
+		kmem_cache_free(filp_cachep, f);
+	}
 }
 
 /*
@@ -471,7 +471,8 @@ EXPORT_SYMBOL(__fput_sync);
 void __init files_init(void)
 {
 	filp_cachep = kmem_cache_create("filp", sizeof(struct file), 0,
-			SLAB_HWCACHE_ALIGN | SLAB_PANIC | SLAB_ACCOUNT, NULL);
+			SLAB_TYPESAFE_BY_RCU | SLAB_HWCACHE_ALIGN
+			| SLAB_PANIC | SLAB_ACCOUNT, NULL);
 	percpu_counter_init(&nr_files, 0, GFP_KERNEL);
 }
 

[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux