On Wed, 27 Sept 2023 at 11:32, Mateusz Guzik <mjguzik@xxxxxxxxx> wrote:
>
> put_cred showed up in file_free_rcu in d76b0d9b2d87 ("CRED: Use creds in
> file structs"). Commit message does not claim any dependency on this
> being in an rcu callback already and it looks like it was done this way
> because this was the only spot with kmem_cache_free(filp_cachep, f)

Yes, that looks about right. So the rcu-freeing is almost an accident.

Btw, I think we could get rid of the RCU freeing of 'struct file *' entirely.

The way to fix it is

 (a) make sure all f_count accesses are atomic ops (the one special
     case is the "0 -> X" initialization, which is ok)

 (b) make filp_cachep be SLAB_TYPESAFE_BY_RCU

because then get_file_rcu() can do the atomic_long_inc_not_zero()
knowing it's still a 'struct file *' while holding the RCU read lock
even if it was just free'd.

And __fget_files_rcu() will then re-check that it's the *right*
'struct file *' and do a fput() on it and re-try if it isn't.

End result: no need for any RCU freeing.

But the difference is that a *new* 'struct file *' might see a
temporary atomic increment / decrement of the file pointer because
another CPU is going through that __fget_files_rcu() dance.

Which is why "0 -> X" is ok to do as an "atomic_long_set()", but
everything else would need to be done as "atomic_long_inc()" etc.

Which all seems to be the case already, so with the put_cred() not
needing the RCU delay, I think we really could do this patch (note:
independent of other issues, but it means that the
"atomic_long_cmpxchg()" and the WARN_ON() in your patch should
probably go away, because that case can actually happen).

That should help the normal file open/close case a bit, in that it
doesn't cause that extra RCU work.

Of course, on some loads it might be advantageous to do a delayed
de-allocation in some other RCU context, so ..

What do you think?

              Linus

PS. And as always: ENTIRELY UNTESTED.
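(Purely for illustration, and not part of the patch below: a rough
sketch of that lookup-side dance, loosely modeled on __fget_files_rcu().
The fget_rcu_sketch() name is made up; the helpers are the existing
fdtable/file ones, kernel context only -- linux/fdtable.h, linux/file.h,
linux/rcupdate.h.)

static struct file *fget_rcu_sketch(struct files_struct *files, unsigned int fd)
{
	for (;;) {
		struct file *file, *again;

		rcu_read_lock();
		file = files_lookup_fd_rcu(files, fd);
		if (!file) {
			rcu_read_unlock();
			return NULL;
		}

		/*
		 * With SLAB_TYPESAFE_BY_RCU the object may already have
		 * been freed and reused, but it is still a 'struct file',
		 * so the atomic op is safe; it only fails if f_count is 0.
		 */
		if (!atomic_long_inc_not_zero(&file->f_count)) {
			rcu_read_unlock();
			continue;
		}

		/*
		 * Re-check that the fd table still points at the same
		 * file.  If it doesn't, we took a transient reference on
		 * a recycled object (the inc/dec another CPU may observe
		 * on a *new* file) and have to drop it and retry.
		 */
		again = files_lookup_fd_rcu(files, fd);
		rcu_read_unlock();
		if (likely(file == again))
			return file;
		fput(file);
	}
}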
 fs/file_table.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/fs/file_table.c b/fs/file_table.c
index ee21b3da9d08..7b38ff7385cc 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -65,21 +65,21 @@ static void file_free_rcu(struct rcu_head *head)
 {
 	struct file *f = container_of(head, struct file, f_rcuhead);
 
-	put_cred(f->f_cred);
-	if (unlikely(f->f_mode & FMODE_BACKING))
-		kfree(backing_file(f));
-	else
-		kmem_cache_free(filp_cachep, f);
+	kfree(backing_file(f));
 }
 
 static inline void file_free(struct file *f)
 {
 	security_file_free(f);
-	if (unlikely(f->f_mode & FMODE_BACKING))
-		path_put(backing_file_real_path(f));
 	if (likely(!(f->f_mode & FMODE_NOACCOUNT)))
 		percpu_counter_dec(&nr_files);
-	call_rcu(&f->f_rcuhead, file_free_rcu);
+	put_cred(f->f_cred);
+	if (unlikely(f->f_mode & FMODE_BACKING)) {
+		path_put(backing_file_real_path(f));
+		call_rcu(&f->f_rcuhead, file_free_rcu);
+	} else {
+		kmem_cache_free(filp_cachep, f);
+	}
 }
 
 /*
@@ -471,7 +471,8 @@ EXPORT_SYMBOL(__fput_sync);
 
 void __init files_init(void)
 {
 	filp_cachep = kmem_cache_create("filp", sizeof(struct file), 0,
-			SLAB_HWCACHE_ALIGN | SLAB_PANIC | SLAB_ACCOUNT, NULL);
+			SLAB_TYPESAFE_BY_RCU | SLAB_HWCACHE_ALIGN
+			| SLAB_PANIC | SLAB_ACCOUNT, NULL);
 	percpu_counter_init(&nr_files, 0, GFP_KERNEL);
 }