On Mon, Oct 7, 2024 at 4:23 PM Christian Brauner <brauner@xxxxxxxxxx> wrote: > Port files to rely on file_ref reference to improve scaling and gain > overflow protection. [...] > diff --git a/fs/file_table.c b/fs/file_table.c > index 4b23eb7b79dd9d4ec779f4c01ba2e902988895dc..3f5dc4176b21ff82cc9440ed92a0ad962fdb2046 100644 > --- a/fs/file_table.c > +++ b/fs/file_table.c > @@ -178,7 +178,7 @@ static int init_file(struct file *f, int flags, const struct cred *cred) > * fget-rcu pattern users need to be able to handle spurious > * refcount bumps we should reinitialize the reused file first. > */ > - atomic_long_set(&f->f_count, 1); > + file_ref_init(&f->f_ref, 1); It is good that you use file_ref_init() here to atomically initialize the file_ref; however, I think it is problematic that before this, alloc_empty_file() uses kmem_cache_zalloc(filp_cachep, GFP_KERNEL) to allocate the file, because that sets __GFP_ZERO, which means that slab_post_alloc_hook() will use memset() to zero the file object. That causes trouble in two different ways: 1. After the memset() has changed the file ref to zero, I think file_ref_get() can return true? Which means __get_file_rcu() could believe that it acquired a reference, and we could race like this: task A task B __get_file_rcu() rcu_dereference_raw() close() [frees file] alloc_empty_file() kmem_cache_zalloc() [reallocates same file] memset(..., 0, ...) file_ref_get() [increments 0->1, returns true] init_file() file_ref_init(..., 1) [sets to 0] rcu_dereference_raw() fput() file_ref_put() [decrements 0->FILE_REF_NOREF, frees file] [UAF] 2. AFAIK the memset() is not guaranteed to atomically update an "unsigned long", so you could see an entirely bogus torn counter value. The only reason this worked in the old code is that the refcount value stored in freed files is 0. So I think you need to stop using kmem_cache_zalloc() to allocate files, and instead use a constructor function that zeroes the refcount field, and manually memset() the rest of the "struct file" to 0 after calling kmem_cache_alloc(). > return 0; > } >