On Wed, Oct 25, 2023 at 08:52:57AM -1000, Linus Torvalds wrote: > On Wed, 25 Oct 2023 at 02:01, Christian Brauner <brauner@xxxxxxxxxx> wrote: > > > > rcu_read_lock(); > > - file = get_file_rcu(&i915->gem.mmap_singleton); > > + file = get_file_rcu_once(&i915->gem.mmap_singleton); > > rcu_read_unlock(); > > if (file) > > return file; > > Honestly, the get_file_rcu_once() function seems a bit pointless. > > The above doesn't want a loop at all. Yet your "once" still does loop, > because "even get_file_rcu_once() is trying to deal with the whole > "the pointer itself changed". > > And the i915 code is actually designed to just depend on the atomicity > of the mmap_singleton access itself, in how it uses > > cmpxchg(&i915->gem.mmap_singleton, file, NULL); > ... > file = READ_ONCE(i915->gem.mmap_singleton); > > and literally says "I'll remove my singleton as it is released". The > important part there is that the 'map_singleton' pointer itself isn't > actually reference-counted - it's the reverse, where > reference-counting *other* instances will then auto-clear it. > > And that's what then makes that get_file_rcu() model not work with it, > because get_file_rcu() kind of assumes that the argument it gets is > *part* of the reference counting, not a cached *result* of the > reference counting that gets cleared as a result of the ref going down > to zero. > > I may explain my objections badly, but hopefully you get what I mean. No, I get it. I was on the fence how to deal with this because I honestly find this model here extremely weird and very unintuitive to begin with with the pointer being NULLed during release and replaced that way. > > And I think that also means that using that new get_file_rcu_once() > may be hiding the actual problem, but it's still conceptually wrong, > because it still has that conceptual model of "the pointer I'm getting > is part of the reference counting", when it really isn't. > > So I think we'd actually be better off with something that is more > clearly different from get_file_rcu() in naming, so make that > conceptual difference clearer. Make it be something like > "get_active_file(struct file **)", and make the implementation be just > exactly what your current __get_file_rcu() is with no loops at all. > > Then thew i915 code ends up being > > rcu_read_lock(); > file = get_active_file(&i915->gem.mmap_singleton); > rcu_read_unlock(); > > if (!IS_ERR_OR_NULL(file)) > return file; > > .. create new mmap_singleton .. > > and that's it. > > If you don't want to expose __get_file_rcu() as-is, you could maybe just do > > struct file *get_active_file(struct file **fp) > { > struct file *file; > rcu_read_lock(); > file = __get_file_rcu(fp); > rcu_read_unlock(); > return file; > } > > and then the i916 code would just drop the RCU locking that it has no > business even knowing about. Yeah, fair idea. I just want this fixed so we don't have to drop. Pushed to vfs.misc with your suggested changes for get_file_active() with rcu hidden in that function. Double-check and yell if something looks wrong: > The old i915 code is already racy, in that it's called a "singleton", > but if you have multiple concurrent callers to mmap_singleton(), they > all might see a NULL file at first, and then they all create > *separate* new "singleton" files, and they *all* do that > > smp_store_mb(i915->gem.mmap_singleton, file); > > and one random case of them happens to win the race and set *its* file > as "THE singleton" file. Yeah, I think that's all really weird but whatever. > Am I missing something? No, I don't think so. I thought you might have a good opinion here. I did think that the loop didn't really matter for this case but since it seemingly does paper over the weird semantics here let's drop it. Anyway, pushed to: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/commit/?h=vfs.misc and appended here. Please yell, if something's still off.
>From 61d4fb0b349ec1b33119913c3b0bd109de30142c Mon Sep 17 00:00:00 2001 From: Christian Brauner <brauner@xxxxxxxxxx> Date: Wed, 25 Oct 2023 12:14:37 +0200 Subject: [PATCH] file, i915: fix file reference for mmap_singleton() Today we got a report at [1] for rcu stalls on the i915 testsuite in [2] due to the conversion of files to SLAB_TYPSSAFE_BY_RCU. Afaict, get_file_rcu() goes into an infinite loop trying to carefully verify that i915->gem.mmap_singleton hasn't changed - see the splat below. So I stared at this code to figure out what it actually does. It seems that the i915->gem.mmap_singleton pointer itself never had rcu semantics. The i915->gem.mmap_singleton is replaced in file->f_op->release::singleton_release(): static int singleton_release(struct inode *inode, struct file *file) { struct drm_i915_private *i915 = file->private_data; cmpxchg(&i915->gem.mmap_singleton, file, NULL); drm_dev_put(&i915->drm); return 0; } The cmpxchg() is ordered against a concurrent update of i915->gem.mmap_singleton from mmap_singleton(). IOW, when mmap_singleton() fails to get a reference on i915->gem.mmap_singleton: While mmap_singleton() does rcu_read_lock(); file = get_file_rcu(&i915->gem.mmap_singleton); rcu_read_unlock(); it allocates a new file via anon_inode_getfile() and does smp_store_mb(i915->gem.mmap_singleton, file); So, then what happens in the case of this bug is that at some point fput() is called and drops the file->f_count to zero leaving the pointer in i915->gem.mmap_singleton in tact. Now, there might be delays until file->f_op->release::singleton_release() is called and i915->gem.mmap_singleton is set to NULL. Say concurrently another task hits mmap_singleton() and does: rcu_read_lock(); file = get_file_rcu(&i915->gem.mmap_singleton); rcu_read_unlock(); When get_file_rcu() fails to get a reference via atomic_inc_not_zero() it will try the reload from i915->gem.mmap_singleton expecting it to be NULL, assuming it has comparable semantics as we expect in __fget_files_rcu(). But it hasn't so it reloads the same pointer again, trying the same atomic_inc_not_zero() again and doing so until file->f_op->release::singleton_release() of the old file has been called. So, in contrast to __fget_files_rcu() here we want to not retry when atomic_inc_not_zero() has failed. We only want to retry in case we managed to get a reference but the pointer did change on reload. <3> [511.395679] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks: <3> [511.395716] rcu: Tasks blocked on level-1 rcu_node (CPUs 0-9): P6238 <3> [511.395934] rcu: (detected by 16, t=65002 jiffies, g=123977, q=439 ncpus=20) <6> [511.395944] task:i915_selftest state:R running task stack:10568 pid:6238 tgid:6238 ppid:1001 flags:0x00004002 <6> [511.395962] Call Trace: <6> [511.395966] <TASK> <6> [511.395974] ? __schedule+0x3a8/0xd70 <6> [511.395995] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20 <6> [511.396003] ? lockdep_hardirqs_on+0xc3/0x140 <6> [511.396013] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20 <6> [511.396029] ? get_file_rcu+0x10/0x30 <6> [511.396039] ? get_file_rcu+0x10/0x30 <6> [511.396046] ? i915_gem_object_mmap+0xbc/0x450 [i915] <6> [511.396509] ? i915_gem_mmap+0x272/0x480 [i915] <6> [511.396903] ? mmap_region+0x253/0xb60 <6> [511.396925] ? do_mmap+0x334/0x5c0 <6> [511.396939] ? vm_mmap_pgoff+0x9f/0x1c0 <6> [511.396949] ? rcu_is_watching+0x11/0x50 <6> [511.396962] ? igt_mmap_offset+0xfc/0x110 [i915] <6> [511.397376] ? __igt_mmap+0xb3/0x570 [i915] <6> [511.397762] ? igt_mmap+0x11e/0x150 [i915] <6> [511.398139] ? __trace_bprintk+0x76/0x90 <6> [511.398156] ? __i915_subtests+0xbf/0x240 [i915] <6> [511.398586] ? __pfx___i915_live_setup+0x10/0x10 [i915] <6> [511.399001] ? __pfx___i915_live_teardown+0x10/0x10 [i915] <6> [511.399433] ? __run_selftests+0xbc/0x1a0 [i915] <6> [511.399875] ? i915_live_selftests+0x4b/0x90 [i915] <6> [511.400308] ? i915_pci_probe+0x106/0x200 [i915] <6> [511.400692] ? pci_device_probe+0x95/0x120 <6> [511.400704] ? really_probe+0x164/0x3c0 <6> [511.400715] ? __pfx___driver_attach+0x10/0x10 <6> [511.400722] ? __driver_probe_device+0x73/0x160 <6> [511.400731] ? driver_probe_device+0x19/0xa0 <6> [511.400741] ? __driver_attach+0xb6/0x180 <6> [511.400749] ? __pfx___driver_attach+0x10/0x10 <6> [511.400756] ? bus_for_each_dev+0x77/0xd0 <6> [511.400770] ? bus_add_driver+0x114/0x210 <6> [511.400781] ? driver_register+0x5b/0x110 <6> [511.400791] ? i915_init+0x23/0xc0 [i915] <6> [511.401153] ? __pfx_i915_init+0x10/0x10 [i915] <6> [511.401503] ? do_one_initcall+0x57/0x270 <6> [511.401515] ? rcu_is_watching+0x11/0x50 <6> [511.401521] ? kmalloc_trace+0xa3/0xb0 <6> [511.401532] ? do_init_module+0x5f/0x210 <6> [511.401544] ? load_module+0x1d00/0x1f60 <6> [511.401581] ? init_module_from_file+0x86/0xd0 <6> [511.401590] ? init_module_from_file+0x86/0xd0 <6> [511.401613] ? idempotent_init_module+0x17c/0x230 <6> [511.401639] ? __x64_sys_finit_module+0x56/0xb0 <6> [511.401650] ? do_syscall_64+0x3c/0x90 <6> [511.401659] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8 <6> [511.401684] </TASK> Link: [1]: https://lore.kernel.org/intel-gfx/SJ1PR11MB6129CB39EED831784C331BAFB9DEA@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx Link: [2]: https://intel-gfx-ci.01.org/tree/linux-next/next-20231013/bat-dg2-11/igt@i915_selftest@live@xxxxxxxxx#dmesg-warnings10963 Cc: Jann Horn <jannh@xxxxxxxxxx>, Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> Link: https://lore.kernel.org/r/20231025-formfrage-watscheln-84526cd3bd7d@brauner Signed-off-by: Christian Brauner <brauner@xxxxxxxxxx> --- drivers/gpu/drm/i915/gem/i915_gem_mman.c | 4 +--- fs/file.c | 25 ++++++++++++++++++++++++ include/linux/fs.h | 1 + 3 files changed, 27 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c index 97bf10861cad..72b353737334 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c @@ -915,9 +915,7 @@ static struct file *mmap_singleton(struct drm_i915_private *i915) { struct file *file; - rcu_read_lock(); - file = get_file_rcu(&i915->gem.mmap_singleton); - rcu_read_unlock(); + file = get_file_active(&i915->gem.mmap_singleton); if (file) return file; diff --git a/fs/file.c b/fs/file.c index 1a475d7d636e..5fb0b146e79e 100644 --- a/fs/file.c +++ b/fs/file.c @@ -927,6 +927,31 @@ struct file *get_file_rcu(struct file __rcu **f) } EXPORT_SYMBOL_GPL(get_file_rcu); +/** + * get_file_active - try go get a reference to a file + * @f: the file to get a reference on + * + * In contast to get_file_rcu() the pointer itself isn't part of the + * reference counting. + * + * This function should rarely have to be used and only by users who + * understand the implications of SLAB_TYPESAFE_BY_RCU. Try to avoid it. + * + * Return: Returns @f with the reference count increased or NULL. + */ +struct file *get_file_active(struct file **f) +{ + struct file __rcu *file; + + rcu_read_lock(); + file = __get_file_rcu(f); + rcu_read_unlock(); + if (IS_ERR(file)) + file = NULL; + return file; +} +EXPORT_SYMBOL_GPL(get_file_active); + static inline struct file *__fget_files_rcu(struct files_struct *files, unsigned int fd, fmode_t mask) { diff --git a/include/linux/fs.h b/include/linux/fs.h index cb8bfa1f8ecb..b35815277cc6 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1044,6 +1044,7 @@ static inline struct file *get_file(struct file *f) } struct file *get_file_rcu(struct file __rcu **f); +struct file *get_file_active(struct file **f); #define file_count(x) atomic_long_read(&(x)->f_count) -- 2.34.1