Re: tmpfs hang after v6.12-rc6

Jeff Layton <jlayton@xxxxxxxxxx> · Thu, 21 Nov 2024 09:06:50 -0500

On Tue, 2024-11-19 at 23:46 +0900, Jeongjun Park wrote:
> Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> > 
> > On Sat, 2024-11-16 at 10:07 -0500, Chuck Lever wrote:
> > > On Fri, Nov 15, 2024 at 05:31:38PM -0800, Hugh Dickins wrote:
> > > > On Fri, 15 Nov 2024, Chuck Lever wrote:
> > > > > 
> > > > > As I said before, I've failed to find any file system getattr method
> > > > > that explicitly takes the inode's semaphore around a
> > > > > generic_fillattr() call. My understanding is that these methods
> > > > > assume that their caller handles appropriate serialization.
> > > > > Therefore, taking the inode semaphore at all in shmem_getattr()
> > > > > seems to me to be the wrong approach entirely.
> > > > > 
> > > > > The point of reverting immediately is that any fix can't possibly
> > > > > get the review and testing it deserves three days before v6.12
> > > > > becomes final. Also, it's not clear what the rush to fix the
> > > > > KCSAN splat is; according to the Fixes: tag, this issue has been
> > > > > present for years without causing overt problems. It doesn't feel
> > > > > like this change belongs in an -rc in the first place.
> > > > > 
> > > > > Please revert d949d1d14fa2, then let's discuss a proper fix in a
> > > > > separate thread. Thanks!
> > > > 
> > > > Thanks so much for reporting this issue, Chuck: just in time.
> > > > 
> > > > I agree abso-lutely with you: I was just preparing a revert,
> > > > when I saw that akpm is already on it: great, thanks Andrew.
> > > 
> > > Thanks to you both for jumping on this!
> > > 
> > > 
> > > > I was not very keen to see that locking added, just to silence a syzbot
> > > > sanitizer splat: added where there has never been any practical problem
> > > > (and the result of any stat immediately stale anyway).  I was hoping we
> > > > might get a performance regression reported, but a hang serves better!
> > > > 
> > > > If there's a "data_race"-like annotation that can be added to silence
> > > > the sanitizer, okay.  But more locking?  I don't think so.
> > > 
> > > IMHO the KCSAN splat suggests there is a missing inode_lock/unlock
> > > pair /somewhere/. Just not in shmem_getattr().
> > > 
> > > Earlier in this thread, Jeongjun reported:
> > > > ... I found that this data-race mainly occurs when vfs_statx_path
> > > > does not acquire the inode_lock ...
> > > 
> > > That sounds to me like a better place to address this; or at least
> > > that is a good place to start looking for the problem.
> > > 
> > 
> > I don't think this data race is anything to worry about:
> > 
> >     ==================================================================
> >     BUG: KCSAN: data-race in generic_fillattr / inode_set_ctime_current
> > 
> >     write to 0xffff888102eb3260 of 4 bytes by task 6565 on cpu 1:
> >      inode_set_ctime_to_ts include/linux/fs.h:1638 [inline]
> >      inode_set_ctime_current+0x169/0x1d0 fs/inode.c:2626
> >      shmem_mknod+0x117/0x180 mm/shmem.c:3443
> >      shmem_create+0x34/0x40 mm/shmem.c:3497
> >      lookup_open fs/namei.c:3578 [inline]
> >      open_last_lookups fs/namei.c:3647 [inline]
> >      path_openat+0xdbc/0x1f00 fs/namei.c:3883
> > 
> >     write to 0xffff888102eb3260 of 4 bytes by task 6565 on cpu 1:
> >      inode_set_ctime_to_ts include/linux/fs.h:1638 [inline]
> >      inode_set_ctime_current+0x169/0x1d0 fs/inode.c:2626
> >      shmem_mknod+0x117/0x180 mm/shmem.c:3443
> >      shmem_create+0x34/0x40 mm/shmem.c:3497
> >      lookup_open fs/namei.c:3578 [inline]
> >      open_last_lookups fs/namei.c:3647 [inline]
> >      path_openat+0xdbc/0x1f00 fs/namei.c:3883
> >      do_filp_open+0xf7/0x200 fs/namei.c:3913
> >      do_sys_openat2+0xab/0x120 fs/open.c:1416
> >      do_sys_open fs/open.c:1431 [inline]
> >      __do_sys_openat fs/open.c:1447 [inline]
> >      __se_sys_openat fs/open.c:1442 [inline]
> >      __x64_sys_openat+0xf3/0x120 fs/open.c:1442
> >      x64_sys_call+0x1025/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:258
> >      do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> >      do_syscall_64+0x54/0x120 arch/x86/entry/common.c:83
> >      entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > 
> >     read to 0xffff888102eb3260 of 4 bytes by task 3498 on cpu 0:
> >      inode_get_ctime_nsec include/linux/fs.h:1623 [inline]
> >      inode_get_ctime include/linux/fs.h:1629 [inline]
> >      generic_fillattr+0x1dd/0x2f0 fs/stat.c:62
> >      shmem_getattr+0x17b/0x200 mm/shmem.c:1157
> >      vfs_getattr_nosec fs/stat.c:166 [inline]
> >      vfs_getattr+0x19b/0x1e0 fs/stat.c:207
> >      vfs_statx_path fs/stat.c:251 [inline]
> >      vfs_statx+0x134/0x2f0 fs/stat.c:315
> >      vfs_fstatat+0xec/0x110 fs/stat.c:341
> >      __do_sys_newfstatat fs/stat.c:505 [inline]
> >      __se_sys_newfstatat+0x58/0x260 fs/stat.c:499
> >      __x64_sys_newfstatat+0x55/0x70 fs/stat.c:499
> >      x64_sys_call+0x141f/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:263
> >      do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> >      do_syscall_64+0x54/0x120 arch/x86/entry/common.c:83
> >      entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > 
> >     value changed: 0x2755ae53 -> 0x27ee44d3
> > 
> >     Reported by Kernel Concurrency Sanitizer on:
> >     CPU: 0 UID: 0 PID: 3498 Comm: udevd Not tainted 6.11.0-rc6-syzkaller-00326-gd1f2d51b711a-dirty #0
> >     Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024
> >     ==================================================================
> > 
> > inode_set_ctime_to_ts() is just setting the ctime fields in the inode
> > to the timespec64. inode_get_ctime_nsec() is just fetching the ctime
> > nsec field.
> > 
> > We've never tried to synchronize readers and writers when it comes to
> > timestamps. It has always been possible to read an inconsistent
> > timestamp from the inode. The sec and nsec fields are in different
> > words, and there is no synchronization when updating the fields.
> > 
> > Timestamp fetches seem like the right place to use a seqcount loop, but
> > I don't think we would want to add any sort of locking to the update
> > side of things. Maybe that's doable?
> 
> I agree with this opinion to some extent, but I also observe the following
> data-race:
> 
> ==================================================================
> BUG: KCSAN: data-race in shmem_getattr / shmem_recalc_inode
> 
> read-write to 0xffff88811a219d20 of 8 bytes by task 12342 on cpu 1:
>  shmem_recalc_inode+0x36/0x1b0 mm/shmem.c:437
>  shmem_alloc_and_add_folio mm/shmem.c:1866 [inline]
>  shmem_get_folio_gfp+0x7ce/0xd90 mm/shmem.c:2330
>  shmem_get_folio mm/shmem.c:2436 [inline]
>  shmem_write_begin+0xa2/0x180 mm/shmem.c:3038
>  generic_perform_write+0x1a8/0x4a0 mm/filemap.c:4054
>  shmem_file_write_iter+0xc2/0xe0 mm/shmem.c:3213
>  __kernel_write_iter+0x24b/0x4e0 fs/read_write.c:616
>  dump_emit_page fs/coredump.c:884 [inline]
>  dump_user_range+0x3a7/0x550 fs/coredump.c:945
>  elf_core_dump+0x1b66/0x1c60 fs/binfmt_elf.c:2121
>  do_coredump+0x1736/0x1ce0 fs/coredump.c:758
>  get_signal+0xdc0/0x1070 kernel/signal.c:2903
>  arch_do_signal_or_restart+0x95/0x4b0 arch/x86/kernel/signal.c:337
>  exit_to_user_mode_loop kernel/entry/common.c:111 [inline]
>  exit_to_user_mode_prepare include/linux/entry-common.h:328 [inline]
>  irqentry_exit_to_user_mode+0x9a/0x130 kernel/entry/common.c:231
>  irqentry_exit+0x12/0x50 kernel/entry/common.c:334
>  asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:623
> 
> read to 0xffff88811a219d20 of 8 bytes by task 10055 on cpu 0:
>  shmem_getattr+0x42/0x200 mm/shmem.c:1157
>  vfs_getattr_nosec fs/stat.c:166 [inline]
>  vfs_getattr+0x19b/0x1e0 fs/stat.c:207
>  vfs_statx_path fs/stat.c:251 [inline]
>  vfs_statx+0x134/0x2f0 fs/stat.c:315
>  vfs_fstatat+0xec/0x110 fs/stat.c:341
>  __do_sys_newfstatat fs/stat.c:505 [inline]
>  __se_sys_newfstatat+0x58/0x260 fs/stat.c:499
>  __x64_sys_newfstatat+0x55/0x70 fs/stat.c:499
>  x64_sys_call+0x141f/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:263
>  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
>  do_syscall_64+0xc9/0x1c0 arch/x86/entry/common.c:83
>  entry_SYSCALL_64_after_hwframe+0x77/0x7f
> 
> value changed: 0x0000000000001932 -> 0x0000000000001934
> 
> Reported by Kernel Concurrency Sanitizer on:
> CPU: 0 UID: 0 PID: 10055 Comm: syz-executor Not tainted 6.12.0-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine,
> BIOS Google 10/30/2024
> ==================================================================
> 
> This can lead to unexpected changes in execution path, so I'm considering
> adding a lock, but I'm not sure how dangerous this actually is, so I'm also
> considering commenting out the data-race.
> 
> 

Do you know what fields those are?

My guess would be either info->alloced or info->swapped. Those are
written under the info->lock in shmem_recalc_inode, but that lock isn't
held in shmem_getattr. Those races look pretty harmless to me, but
maybe I'm missing something too.
-- 
Jeff Layton <jlayton@xxxxxxxxxx>