On Thu, 9 May 2024 09:37:24 +0300 Amir Goldstein <amir73il@xxxxxxxxx>
> On Thu, May 9, 2024 at 2:19 AM Hillf Danton <hdanton@xxxxxxxx> wrote:
> > On Tue, 07 May 2024 22:36:18 -0700
> > > syzbot has found a reproducer for the following issue on:
> > >
> > > HEAD commit:    dccb07f2914c Merge tag 'for-6.9-rc7-tag' of git://git.kern..
> > > git tree:       upstream
> > > console+strace: https://syzkaller.appspot.com/x/log.txt?x=137daa6c980000
> > > kernel config:  https://syzkaller.appspot.com/x/.config?x=9d7ea7de0cb32587
> > > dashboard link: https://syzkaller.appspot.com/bug?extid=4c493dcd5a68168a94b2
> > > compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> > > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1134f3c0980000
> > > C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1367a504980000
> > >
> > > Downloadable assets:
> > > disk image: https://storage.googleapis.com/syzbot-assets/ea1961ce01fe/disk-dccb07f2.raw.xz
> > > vmlinux: https://storage.googleapis.com/syzbot-assets/445a00347402/vmlinux-dccb07f2.xz
> > > kernel image: https://storage.googleapis.com/syzbot-assets/461aed7c4df3/bzImage-dccb07f2.xz
> > >
> > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > Reported-by: syzbot+4c493dcd5a68168a94b2@xxxxxxxxxxxxxxxxxxxxxxxxx
> > >
> > > ======================================================
> > > WARNING: possible circular locking dependency detected
> > > 6.9.0-rc7-syzkaller-00012-gdccb07f2914c #0 Not tainted
> > > ------------------------------------------------------
> > > syz-executor149/5078 is trying to acquire lock:
> > > ffff88802a978888 (&of->mutex){+.+.}-{3:3}, at: kernfs_seq_start+0x53/0x3b0 fs/kernfs/file.c:154
> > >
> > > but task is already holding lock:
> > > ffff88802d80b540 (&p->lock){+.+.}-{3:3}, at: seq_read_iter+0xb7/0xd60 fs/seq_file.c:182
> > >
> > > which lock already depends on the new lock.
> > >
> > >
> > > the existing dependency chain (in reverse order) is:
> > >
> > > -> #4 (&p->lock){+.+.}-{3:3}:
> > >        lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
> > >        __mutex_lock_common kernel/locking/mutex.c:608 [inline]
> > >        __mutex_lock+0x136/0xd70 kernel/locking/mutex.c:752
> > >        seq_read_iter+0xb7/0xd60 fs/seq_file.c:182
> > >        call_read_iter include/linux/fs.h:2104 [inline]
> > >        copy_splice_read+0x662/0xb60 fs/splice.c:365
> > >        do_splice_read fs/splice.c:985 [inline]
> > >        splice_file_to_pipe+0x299/0x500 fs/splice.c:1295
> > >        do_sendfile+0x515/0xdc0 fs/read_write.c:1301
> > >        __do_sys_sendfile64 fs/read_write.c:1362 [inline]
> > >        __se_sys_sendfile64+0x17c/0x1e0 fs/read_write.c:1348
> > >        do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> > >        do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
> > >        entry_SYSCALL_64_after_hwframe+0x77/0x7f
> > >
> > > -> #3 (&pipe->mutex){+.+.}-{3:3}:
> > >        lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
> > >        __mutex_lock_common kernel/locking/mutex.c:608 [inline]
> > >        __mutex_lock+0x136/0xd70 kernel/locking/mutex.c:752
> > >        iter_file_splice_write+0x335/0x14e0 fs/splice.c:687
> > >        backing_file_splice_write+0x2bc/0x4c0 fs/backing-file.c:289
> > >        ovl_splice_write+0x3cf/0x500 fs/overlayfs/file.c:379
> > >        do_splice_from fs/splice.c:941 [inline]
> > >        do_splice+0xd77/0x1880 fs/splice.c:1354

	file_start_write(out);
	ret = do_splice_from(ipipe, out, &offset, len, flags);
	file_end_write(out);

The correct locking order is

	sb_writers
	inode lock

> > >        __do_splice fs/splice.c:1436 [inline]
> > >        __do_sys_splice fs/splice.c:1652 [inline]
> > >        __se_sys_splice+0x331/0x4a0 fs/splice.c:1634
> > >        do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> > >        do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
> > >        entry_SYSCALL_64_after_hwframe+0x77/0x7f
> > >
> > > -> #2 (sb_writers#4){.+.+}-{0:0}:
> > >        lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
> > >        percpu_down_read include/linux/percpu-rwsem.h:51 [inline]
> > >        __sb_start_write include/linux/fs.h:1664 [inline]
> > >        sb_start_write+0x4d/0x1c0 include/linux/fs.h:1800
> > >        mnt_want_write+0x3f/0x90 fs/namespace.c:409

but inverse order occurs here.

> > >        ovl_create_object+0x13b/0x370 fs/overlayfs/dir.c:629
> > >        lookup_open fs/namei.c:3497 [inline]
> > >        open_last_lookups fs/namei.c:3566 [inline]
> > >        path_openat+0x1425/0x3240 fs/namei.c:3796
> > >        do_filp_open+0x235/0x490 fs/namei.c:3826
> > >        do_sys_openat2+0x13e/0x1d0 fs/open.c:1406
> > >        do_sys_open fs/open.c:1421 [inline]
> > >        __do_sys_open fs/open.c:1429 [inline]
> > >        __se_sys_open fs/open.c:1425 [inline]
> > >        __x64_sys_open+0x225/0x270 fs/open.c:1425
> > >        do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> > >        do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
> > >        entry_SYSCALL_64_after_hwframe+0x77/0x7f
> > >
> > > -> #1 (&ovl_i_mutex_dir_key[depth]){++++}-{3:3}:
> > >        lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
> > >        down_read+0xb1/0xa40 kernel/locking/rwsem.c:1526
> > >        inode_lock_shared include/linux/fs.h:805 [inline]
> > >        lookup_slow+0x45/0x70 fs/namei.c:1708
> > >        walk_component+0x2e1/0x410 fs/namei.c:2004
> > >        lookup_last fs/namei.c:2461 [inline]
> > >        path_lookupat+0x16f/0x450 fs/namei.c:2485
> > >        filename_lookup+0x256/0x610 fs/namei.c:2514
> > >        kern_path+0x35/0x50 fs/namei.c:2622
> > >        lookup_bdev+0xc5/0x290 block/bdev.c:1136
> > >        resume_store+0x1a0/0x710 kernel/power/hibernate.c:1235
> > >        kernfs_fop_write_iter+0x3a1/0x500 fs/kernfs/file.c:334
> > >        call_write_iter include/linux/fs.h:2110 [inline]
> > >        new_sync_write fs/read_write.c:497 [inline]
> > >        vfs_write+0xa84/0xcb0 fs/read_write.c:590
> > >        ksys_write+0x1a0/0x2c0 fs/read_write.c:643
> > >        do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> > >        do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
> > >        entry_SYSCALL_64_after_hwframe+0x77/0x7f
> > >
> > > -> #0 (&of->mutex){+.+.}-{3:3}:
> > >        check_prev_add kernel/locking/lockdep.c:3134 [inline]
> > >        check_prevs_add kernel/locking/lockdep.c:3253 [inline]
> > >        validate_chain+0x18cb/0x58e0 kernel/locking/lockdep.c:3869
> > >        __lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137
> > >        lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
> > >        __mutex_lock_common kernel/locking/mutex.c:608 [inline]
> > >        __mutex_lock+0x136/0xd70 kernel/locking/mutex.c:752
> > >        kernfs_seq_start+0x53/0x3b0 fs/kernfs/file.c:154
> > >        traverse+0x14f/0x550 fs/seq_file.c:106
> > >        seq_read_iter+0xc5e/0xd60 fs/seq_file.c:195
> > >        call_read_iter include/linux/fs.h:2104 [inline]
> > >        copy_splice_read+0x662/0xb60 fs/splice.c:365
> > >        do_splice_read fs/splice.c:985 [inline]
> > >        splice_file_to_pipe+0x299/0x500 fs/splice.c:1295
> > >        do_sendfile+0x515/0xdc0 fs/read_write.c:1301
> > >        __do_sys_sendfile64 fs/read_write.c:1362 [inline]
> > >        __se_sys_sendfile64+0x17c/0x1e0 fs/read_write.c:1348
> > >        do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> > >        do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
> > >        entry_SYSCALL_64_after_hwframe+0x77/0x7f
> > >
> > > other info that might help us debug this:
> > >
> > > Chain exists of:
> > >   &of->mutex --> &pipe->mutex --> &p->lock
> > >
> > >  Possible unsafe locking scenario:
> > >
> > >        CPU0                    CPU1
> > >        ----                    ----
> > >   lock(&p->lock);
> > >                                lock(&pipe->mutex);
> > >                                lock(&p->lock);
> > >   lock(&of->mutex);
> > >
> > >  *** DEADLOCK ***
> >
> > This shows 16b52bbee482 ("kernfs: annotate different lockdep class for
> > of->mutex of writable files") is a bandaid.
>
> Well, nobody said that it fixes the root cause.
> But the annotation fix is correct, because the former report was
> really false positive one.
>
> The root cause is resume_store() doing vfs path lookup.

resume_store() looks innocent before locking order above is explained.

> If we could deprecate this allegedly unneeded UAPI we should.
>
> That said, all those lockdep warnings indicate a possible deadlock
> if someone tries to hibernate into an overlayfs file.
>
> If root tries to do that then, this is either an attack or stupidity.
> Either Way the news flash from this report is "root may be able
> to deadlock kernel on purpose"
> Not very exciting and not likely to happen in the real world.
>
> The remaining question is what to do about the lockdep reports.
>
> Questions to PM maintainers:
> Any chance to deprecate writing path to /sys/power/resume?
> Userspace should have no problem getting the same done
> with writing dev number.
>
> Thanks,
> Amir.
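
As an illustration of the "writing dev number" route mentioned above, here is
a minimal userspace sketch. It is not from any patch in this thread, and
format_devnum() and path_to_devnum() are hypothetical helper names. It assumes
the "major:minor" string form is what userspace would write to
/sys/power/resume, and it only shows resolving the path with stat() in
userspace rather than handing a path to the kernel.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/*
 * Hypothetical helper: format a device number as "major:minor".
 * Returns 0 on success, -1 if the string does not fit in buf.
 */
int format_devnum(dev_t dev, char *buf, size_t len)
{
	int n = snprintf(buf, len, "%u:%u", major(dev), minor(dev));

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Hypothetical helper: resolve a block device path in userspace and
 * produce its "major:minor" string. Returns -1 if the path cannot be
 * stat()ed or is not a block device.
 */
int path_to_devnum(const char *path, char *buf, size_t len)
{
	struct stat st;

	if (stat(path, &st) < 0 || !S_ISBLK(st.st_mode))
		return -1;

	/* st_rdev holds the device number of the special file itself. */
	return format_devnum(st.st_rdev, buf, len);
}
```

A caller would pass the swap device path to path_to_devnum() and write the
resulting string to /sys/power/resume instead of the path.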