On Fri, 10 May 2024 07:26:13 +0800 Hillf Danton <hdanton@xxxxxxxx> wrote:
> On Thu, 9 May 2024 17:52:21 +0300 Amir Goldstein <amir73il@xxxxxxxxx>
> > On Thu, May 9, 2024 at 1:49 PM Hillf Danton <hdanton@xxxxxxxx> wrote:
> > >
> > > The correct locking order is
> > >
> > > sb_writers
> >
> > This is sb of overlayfs
> >
> > > inode lock
> >
> > This is real inode
> >
> WRT sb_writers the order
>
>     lock inode parent
>     lock inode kid
>
> becomes
>     lock inode kid
>     sb_writers
>     lock inode parent
>
> given call trace
>
> > -> #2 (sb_writers#4){.+.+}-{0:0}:
> >        lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
> >        percpu_down_read include/linux/percpu-rwsem.h:51 [inline]
> >        __sb_start_write include/linux/fs.h:1664 [inline]
> >        sb_start_write+0x4d/0x1c0 include/linux/fs.h:1800
> >        mnt_want_write+0x3f/0x90 fs/namespace.c:409
> >        ovl_create_object+0x13b/0x370 fs/overlayfs/dir.c:629
> >        lookup_open fs/namei.c:3497 [inline]
> >        open_last_lookups fs/namei.c:3566 [inline]
>
> and code snippet [1]
>
>     if (open_flag & O_CREAT)
>         inode_lock(dir->d_inode);
>     else
>         inode_lock_shared(dir->d_inode);
>     dentry = lookup_open(nd, file, op, got_write);
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/namei.c?id=dccb07f2914c#n3566

JFYI simply cutting mnt_want_write() out of ovl_create_object() survived
the syzbot repro [2], so acquiring sb_writers with the inode locked, at
least in the lookup path, causes trouble.

[2] https://lore.kernel.org/lkml/000000000000975906061817416b@xxxxxxxxxx/
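
To make the ordering argument concrete, here is a toy userspace sketch of how a lockdep-style dependency graph would flag the two acquisition sequences quoted above. It is not kernel code and not how lockdep is implemented: the enum labels and the helpers record_dependency() and record_sequence() are hypothetical names made up for illustration, only consecutive acquisitions are recorded, and the overlayfs vs. real filesystem distinction is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock classes, named after the labels used in the email. */
enum lock_class { INODE_PARENT, INODE_KID, SB_WRITERS, NR_CLASSES };

/* dep[a][b] is true once some path has taken b while holding a. */
static bool dep[NR_CLASSES][NR_CLASSES];

/* Depth-first search: is 'to' reachable from 'from' via recorded edges? */
static bool reachable(enum lock_class from, enum lock_class to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_CLASSES; next++)
		if (dep[from][next] && !seen[next] &&
		    reachable((enum lock_class)next, to, seen))
			return true;
	return false;
}

/*
 * Record "acquired is taken while held is held".
 * Returns false, without recording, if the new edge would close a
 * cycle in the graph, i.e. if it inverts an already recorded order.
 */
static bool record_dependency(enum lock_class held, enum lock_class acquired)
{
	bool seen[NR_CLASSES] = { false };

	assert(held < NR_CLASSES && acquired < NR_CLASSES);
	if (reachable(acquired, held, seen))
		return false;
	dep[held][acquired] = true;
	return true;
}

/* Feed one acquisition sequence; returns false on the first inversion. */
static bool record_sequence(const enum lock_class *seq, int n)
{
	for (int i = 0; i + 1 < n; i++)
		if (!record_dependency(seq[i], seq[i + 1]))
			return false;
	return true;
}
```

Feeding it the usual order { INODE_PARENT, INODE_KID } and then { INODE_KID, SB_WRITERS, INODE_PARENT } makes the second call return false: the sb_writers -> parent edge closes the cycle parent -> kid -> sb_writers -> parent.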