On Thu, Apr 4, 2024 at 2:51 AM syzbot <syzbot+9a5b0ced8b1bfb238b56@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote: > > syzbot has bisected this issue to: > > commit 0fedefd4c4e33dd24f726b13b5d7c143e2b483be > Author: Valentine Sinitsyn <valesini@xxxxxxxxxxxxxx> > Date: Mon Sep 25 08:40:12 2023 +0000 > > kernfs: sysfs: support custom llseek method for sysfs entries > > bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=17cb5e03180000 > start commit: fe46a7dd189e Merge tag 'sound-6.9-rc1' of git://git.kernel.. > git tree: upstream > final oops: https://syzkaller.appspot.com/x/report.txt?x=142b5e03180000 > console output: https://syzkaller.appspot.com/x/log.txt?x=102b5e03180000 > kernel config: https://syzkaller.appspot.com/x/.config?x=4d90a36f0cab495a > dashboard link: https://syzkaller.appspot.com/bug?extid=9a5b0ced8b1bfb238b56 > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=17f1d93d180000 > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=15c38139180000 > > Reported-by: syzbot+9a5b0ced8b1bfb238b56@xxxxxxxxxxxxxxxxxxxxxxxxx > Fixes: 0fedefd4c4e3 ("kernfs: sysfs: support custom llseek method for sysfs entries") > > For information about bisection process see: https://goo.gl/tpsmEJ#bisection > I think this commit is only the trigger for lockdep warning in this specific scenario, but the conceptual issue existed before that for example, with read from sysfs, which also can take of->mutex. I think (not sure) that the potential deadlock is real, not a false positive. OTOH, hibernation code may be crawling with potential and more likely deadlocks... The conceptual issue (I think) is this: Overlayfs is a stacked filesystem which regularly calls vfs helpers such as path lookup on other filesystems. This specialized behavior is accompanied with a declaration of s_stack_depth > 0, annotating ovl inode locks per stack depth (ovl_lockdep_annotate_inode_mutex_key) and restricting the types of filesystems that are allowed for writable upper layer. In the lockdep dependency chain, overlayfs inode lock is taken before kernfs internal of->mutex, where kernfs (sysfs) is the lower layer of overlayfs, which is sane. With /sys/power/resume (and probably other files), sysfs also behaves as a stacking filesystem, calling vfs helpers, such as lookup_bdev() -> kern_path(), which is a behavior of a stacked filesystem, without all the precautions that comes with behaving as a stacked filesystem. If an overlayfs path is written into /sys/power/resume and that overlayfs has sysfs as its lower layer, then there may be a small chance of hitting the ABBA deadlock, but there could very well be some conditions that prevent it. It's a shame that converting blockdev path to major:minor is not always done as a separate step by usersapce, but not much to do about it now... I don't think that Christoph's refactoring to blockdev interfaces have changed anything in this regard, but I wasn't sure so CCed the developers involved. A relatively easy way to avert this use case is to define a lookup flag LOOKUP_NO_STACKED, which does not traverse into any stacked fs (s_stack_depth > 0) and use this flag for looking up the hibernation blockdev. I suppose writing the hibernation blockdev from inside containers is not a common thing to do, so this mitigation could be good enough. Thoughts? Amir.