Re: [syzbot] [kernfs?] possible deadlock in kernfs_fop_llseek

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Apr 4, 2024 at 2:51 AM syzbot
<syzbot+9a5b0ced8b1bfb238b56@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> syzbot has bisected this issue to:
>
> commit 0fedefd4c4e33dd24f726b13b5d7c143e2b483be
> Author: Valentine Sinitsyn <valesini@xxxxxxxxxxxxxx>
> Date:   Mon Sep 25 08:40:12 2023 +0000
>
>     kernfs: sysfs: support custom llseek method for sysfs entries
>
> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=17cb5e03180000
> start commit:   fe46a7dd189e Merge tag 'sound-6.9-rc1' of git://git.kernel..
> git tree:       upstream
> final oops:     https://syzkaller.appspot.com/x/report.txt?x=142b5e03180000
> console output: https://syzkaller.appspot.com/x/log.txt?x=102b5e03180000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=4d90a36f0cab495a
> dashboard link: https://syzkaller.appspot.com/bug?extid=9a5b0ced8b1bfb238b56
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=17f1d93d180000
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=15c38139180000
>
> Reported-by: syzbot+9a5b0ced8b1bfb238b56@xxxxxxxxxxxxxxxxxxxxxxxxx
> Fixes: 0fedefd4c4e3 ("kernfs: sysfs: support custom llseek method for sysfs entries")
>
> For information about bisection process see: https://goo.gl/tpsmEJ#bisection
>

I think this commit is only the trigger for lockdep warning in this
specific scenario, but the conceptual issue existed before that
for example, with read from sysfs, which also can take of->mutex.

I think (not sure) that the potential deadlock is real, not a false
positive. OTOH, hibernation code may be crawling with potential
and more likely deadlocks...

The conceptual issue (I think) is this:

Overlayfs is a stacked filesystem which regularly calls vfs helpers
such as path lookup on other filesystems.
This specialized behavior is accompanied with a declaration of
s_stack_depth > 0, annotating ovl inode locks per stack depth
(ovl_lockdep_annotate_inode_mutex_key) and restricting the
types of filesystems that are allowed for writable upper layer.

In the lockdep dependency chain, overlayfs inode lock is taken
before kernfs internal of->mutex, where kernfs (sysfs) is the lower
layer of overlayfs, which is sane.

With /sys/power/resume (and probably other files), sysfs also
behaves as a stacking filesystem, calling vfs helpers, such as
lookup_bdev() -> kern_path(), which is a behavior of a stacked
filesystem, without all the precautions that comes with behaving
as a stacked filesystem.

If an overlayfs path is written into /sys/power/resume and that
overlayfs has sysfs as its lower layer, then there may be a small
chance of hitting the ABBA deadlock, but there could very well
be some conditions that prevent it.

It's a shame that converting blockdev path to major:minor is
not always done as a separate step by usersapce, but not much
to do about it now...

I don't think that Christoph's refactoring to blockdev interfaces
have changed anything in this regard, but I wasn't sure so CCed
the developers involved.

A relatively easy way to avert this use case is to define a lookup
flag LOOKUP_NO_STACKED, which does not traverse into any
stacked fs (s_stack_depth > 0) and use this flag for looking up
the hibernation blockdev.

I suppose writing the hibernation blockdev from inside containers
is not a common thing to do, so this mitigation could be good enough.

Thoughts?
Amir.





[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux