Re: [syzbot] [xfs?] KASAN: null-ptr-deref Write in xfs_filestream_select_ag

Dave Chinner <david@xxxxxxxxxxxxx> · Wed, 24 Jan 2024 22:08:29 +1100

On Wed, Jan 24, 2024 at 11:01:20AM +0100, Jan Kara wrote:
> On Wed 24-01-24 00:50:10, syzbot wrote:
> > syzbot suspects this issue was fixed by commit:
> > 
> > commit 6f861765464f43a71462d52026fbddfc858239a5
> > Author: Jan Kara <jack@xxxxxxx>
> > Date:   Wed Nov 1 17:43:10 2023 +0000
> > 
> >     fs: Block writes to mounted block devices
> > 
> > bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=119af36be80000
> > start commit:   17214b70a159 Merge tag 'fsverity-for-linus' of git://git.k..
> > git tree:       upstream
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=d40f6d44826f6cf7
> > dashboard link: https://syzkaller.appspot.com/bug?extid=87466712bb342796810a
> > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1492946ac80000
> > C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=12e45ad6c80000
> 
> So this surprises me a bit because XFS isn't using block device buffer
> cache and thus syzbot has no way of corrupting cached metadata even before
> these changes. The reproducer tries to mount the loop device again after
> mounting the XFS image so I can imagine something bad happens but it isn't
> all that clear what. So I'll defer to XFS maintainers whether they want to
> mark this bug as fixed or investigate further.

I've consistently ignored this bug because it is doing stuff with
corrupted V4 XFS filesystems.

The key part of the strace is as follows. /dev/loop0 has been set up
to point to a memfd that maps the reproducer's internal memory. Then
we see:

....
[   50.859261][ T5070] XFS (loop0): Deprecated V4 format (crc=0) will not be supported after September 2030.
[   50.869976][ T5070] XFS (loop0): Mounting V4 Filesystem 5e6273b8-2167-42bb-911b-418aa14a1261
[pid  5070] mount("/dev/loop0", "./file0", "xfs", 0, "filestreams,swidth=0x0000000000000000,nodiscard,logbufs=00000000000000000006,attr2,,nouuid") = 0
[pid  5070] openat(AT_FDCWD, "./file0", O_RDONLY|O_DIRECTORY) = 3
[pid  5070] chdir("./file0")            = 0
[pid  5070] ioctl(4, LOOP_CLR_FD)       = 0
[pid  5070] close(4)                    = 0
[pid  5070] open("./bus", O_RDWR|O_CREAT|O_TRUNC|O_NONBLOCK|O_SYNC|O_DIRECT|O_LARGEFILE|O_NOATIME, 000) = 4
[pid  5070] mount("/dev/loop0", "./bus", NULL, MS_BIND, NULL) = 0
[pid  5070] open("./bus", O_RDWR|O_NOCTTY|O_SYNC|O_NOATIME|0x3c) = 5
[pid  5070] openat(AT_FDCWD, "memory.current", O_RDWR|O_CREAT|O_NOCTTY|O_TRUNC|O_APPEND|FASYNC|0x18, 000) = 6

And then there's a corruption report and then the KASAN error.

AFAICT, what the reproducer is doing is setting internal memory as
backing device for /dev/loop0, then mounting it, then creating a
file in that XFS filesystem, then doing a bind mount of /dev/loop0
to that file, then opening that file again (which now points to
/dev/loop0) and overwriting it.

As XFS writes back the data to the file, it's actually overwriting
the loop device backing file. i.e. scribbling over the internal
memory of the syzkaller program. The filesystem then goes to read
metadata from the filesystem, and gets back metadata containing:

[   52.672334][ T4733] 00000000: 66 69 6c 65 73 74 72 65 61 6d 73 2c 73 77 69 64  filestreams,swid
[   52.681652][ T4733] 00000010: 74 68 3d 30 78 30 30 30 30 30 30 30 30 30 30 30  th=0x00000000000
[   52.690810][ T4733] 00000020: 30 30 30 30 30 2c 6e 6f 64 69 73 63 61 72 64 2c  00000,nodiscard,
[   52.700296][ T4733] 00000030: 6c 6f 67 62 75 66 73 3d 30 30 30 30 30 30 30 30  logbufs=00000000
[   52.709453][ T4733] 00000040: 30 30 30 30 30 30 30 30 30 30 30 36 2c 61 74 74  000000000006,att
[   52.718572][ T4733] 00000050: 72 32 2c 00 47 ba 76 39 f2 50 ff 99 2f fb b8 b1  r2,.G.v9.P../...
[   52.728140][ T4733] 00000060: 4c 3a 9b c2 e1 81 d0 9c 24 97 6b 33 7f 55 f4 90  L:......$.k3.U..
[   52.737174][ T4733] 00000070: 15 4c b3 65 d3 52 86 f0 51 c3 11 75 df a1 cc f1  .L.e.R..Q..u....

The mount options used to mount the filesystem in the first place.

I'd suggest that ithe commit the bisect landed on is blocking the
second open of "./bus" after it was bind mounted to /dev/loop0 and
so the write that corrupts the filesystem image never occurs, and so
the XFS filesystem never trips over that error and hence never
triggers the KASAN warning.

So, yeah, I can see why syzbot might think that commit fixed the
problem. But it didn't - it just broke the reproducer so the
corruption that triggered the problem never manifested...

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx