Re: [syzbot] [ext4?] KASAN: slab-out-of-bounds Read in ext4_group_desc_csum

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 2023/3/13 21:01, Jan Kara wrote:
On Mon 13-03-23 20:27:34, yebin wrote:
On 2023/3/13 19:57, Jan Kara wrote:
On Mon 13-03-23 11:11:18, Tudor Ambarus wrote:
On 3/7/23 11:02, Tudor Ambarus wrote:
On 3/7/23 10:39, Jan Kara wrote:
On Wed 01-03-23 12:13:51, Tudor Ambarus wrote:
On 2/13/23 15:56, syzbot wrote:
syzbot has found a reproducer for the following issue on:

HEAD commit:    ceaa837f96ad Linux 6.2-rc8
git tree:       upstream
console output:
https://syzkaller.appspot.com/x/log.txt?x=11727cc7480000
kernel config:
https://syzkaller.appspot.com/x/.config?x=42ba4da8e1e6af9f
dashboard link:
https://syzkaller.appspot.com/bug?extid=8785e41224a3afd04321
compiler:       Debian clang version 15.0.7, GNU ld (GNU Binutils
for Debian) 2.35.2
syz repro:
https://syzkaller.appspot.com/x/repro.syz?x=14392a4f480000

Downloadable assets:
disk image:
https://storage.googleapis.com/syzbot-assets/88042f9b5fc8/disk-ceaa837f.raw.xz
vmlinux:
https://storage.googleapis.com/syzbot-assets/9945b57ec9ee/vmlinux-ceaa837f.xz
kernel image:
https://storage.googleapis.com/syzbot-assets/72ff118ed96b/bzImage-ceaa837f.xz
mounted in repro:
https://storage.googleapis.com/syzbot-assets/dabec17b2679/mount_0.gz

IMPORTANT: if you fix the issue, please add the following tag to the
commit:
Reported-by: syzbot+8785e41224a3afd04321@xxxxxxxxxxxxxxxxxxxxxxxxx

==================================================================
BUG: KASAN: use-after-free in crc16+0x1fb/0x280 lib/crc16.c:58
Read of size 1 at addr ffff88807de00000 by task syz-executor.1/5339

CPU: 1 PID: 5339 Comm: syz-executor.1 Not tainted
6.2.0-rc8-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 01/21/2023
Call Trace:
     <TASK>
     __dump_stack lib/dump_stack.c:88 [inline]
     dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
     print_address_description mm/kasan/report.c:306 [inline]
     print_report+0x163/0x4f0 mm/kasan/report.c:417
     kasan_report+0x13a/0x170 mm/kasan/report.c:517
     crc16+0x1fb/0x280 lib/crc16.c:58
     ext4_group_desc_csum+0x90f/0xc50 fs/ext4/super.c:3187
     ext4_group_desc_csum_set+0x19b/0x240 fs/ext4/super.c:3210
     ext4_mb_clear_bb fs/ext4/mballoc.c:6027 [inline]
     ext4_free_blocks+0x1c57/0x3010 fs/ext4/mballoc.c:6173
     ext4_remove_blocks fs/ext4/extents.c:2527 [inline]
     ext4_ext_rm_leaf fs/ext4/extents.c:2710 [inline]
     ext4_ext_remove_space+0x289e/0x5270 fs/ext4/extents.c:2958
     ext4_ext_truncate+0x176/0x210 fs/ext4/extents.c:4416
     ext4_truncate+0xafa/0x1450 fs/ext4/inode.c:4342
     ext4_evict_inode+0xc40/0x1230 fs/ext4/inode.c:286
     evict+0x2a4/0x620 fs/inode.c:664
     do_unlinkat+0x4f1/0x930 fs/namei.c:4327
     __do_sys_unlink fs/namei.c:4368 [inline]
     __se_sys_unlink fs/namei.c:4366 [inline]
     __x64_sys_unlink+0x49/0x50 fs/namei.c:4366
     do_syscall_x64 arch/x86/entry/common.c:50 [inline]
     do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
     entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7fbc85a8c0f9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48
89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48>
3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fbc86838168 EFLAGS: 00000246 ORIG_RAX: 0000000000000057
RAX: ffffffffffffffda RBX: 00007fbc85babf80 RCX: 00007fbc85a8c0f9
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000000
RBP: 00007fbc85ae7ae9 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffd5743beaf R14: 00007fbc86838300 R15: 0000000000022000
     </TASK>

The buggy address belongs to the physical page:
page:ffffea0001f78000 refcount:0 mapcount:-128
mapping:0000000000000000 index:0x0 pfn:0x7de00
flags: 0xfff00000000000(node=0|zone=1|lastcpupid=0x7ff)
raw: 00fff00000000000 ffffea0001f86008 ffffea0001db2a08
0000000000000000
raw: 0000000000000000 0000000000000001 00000000ffffff7f
0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as freed
page last allocated via order 1, migratetype Unmovable, gfp_mask
0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 4855, tgid 4855 (sshd), ts 43553490210, free_ts 58249059760
     prep_new_page mm/page_alloc.c:2531 [inline]
     get_page_from_freelist+0x3449/0x35c0 mm/page_alloc.c:4283
     __alloc_pages+0x291/0x7e0 mm/page_alloc.c:5549
     alloc_slab_page+0x6a/0x160 mm/slub.c:1851
     allocate_slab mm/slub.c:1998 [inline]
     new_slab+0x84/0x2f0 mm/slub.c:2051
     ___slab_alloc+0xa85/0x10a0 mm/slub.c:3193
     __kmem_cache_alloc_bulk mm/slub.c:3951 [inline]
     kmem_cache_alloc_bulk+0x160/0x430 mm/slub.c:4026
     mt_alloc_bulk lib/maple_tree.c:157 [inline]
     mas_alloc_nodes+0x381/0x640 lib/maple_tree.c:1257
     mas_node_count_gfp lib/maple_tree.c:1316 [inline]
     mas_preallocate+0x131/0x350 lib/maple_tree.c:5724
     vma_expand+0x277/0x850 mm/mmap.c:541
     mmap_region+0xc43/0x1fb0 mm/mmap.c:2592
     do_mmap+0x8c9/0xf70 mm/mmap.c:1411
     vm_mmap_pgoff+0x1ce/0x2e0 mm/util.c:520
     do_syscall_x64 arch/x86/entry/common.c:50 [inline]
     do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
     entry_SYSCALL_64_after_hwframe+0x63/0xcd
page last free stack trace:
     reset_page_owner include/linux/page_owner.h:24 [inline]
     free_pages_prepare mm/page_alloc.c:1446 [inline]
     free_pcp_prepare mm/page_alloc.c:1496 [inline]
     free_unref_page_prepare+0xf3a/0x1040 mm/page_alloc.c:3369
     free_unref_page+0x37/0x3f0 mm/page_alloc.c:3464
     qlist_free_all+0x22/0x60 mm/kasan/quarantine.c:187
     kasan_quarantine_reduce+0x15a/0x170 mm/kasan/quarantine.c:294
     __kasan_slab_alloc+0x23/0x80 mm/kasan/common.c:302
     kasan_slab_alloc include/linux/kasan.h:201 [inline]
     slab_post_alloc_hook+0x68/0x390 mm/slab.h:761
     slab_alloc_node mm/slub.c:3452 [inline]
     kmem_cache_alloc_node+0x158/0x2c0 mm/slub.c:3497
     __alloc_skb+0xd6/0x2d0 net/core/skbuff.c:552
     alloc_skb include/linux/skbuff.h:1270 [inline]
     alloc_skb_with_frags+0xa8/0x750 net/core/skbuff.c:6194
     sock_alloc_send_pskb+0x919/0xa50 net/core/sock.c:2743
     unix_dgram_sendmsg+0x5b5/0x2050 net/unix/af_unix.c:1943
     sock_sendmsg_nosec net/socket.c:714 [inline]
     sock_sendmsg net/socket.c:734 [inline]
     __sys_sendto+0x475/0x5f0 net/socket.c:2117
     __do_sys_sendto net/socket.c:2129 [inline]
     __se_sys_sendto net/socket.c:2125 [inline]
     __x64_sys_sendto+0xde/0xf0 net/socket.c:2125
     do_syscall_x64 arch/x86/entry/common.c:50 [inline]
     do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
     entry_SYSCALL_64_after_hwframe+0x63/0xcd

Memory state around the buggy address:
     ffff88807ddfff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
     ffff88807ddfff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff88807de00000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                       ^
     ffff88807de00080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
     ffff88807de00100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
==================================================================

I think the patch from below should fix it.

I printed le16_to_cpu(sbi->s_es->s_desc_size) and it was greater than
EXT4_MAX_DESC_SIZE. What I think it happens is that the contents of the
super block in the buffer get corrupted sometime after the .get_tree
(which eventually calls __ext4_fill_super()) is called. So instead of
relying on the contents of the buffer, we should instead rely on the
s_desc_size initialized at the __ext4_fill_super() time.

If someone finds this good (or bad), or has a more in depth explanation,
please let me know, it will help me better understand the subsystem. In
the meantime I'll continue to investigate this and prepare a patch for
it.
If there's something corrupting the superblock while the filesystem is
mounted, we need to find what is corrupting the SB and fix *that*. Not
try
to paper over the problem by not using the on-disk data... Maybe journal
replay is corrupting the value or something like that?

                                  Honza

Ok, I agree. First thing would be to understand the reproducer and to
simplify it if possible. I haven't yet decoded what the syz repro is
doing at
https://syzkaller.appspot.com/text?tag=ReproSyz&x=16ce3de4c80000
Will reply to this email thread once I understand what's happening. If
you or someone else can decode the syz repro faster than me, shoot.

I can now explain how the contents of the super block of the buffer get
corrupted. After the ext4 fs is mounted to the target ("./bus"), the
reproducer maps 6MB of data starting at offset 0 in the target's file
("./bus"), then it starts overriding the data with something else, by
using memcpy, memset, individual byte inits. Does that mean that we
shouldn't rely on the contents of the super block in the buffer after we
mount the file system? If so, then my patch stands. I'll be happy to
extend it if needed. Below one may find a step by step interpretation of
the reproducer.

We have a strace log for the same bug, but on Android 5.15:
https://syzkaller.appspot.com/text?tag=CrashLog&x=14ecec8cc80000

Look for pid 328. You notice that the bpf() syscalls return error, so I
commented them out in the c repro to confirm that they are not the
cause. The bug reproduced without the bpf() calls. One can find the c
repro at:
https://syzkaller.appspot.com/text?tag=ReproC&x=17c5fc50c80000

Let's look at these calls, just before the bug was hit:
[pid   328] open("./bus",
O_RDWR|O_CREAT|O_TRUNC|O_NONBLOCK|O_SYNC|O_DIRECT|O_LARGEFILE|O_NOATIME,
000) = 4
[pid   328] mount("/dev/loop0", "./bus", NULL, MS_BIND, NULL) = 0
[pid   328] open("./bus", O_RDWR|O_SYNC|O_NOATIME|0x3c) = 5
[pid   328] mmap(0x20000000, 6291456,
PROT_READ|PROT_WRITE|PROT_EXEC|PROT_SEM|0x47ffff0, MAP_SHARED|MAP_FIXED,
5, 0) = 0x20000000
Yeah, looking at the reproducer, before this the reproducer also mounts
/dev/loop0 as ext4 filesystem.

- ./bus is created (if it does not exist), fd 4 is returned.
- /dev/loop0 is mounted to ./bus
- then it creates a new file descriptor (5) for the same ./bus
- then it creates a mapping for ./bus starting at offset zero. The
mapped area is at 0x20000000 and is of 0x600000ul length.
So the result is that the reproducer modified the block device while it is
mounted by the filesystem. We know cases like this can crash the kernel and
it is inherently difficult to fix. We have to trust the buffer cache
contents as otherwise the performance will be unacceptable. For historical
reasons we also have to allow modifications of buffer cache while ext4 is
mounted because tune2fs uses this to e.g. update the label of a mounted
filesystem.

Long-term we are moving ext4 in a direction where we can disallow block
device modifications while the fs is mounted but we are not there yet. I've
discussed some shorter-term solution to avoid such known problems with syzbot
developers and what seems plausible would be a kconfig option to disallow
writing to a block device when it is exclusively open by someone else.
But so far I didn't get to trying whether this would reasonably work. Would
you be interested in having a look into this?
I am interested in this job. The file system is often damaged by writing
block devices, which is a headache. I have always wanted to eradicate
this kind of problem.  A few months ago, I tried to add a mount parameter
to prohibit modification after the block device is mounted.But I
encountered several problems that led to the termination of my attempt.
First of all, the 32-bit super block flags have been used up and need to
be extended. Secondly, I don't know how to handle read-only flag in the
case of multiple mount points.
  "disallow writing to a block device when it is exclusively open by someone
else. "
-> Perhaps we can add a new IOCTL command to control whether write
operations are allowed after the block device has been exclusively
opened. I don't know if this is feasible?  Do you have any good
suggestions?
Well, ioctl() for syzbot would be possible as well but for start I'd try
whether the idea with kconfig option will work. Then it will be enough to
just make sure all kernels used for fuzzing are built with this option set.
Thanks for having a look into this!
In fact, I also want to solve the problem of file system damage caused by writing raw disks in the production environment. Use kconfig directly to control whether it loses flexibility in
the production environment.

								Honza




[Index of Archives]     [Reiser Filesystem Development]     [Ceph FS]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite National Park]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Device Mapper]     [Linux Media]

  Powered by Linux