kernel bug found in exfat and suggestions for fixing it

"ffhgfv" <744439878@xxxxxx> · Tue, 4 Mar 2025 07:26:41 -0500

Hello, I found a bug titled "KASAN: vmalloc-out-of-bounds Write in vfree_atomic" with a modified syzkaller on the latest upstream kernel. It is related to the exfat file system.
If you fix this issue, please add the following tag to the commit: Reported-by: Jianzhou Zhao <xnxc22xnxc22@xxxxxx>, xingwei lee <xrivendell7@xxxxxxxxx>, Zhizhuo Tang <strforexctzzchange@xxxxxxxxxxx>

------------[ cut here ]------------
TITLE: BUG: KASAN: vmalloc-out-of-bounds in llist_add_batch
==================================================================
BUG: KASAN: vmalloc-out-of-bounds in llist_add_batch+0x14f/0x170 lib/llist.c:32
Write of size 8 at addr ffffc90006531000 by task syz.0.183/13735

CPU: 1 UID: 0 PID: 13735 Comm: syz.0.183 Not tainted 6.14.0-rc5-dirty #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Call Trace:
 <irq>
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0x116/0x1b0 lib/dump_stack.c:120
 print_address_description mm/kasan/report.c:408 [inline]
 print_report+0xc1/0x630 mm/kasan/report.c:521
 kasan_report+0xbd/0xf0 mm/kasan/report.c:634
 llist_add_batch+0x14f/0x170 lib/llist.c:32
 llist_add include/linux/llist.h:248 [inline]
 vfree_atomic+0x5e/0xe0 mm/vmalloc.c:3326
 vfree+0x7c1/0x940 mm/vmalloc.c:3353
 kvfree+0x32/0x50 mm/util.c:703
 delayed_free+0x49/0xb0 fs/exfat/super.c:809
 rcu_do_batch kernel/rcu/tree.c:2546 [inline]
 rcu_core+0x79f/0x14f0 kernel/rcu/tree.c:2802
 handle_softirqs+0x1d1/0x870 kernel/softirq.c:561
 __do_softirq kernel/softirq.c:595 [inline]
 invoke_softirq kernel/softirq.c:435 [inline]
 __irq_exit_rcu+0x109/0x170 kernel/softirq.c:662
 irq_exit_rcu+0x9/0x30 kernel/softirq.c:678
 instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1049 [inline]
 sysvec_apic_timer_interrupt+0xa8/0xc0 arch/x86/kernel/apic/apic.c:1049
 </irq>
 <task>
 asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702
RIP: 0010:lock_acquire.part.0+0x155/0x370 kernel/locking/lockdep.c:5816
Code: b8 ff ff ff ff 65 0f c1 05 30 13 6d 7e 83 f8 01 0f 85 ca 01 00 00 9c 58 f6 c4 02 0f 85 df 01 00 00 48 85 ed 0f 85 b0 01 00 00 <48> b8 00 00 00 00 00 fc ff df 48 01 c3 48 c7 03 00 00 00 00 48 c7
RSP: 0018:ffffc9000631f478 EFLAGS: 00000206
RAX: 0000000000000046 RBX: 1ffff92000c63e90 RCX: 1ffff92000c63e77
RDX: 1ffff1100426a15d RSI: 0000000000000002 RDI: 0000000000000000
RBP: 0000000000000200 R08: 0000000000000000 R09: fffffbfff2d943a0
R10: ffffffff96ca1d07 R11: 0000000000000000 R12: 0000000000000002
R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff8dfbc0e0
 rcu_lock_acquire include/linux/rcupdate.h:337 [inline]
 rcu_read_lock_sched include/linux/rcupdate.h:941 [inline]
 pfn_valid include/linux/mmzone.h:2067 [inline]
 pfn_valid include/linux/mmzone.h:2050 [inline]
 page_table_check_clear+0x112/0x9b0 mm/page_table_check.c:70
 __page_table_check_pte_clear+0xfc/0x110 mm/page_table_check.c:169
 page_table_check_pte_clear include/linux/page_table_check.h:49 [inline]
 ptep_get_and_clear_full arch/x86/include/asm/pgtable.h:1337 [inline]
 get_and_clear_full_ptes include/linux/pgtable.h:712 [inline]
 zap_present_folio_ptes mm/memory.c:1511 [inline]
 zap_present_ptes mm/memory.c:1596 [inline]
 do_zap_pte_range mm/memory.c:1698 [inline]
 zap_pte_range mm/memory.c:1742 [inline]
 zap_pmd_range mm/memory.c:1834 [inline]
 zap_pud_range mm/memory.c:1863 [inline]
 zap_p4d_range mm/memory.c:1884 [inline]
 unmap_page_range+0x2db5/0x4270 mm/memory.c:1905
 unmap_single_vma+0x19a/0x2b0 mm/memory.c:1951
 unmap_vmas+0x1f2/0x440 mm/memory.c:1995
 exit_mmap+0x1b4/0xbc0 mm/mmap.c:1284
 __mmput+0x128/0x400 kernel/fork.c:1356
 mmput+0x60/0x70 kernel/fork.c:1378
 exit_mm kernel/exit.c:570 [inline]
 do_exit+0x9ae/0x2d00 kernel/exit.c:925
 do_group_exit+0xd3/0x2a0 kernel/exit.c:1087
 get_signal+0x2278/0x2540 kernel/signal.c:3036
 arch_do_signal_or_restart+0x81/0x7d0 arch/x86/kernel/signal.c:337
 exit_to_user_mode_loop kernel/entry/common.c:111 [inline]
 exit_to_user_mode_prepare include/linux/entry-common.h:329 [inline]
 __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
 syscall_exit_to_user_mode+0x150/0x2a0 kernel/entry/common.c:218
 do_syscall_64+0xd8/0x250 arch/x86/entry/common.c:89
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f366fbab49e
Code: Unable to access opcode bytes at 0x7f366fbab474.
RSP: 002b:00007f3670accda8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
RAX: fffffffffffffff4 RBX: 00000000000014d8 RCX: 00007f366fbab49e
RDX: 0000000020001500 RSI: 0000000020001540 RDI: 00007f3670acce00
RBP: 00007f3670acce40 R08: 00007f3670acce40 R09: 0000000000000000
R10: 0000000000010400 R11: 0000000000000246 R12: 0000000020001500
R13: 0000000020001540 R14: 00007f3670acce00 R15: 0000000020000040
 </task>

The buggy address ffffc90006531000 belongs to a vmalloc virtual mapping
Memory state around the buggy address:
 ffffc90006530f00: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
 ffffc90006530f80: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
>ffffc90006531000: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
                   ^
 ffffc90006531080: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
 ffffc90006531100: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
==================================================================
----------------
Code disassembly (best guess):
   0:	b8 ff ff ff ff       	mov    $0xffffffff,%eax
   5:	65 0f c1 05 30 13 6d 	xadd   %eax,%gs:0x7e6d1330(%rip)        # 0x7e6d133d
   c:	7e
   d:	83 f8 01             	cmp    $0x1,%eax
  10:	0f 85 ca 01 00 00    	jne    0x1e0
  16:	9c                   	pushf
  17:	58                   	pop    %rax
  18:	f6 c4 02             	test   $0x2,%ah
  1b:	0f 85 df 01 00 00    	jne    0x200
  21:	48 85 ed             	test   %rbp,%rbp
  24:	0f 85 b0 01 00 00    	jne    0x1da
* 2a:	48 b8 00 00 00 00 00 	movabs $0xdffffc0000000000,%rax <-- trapping instruction
  31:	fc ff df
  34:	48 01 c3             	add    %rax,%rbx
  37:	48 c7 03 00 00 00 00 	movq   $0x0,(%rbx)
  3e:	48                   	rex.W
  3f:	c7                   	.byte 0xc7

==================================================================
I used the same kernel as the upstream syzbot instance: commit 7eb172143d5508b4da468ed59ee857c6e5e01da6
kernel config: https://syzkaller.appspot.com/text?tag=KernelConfig&x=da4b04ae798b7ef6
compiler: gcc version 11.4.0
===============================================================================
Unfortunately, the modified syzkaller did not produce a working reproducer.
Below is my analysis of the bug together with suggested fixes, in the hope that it helps with the repair:
Root cause analysis
Trigger path:
The exfat file system frees memory by calling kvfree() from the RCU callback delayed_free().
The callback runs in softirq context, so kvfree() calls vfree(), which defers the free through vfree_atomic(); that function adds the memory block to a lock-less list with llist_add().
In llist_add_batch(), the out-of-bounds write was triggered when the next pointer was stored at address ffffc90006531000.
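To make the last step of the trigger path concrete: the deferred free reuses the first bytes of the buffer being freed as the list node, so the 8-byte write in the report is the store of the next pointer into that buffer. Below is a minimal userspace sketch of the pattern using C11 atomics. It is not the kernel implementation; struct node, struct head, push(), defer_free() and drain() are names made up for this illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-ins for the kernel's llist_node / llist_head (assumption). */
struct node { struct node *next; };
struct head { _Atomic(struct node *) first; };

/*
 * Lock-less push. Returns 1 if the list was empty before the push,
 * which is the same return convention as llist_add().
 */
static int push(struct node *n, struct head *h)
{
	struct node *first = atomic_load(&h->first);

	assert(n != NULL);
	do {
		/* This store lands inside the node, i.e. in the caller's buffer. */
		n->next = first;
	} while (!atomic_compare_exchange_weak(&h->first, &first, n));

	return first == NULL;
}

/* Defer a free by treating the start of the buffer as the list node. */
static int defer_free(void *buf, struct head *h)
{
	return push((struct node *)buf, h);
}

/* Later, from a safe context: detach the whole list and free each buffer. */
static int drain(struct head *h)
{
	struct node *n = atomic_exchange(&h->first, NULL);
	int freed = 0;

	while (n) {
		struct node *next = n->next; /* read the link before freeing */

		free(n);
		n = next;
		freed++;
	}
	return freed;
}
```

If the buffer handed to defer_free() is no longer a live allocation, the store to n->next is exactly the kind of write that KASAN flags in the report.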
Possible root causes:
Use-after-free:
The memory may be added to the linked list after it has already been freed. For example, struct exfat_sb_info is released in the RCU callback (delayed_free), but the linked list operation may still reference members of the struct.
Wrong list node address:
The node pointer passed to llist_add() may not point to a valid vmalloc memory region, or the pointer may have been miscalculated.
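
One way to cross-check these hypotheses against the report is to decode the "Memory state" dump by hand. The sketch below assumes generic KASAN on x86_64 (shadow scale shift 3 and shadow offset 0xdffffc0000000000, the constant that also shows up in the disassembly above) and assumes that f8 is the KASAN_VMALLOC_INVALID marker from mm/kasan/kasan.h. The helper names are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for generic KASAN on x86_64. */
#define KASAN_SHADOW_SCALE_SHIFT 3
#define KASAN_SHADOW_OFFSET      0xdffffc0000000000ULL
#define KASAN_VMALLOC_INVALID    0xF8 /* inaccessible space in a vmap area */

/* One shadow byte describes one 8-byte granule of kernel memory. */
static uint64_t kasan_shadow_addr(uint64_t addr)
{
	return (addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET;
}

/*
 * Would an access of `size` bytes at offset `off` inside one granule pass?
 * Shadow 0 means all 8 bytes are valid, 1..7 means only the first N bytes
 * are valid, and a negative value marks the whole granule as poisoned.
 */
static int granule_access_ok(int8_t shadow, unsigned int off, unsigned int size)
{
	assert(size >= 1 && off + size <= 8);
	if (shadow == 0)
		return 1;
	return (int)(off + size - 1) < shadow;
}
```

For the reported address this gives the shadow address fffff52000ca6200, and a shadow byte of f8 rejects any 8-byte write. Since the whole neighbourhood in the dump reads f8, the range was probably not a live vmalloc allocation when the write happened, which would fit either an earlier free of the same buffer or a stale pointer.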

### Repair suggestions
1. Ensure that the linked list operation is performed while the memory is still live
Problem: llist_add() may operate on a node whose memory has already been freed.
Fix: Before freeing memory, make sure it is no longer referenced by the linked list.
Modify delayed_free() or other related functions so that the list node is removed before the memory is freed in the RCU callback.
// fs/exfat/super.c
static void delayed_free(struct rcu_head *p) {
    struct exfat_sb_info *sbi = container_of(p, struct exfat_sb_info, rcu);
+   // Remove the node from the linked list before freeing.
+   // llist_del() and list_node are placeholders here, assuming a similar operation exists.
+   llist_del(&sbi->list_node);
    kvfree(sbi);
}
2. Check the validity of the node pointer before adding it to the linked list
Problem: The node pointer passed to llist_add() may point to an invalid address.
Fix: Verify that the pointer is in the vmalloc region before calling llist_add().

void vfree_atomic(const void *addr)
{
	struct vfree_deferred *p = raw_cpu_ptr(&vfree_deferred);

	BUG_ON(in_nmi());
	kmemleak_free(addr);

	/*
	 * Use raw_cpu_ptr() because this can be called from preemptible
	 * context. Preemption is absolutely fine here, because the llist_add()
	 * implementation is lockless, so it works even if we are adding to
	 * another cpu's list. schedule_work() should be fine with this too.
	 */
+   // Check whether addr is in the vmalloc area (freeing NULL stays a silent no-op)
+   if (addr && !is_vmalloc_addr(addr)) {
+       WARN_ONCE(1, "vfree_atomic: invalid address %p\n", addr);
+       return;
+   }
	if (addr && llist_add((struct llist_node *)addr, &p->list))
		schedule_work(&p->wq);
}
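
For reference, here is a userspace sketch of what such a range check decides. It assumes the default x86_64 4-level layout without KASLR memory randomization, where the vmalloc area spans ffffc90000000000 to ffffe8ffffffffff. The SKETCH_* constants, sketch_is_vmalloc_addr() and classify() are names made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed vmalloc range: x86_64, 4-level paging, no KASLR randomization. */
#define SKETCH_VMALLOC_START 0xffffc90000000000ULL
#define SKETCH_VMALLOC_END   0xffffe8ffffffffffULL

static_assert(SKETCH_VMALLOC_START < SKETCH_VMALLOC_END,
	      "vmalloc range must be non-empty");

/* Same comparison that is_vmalloc_addr() performs on the real bounds. */
static int sketch_is_vmalloc_addr(uint64_t addr)
{
	return addr >= SKETCH_VMALLOC_START && addr < SKETCH_VMALLOC_END;
}

enum verdict { SKIP_NULL, REJECT, QUEUE };

/* Decision made by the proposed check in vfree_atomic(). */
static enum verdict classify(uint64_t addr)
{
	if (addr == 0)
		return SKIP_NULL;	/* freeing NULL is a no-op */
	if (!sketch_is_vmalloc_addr(addr))
		return REJECT;		/* would hit the WARN_ONCE() path */
	return QUEUE;			/* would still reach llist_add() */
}
```

Under these assumptions the reported address ffffc90006531000 is classified as QUEUE, which matches KASAN's note that it belongs to a vmalloc virtual mapping. A range check alone would therefore probably not have caught this report; it only catches pointers that lie outside the vmalloc area entirely.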
=========================================================================
I hope it helps.
Best regards
Jianzhou Zhao
xingwei lee
Zhizhuo Tang