Re: [syzbot] [mm?] WARNING in vma_merge_existing_range

Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx> · Thu, 2 Jan 2025 10:25:49 +0000

Happy new year!

On Tue, Dec 31, 2024 at 08:50:23PM -0800, syzbot wrote:
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit:    8379578b11d5 Merge tag 'for-v6.13-rc' of git://git.kernel...
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=16113018580000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=d269ef41b9262400
> dashboard link: https://syzkaller.appspot.com/bug?extid=46423ed8fa1f1148c6e4
> compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> userspace arch: i386

Hmmmm 32-bit? But kernel reports give 64-bit registers? So I guess 32-bit
userland, 64-bit kernel?

>
> Unfortunately, I don't have any reproducer for this issue yet.

Hmm. Racy thing?

>
> Downloadable assets:
> disk image: https://storage.googleapis.com/syzbot-assets/86d2e3352aff/disk-8379578b.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/345570cd3573/vmlinux-8379578b.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/01da37a51505/bzImage-8379578b.xz
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+46423ed8fa1f1148c6e4@xxxxxxxxxxxxxxxxxxxxxxxxx
>
> RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
> R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
>  </TASK>
> ------------[ cut here ]------------
> WARNING: CPU: 1 PID: 20504 at mm/vma.c:734 vma_merge_existing_range+0x1145/0x16f0 mm/vma.c:734

It'd be nice if syzbot could actually print the code that generates the
warning :) a nice-to-have perhaps.

This is:

	VM_WARN_ON(start >= end);

I suspect start == end, because start > end would be some drastic and
god-awful bug.

> Modules linked in:
> CPU: 1 UID: 0 PID: 20504 Comm: syz.6.5485 Not tainted 6.13.0-rc4-syzkaller-00069-g8379578b11d5 #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
> RIP: 0010:vma_merge_existing_range+0x1145/0x16f0 mm/vma.c:734
> Code: e8 20 24 0f 00 4d 2b 7d 00 4d 89 ec 48 8b 7c 24 38 e9 7f 01 00 00 e8 3a bc a8 ff 90 0f 0b 90 e9 a8 f1 ff ff e8 2c bc a8 ff 90 <0f> 0b 90 e9 0e f2 ff ff e8 1e bc a8 ff 90 0f 0b 90 4d 85 ed 0f 85

Be useful to get the kernel disassembly too :)

Best guess, wrangling a Python script and objdump:

   0:	e8 20 24 0f 00       	call   0xf2425
   5:	4d 2b 7d 00          	sub    0x0(%r13),%r15
   9:	4d 89 ec             	mov    %r13,%r12
   c:	48 8b 7c 24 38       	mov    0x38(%rsp),%rdi
  11:	e9 7f 01 00 00       	jmp    0x195
  16:	e8 3a bc a8 ff       	call   0xffffffffffa8bc55
  1b:	90                   	nop
  1c:	0f 0b                	ud2
  1e:	90                   	nop
  1f:	e9 a8 f1 ff ff       	jmp    0xfffffffffffff1cc
  24:	e8 2c bc a8 ff       	call   0xffffffffffa8bc55
  29:	90                   	nop
  2a:	<0f> 0b                	ud2   <-- presumably here? This is an undefined instruction...
  2c:	90                   	nop
  2d:	e9 0e f2 ff ff       	jmp    0xfffffffffffff240
  32:	e8 1e bc a8 ff       	call   0xffffffffffa8bc55
  37:	90                   	nop
  38:	0f 0b                	ud2
  3a:	90                   	nop
  3b:	4d 85 ed             	test   %r13,%r13
  3e:	0f                   	.byte 0xf
  3f:	85                   	.byte 0x85

Yeah this might be a mix of data and code somehow or just garbage? Not sure
there's anything discernible there unfortunately.
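
One sanity check on the dump above is to resolve the rel32 targets by hand.
A minimal sketch, assuming the usual x86 encoding where a 5-byte `call`
(opcode e8) is relative to the address of the next instruction. The offsets
are relative to the start of the Code: bytes, not real addresses, so only
the comparison between targets means anything. toy_rel32_target() is a
made-up helper name, not anything in the tree:

```c
#include <assert.h>
#include <stdint.h>

/* Target of a 5-byte x86 `call rel32` / `jmp rel32` found at offset `off`. */
static uint64_t toy_rel32_target(uint64_t off, const uint8_t insn[5])
{
	/* rel32 is little-endian, sign-extended, relative to the next insn. */
	uint32_t raw = (uint32_t)insn[1] | (uint32_t)insn[2] << 8 |
		       (uint32_t)insn[3] << 16 | (uint32_t)insn[4] << 24;

	return off + 5 + (uint64_t)(int64_t)(int32_t)raw;
}

/* The three calls from the dump, at offsets 0x16, 0x24 and 0x32. */
static void toy_rel32_self_check(void)
{
	const uint8_t c1[5] = { 0xe8, 0x3a, 0xbc, 0xa8, 0xff };
	const uint8_t c2[5] = { 0xe8, 0x2c, 0xbc, 0xa8, 0xff };
	const uint8_t c3[5] = { 0xe8, 0x1e, 0xbc, 0xa8, 0xff };

	assert(toy_rel32_target(0x16, c1) == toy_rel32_target(0x24, c2));
	assert(toy_rel32_target(0x24, c2) == toy_rel32_target(0x32, c3));
}
```

All three calls resolve to the same target, each followed by nop/ud2/nop and
a jmp back, which would fit three separate warning sites laid out back to
back (ud2 being what WARN traps on for x86) rather than garbage. The two
trailing .byte lines look like a 0f 85 (jne rel32) cut short by the end of
the dump.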

> RSP: 0018:ffffc9000ba274a0 EFLAGS: 00010293
> RAX: ffffffff81f6b804 RBX: 0000000020c25000 RCX: ffff888060ad1e00
> RDX: 0000000000000000 RSI: 0000000020c25000 RDI: 0000000020c25000
> RBP: ffffc9000ba275f8 R08: ffffffff81f6aa0d R09: 00000000280000fa
> R10: ffffc9000ba27810 R11: fffff52001744f07 R12: 0000000020c25000
> R13: ffff888069b666c8 R14: ffffc9000ba276a0 R15: ffff888068d0b1f0
> FS:  0000000000000000(0000) GS:ffff8880b8700000(0063) knlGS:00000000f5116b40
> CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
> CR2: 00007fa9de2c0018 CR3: 000000006b562000 CR4: 00000000003526f0

> Call Trace:
>  <TASK>
>  vma_modify+0x41/0x330 mm/vma.c:1514

Just passes through start, end (in vmg).

>  vma_modify_flags_name+0x3a6/0x430 mm/vma.c:1563

Just passes through start, end.

>  madvise_update_vma+0x2fe/0xc10 mm/madvise.c:159

Just passes through start, end.

This means it was one of MADV_NORMAL, MADV_RANDOM, MADV_DONTFORK,
MADV_DOFORK, MADV_WIPEONFORK, MADV_KEEPONFORK, MADV_DONTDUMP, MADV_DODUMP,
MADV_MERGEABLE, MADV_UNMERGEABLE, MADV_HUGEPAGE, MADV_NOHUGEPAGE.

Yeah we need better error handling here, because this report is just giving
us very little to go on especially for a non-repro. Will add to TODO.
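
The user-mode register dump further down may narrow this. A rough sketch,
assuming the i386 compat entry passes the first three syscall arguments in
ebx/ecx/edx, so that RDX = 0xe there is the advice argument. The table only
covers the advice values listed above, the numbers are what I believe
include/uapi/asm-generic/mman-common.h has (worth double-checking against
the tree), and toy_advice_name() is a made-up helper:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical lookup table; values believed to match mman-common.h. */
struct toy_advice {
	int value;
	const char *name;
};

static const struct toy_advice toy_flag_advice[] = {
	{  0, "MADV_NORMAL" },
	{  1, "MADV_RANDOM" },
	{ 10, "MADV_DONTFORK" },
	{ 11, "MADV_DOFORK" },
	{ 12, "MADV_MERGEABLE" },
	{ 13, "MADV_UNMERGEABLE" },
	{ 14, "MADV_HUGEPAGE" },
	{ 15, "MADV_NOHUGEPAGE" },
	{ 16, "MADV_DONTDUMP" },
	{ 17, "MADV_DODUMP" },
	{ 18, "MADV_WIPEONFORK" },
	{ 19, "MADV_KEEPONFORK" },
};

/* Return the name for an advice value in the table, or NULL if absent. */
static const char *toy_advice_name(int advice)
{
	size_t i;

	for (i = 0; i < sizeof(toy_flag_advice) / sizeof(toy_flag_advice[0]); i++) {
		if (toy_flag_advice[i].value == advice)
			return toy_flag_advice[i].name;
	}
	return NULL;
}

/* RDX from the user-mode dump is 0xe; 4 (MADV_DONTNEED) is not in the list. */
static void toy_advice_self_check(void)
{
	assert(strcmp(toy_advice_name(0xe), "MADV_HUGEPAGE") == 0);
	assert(toy_advice_name(4) == NULL);
}
```

If that reading of the registers is right, the call was MADV_HUGEPAGE, which
is at least consistent with the list above.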

>  madvise_vma_behavior mm/madvise.c:1325 [inline]

Just passes through start, end.

>  madvise_walk_vmas mm/madvise.c:1497 [inline]

OK here we find VMAs and walk them.

We explicitly check for start >= end if start < vma->vm_start.

I wonder if the visit() call is splitting the VMA which confuses the logic
here.

      s  e
      |  |
      v  v
|-------------|
|             |
|-------------|

Split:

      s  e
      |  |
      v  v
|--------|----|
|        |    |
|--------|----|

prev = this VMA.

	if (prev && start < prev->vm_end)
		start = prev->vm_end;

So we end up with:

         s,e
         |
         v
|--------|----|
|        |    |
|--------|----|

	tmp = vma->vm_end;
	if (end < tmp)
		tmp = end;

That tmp assignment will reinstate the broken end.

And... boom.

Let me check this out and see if I can trigger it.

I may be missing some safeguard that prevents this...
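
To make the arithmetic in that theory concrete, here is a toy user-space
model of just the two adjustments quoted above. It is not the kernel loop:
the struct, the function names and the addresses are all invented for
illustration, and it deliberately ignores any other check in the walk that
might bail out first:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the [start, tmp) range handed to visit(). */
struct toy_range {
	unsigned long start;
	unsigned long end;
};

/*
 * Apply the two quoted adjustments in the order discussed: bump start up
 * to prev's end, then clamp the end of the range to the current VMA's end.
 */
static struct toy_range toy_next_range(unsigned long start, unsigned long end,
				       unsigned long prev_end,
				       unsigned long vma_end)
{
	struct toy_range r;

	if (start < prev_end)
		start = prev_end;

	r.start = start;
	r.end = vma_end;
	if (end < r.end)
		r.end = end;

	return r;
}

static bool toy_range_is_empty(struct toy_range r)
{
	return r.start >= r.end;
}

/*
 * VMA [0x1000, 0x5000), request [0x2000, 0x3000), split at 0x3000 so that
 * prev ends exactly at the requested end.
 */
static void toy_walk_self_check(void)
{
	struct toy_range r = toy_next_range(0x2000, 0x3000, 0x3000, 0x5000);

	assert(r.start == 0x3000 && r.end == 0x3000);
	assert(toy_range_is_empty(r));
}
```

With those made-up numbers the model yields start == end, which is exactly
the case the VM_WARN_ON() would catch, provided nothing in between stops the
walk.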

>  do_madvise+0x1e64/0x4d10 mm/madvise.c:1684

Here we explicitly check for start >= end:

	end = start + len;
	if (end < start)
		return -EINVAL;

	if (end == start)
		return 0;

So overflow is accounted for too, though with a 64-bit kernel and a 32-bit
userland it's not really a concern.
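
A small sketch of why, mirroring only the lines quoted above with 64-bit
arithmetic. toy_check_range() and its return convention (1 meaning "carry
on with a non-empty range") are invented for illustration, and the sample
values assume RBX and RCX in the user-mode register dump are the start and
len arguments:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper: -EINVAL on wraparound, 0 for an empty range,
 * 1 for a non-empty range (with the end written to *end_out).
 */
static int toy_check_range(uint64_t start, uint64_t len, uint64_t *end_out)
{
	uint64_t end = start + len;	/* unsigned wraparound is well defined */

	if (end < start)
		return -EINVAL;
	if (end == start)
		return 0;

	*end_out = end;
	return 1;
}

static void toy_check_range_self_check(void)
{
	uint64_t end = 0;

	/* Values from the user-mode register dump: no wrap, non-empty. */
	assert(toy_check_range(0x20c00000, 0x400000, &end) == 1);
	assert(end == 0x21000000);

	/* Even the largest 32-bit start and len cannot wrap 64 bits. */
	assert(toy_check_range(0xffffffff, 0xffffffff, &end) == 1);

	/* A genuine 64-bit wrap is rejected. */
	assert(toy_check_range(UINT64_MAX - 0xfff, 0x2000, &end) == -EINVAL);
}
```

So on this path both the empty and the wrapped cases are rejected before the
walk starts, which points back at the walk itself.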

>  __do_sys_madvise mm/madvise.c:1700 [inline]
>  __se_sys_madvise mm/madvise.c:1698 [inline]
>  __ia32_sys_madvise+0xa6/0xc0 mm/madvise.c:1698
>  do_syscall_32_irqs_on arch/x86/entry/common.c:165 [inline]
>  __do_fast_syscall_32+0xb4/0x110 arch/x86/entry/common.c:386
>  do_fast_syscall_32+0x34/0x80 arch/x86/entry/common.c:411
>  entry_SYSENTER_compat_after_hwframe+0x84/0x8e
> RIP: 0023:0xf7fc2579
> Code: b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 10 08 03 74 d8 01 00 00 00 00 00 00 00 00 00 00 00 00 00 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
> RSP: 002b:00000000f511655c EFLAGS: 00000206 ORIG_RAX: 00000000000000db
> RAX: ffffffffffffffda RBX: 0000000020c00000 RCX: 0000000000400000
> RDX: 000000000000000e RSI: 0000000000000000 RDI: 0000000000000000
> RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
> R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
>  </TASK>
> ----------------
> Code disassembly (best guess), 2 bytes skipped:
>    0:	10 06                	adc    %al,(%rsi)
>    2:	03 74 b4 01          	add    0x1(%rsp,%rsi,4),%esi
>    6:	10 07                	adc    %al,(%rdi)
>    8:	03 74 b0 01          	add    0x1(%rax,%rsi,4),%esi
>    c:	10 08                	adc    %cl,(%rax)
>    e:	03 74 d8 01          	add    0x1(%rax,%rbx,8),%esi
>   1e:	00 51 52             	add    %dl,0x52(%rcx)
>   21:	55                   	push   %rbp
>   22:	89 e5                	mov    %esp,%ebp
>   24:	0f 34                	sysenter
>   26:	cd 80                	int    $0x80
> * 28:	5d                   	pop    %rbp <-- trapping instruction
>   29:	5a                   	pop    %rdx
>   2a:	59                   	pop    %rcx
>   2b:	c3                   	ret
>   2c:	90                   	nop
>   2d:	90                   	nop
>   2e:	90                   	nop
>   2f:	90                   	nop
>   30:	90                   	nop
>   31:	90                   	nop
>   32:	90                   	nop
>   33:	90                   	nop
>   34:	90                   	nop
>   35:	90                   	nop
>   36:	90                   	nop
>   37:	90                   	nop
>   38:	90                   	nop
>   39:	90                   	nop
>   3a:	90                   	nop
>   3b:	90                   	nop
>   3c:	90                   	nop
>   3d:	90                   	nop
>
>
> ---
> This report is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkaller@xxxxxxxxxxxxxxxx.
>
> syzbot will keep track of this issue. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
>
> If the report is already addressed, let syzbot know by replying with:
> #syz fix: exact-commit-title
>
> If you want to overwrite report's subsystems, reply with:
> #syz set subsystems: new-subsystem
> (See the list of subsystem names on the web dashboard)
>
> If the report is a duplicate of another one, reply with:
> #syz dup: exact-subject-of-another-report
>
> If you want to undo deduplication, reply with:
> #syz undup