Re: [syzbot] [mm?] WARNING in __page_table_check_ptes_set

On 22.04.24 12:07, David Hildenbrand wrote:
On 21.04.24 22:16, syzbot wrote:
Hello,

syzbot found the following issue on:

HEAD commit:    4eab35893071 Add linux-next specific files for 20240417
git tree:       linux-next
console+strace: https://syzkaller.appspot.com/x/log.txt?x=1727a61b180000
kernel config:  https://syzkaller.appspot.com/x/.config?x=27920e47287645ff
dashboard link: https://syzkaller.appspot.com/bug?extid=d8426b591c36b21c750e
compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=156da22d180000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=163dfec7180000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/9f7d6c097fb4/disk-4eab3589.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/287b16352982/vmlinux-4eab3589.xz
kernel image: https://storage.googleapis.com/syzbot-assets/23839c65c573/bzImage-4eab3589.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+d8426b591c36b21c750e@xxxxxxxxxxxxxxxxxxxxxxxxx

------------[ cut here ]------------
WARNING: CPU: 0 PID: 5084 at mm/page_table_check.c:199 __page_table_check_pte mm/page_table_check.c:199 [inline]
WARNING: CPU: 0 PID: 5084 at mm/page_table_check.c:199 __page_table_check_ptes_set+0x1db/0x420 mm/page_table_check.c:213

I think this is

if (pte_present(pte) && pte_uffd_wp(pte))
	WARN_ON_ONCE(pte_write(pte));

Modules linked in:
CPU: 0 PID: 5084 Comm: syz-executor382 Not tainted 6.9.0-rc4-next-20240417-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
RIP: 0010:__page_table_check_pte mm/page_table_check.c:199 [inline]
RIP: 0010:__page_table_check_ptes_set+0x1db/0x420 mm/page_table_check.c:213
Code: 48 8b 7c 24 40 48 c7 c6 80 19 46 8e e8 ee df 8e ff 41 83 fc 1d 74 18 41 83 fc 1a 75 1d e8 5d da 8e ff eb 10 e8 56 da 8e ff 90 <0f> 0b 90 eb 10 e8 4b da 8e ff 90 0f 0b 90 eb 05 e8 40 da 8e ff 48
RSP: 0018:ffffc9000366f740 EFLAGS: 00010293
RAX: ffffffff8207833a RBX: ffffc9000366f7c0 RCX: ffff888022af3c00
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000
RBP: ffffc9000366f830 R08: ffffffff820782af R09: 1ffffd40000a6a10
R10: dffffc0000000000 R11: fffff940000a6a11 R12: 0000000000000000
R13: 0000000014d42c67 R14: 0000000000000001 R15: 0000000000000000
FS:  0000555567f79380(0000) GS:ffff8880b9400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000066c7e0 CR3: 0000000078cb0000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
   <TASK>
   page_table_check_ptes_set include/linux/page_table_check.h:74 [inline]
   set_ptes include/linux/pgtable.h:267 [inline]
   __ptep_modify_prot_commit include/linux/pgtable.h:1269 [inline]
   ptep_modify_prot_commit include/linux/pgtable.h:1302 [inline]
   change_pte_range mm/mprotect.c:194 [inline]
   change_pmd_range mm/mprotect.c:424 [inline]
   change_pud_range mm/mprotect.c:457 [inline]
   change_p4d_range mm/mprotect.c:480 [inline]
   change_protection_range mm/mprotect.c:508 [inline]
   change_protection+0x2770/0x3cc0 mm/mprotect.c:542
   mprotect_fixup+0x740/0xa90 mm/mprotect.c:655
   do_mprotect_pkey+0x90d/0xe00 mm/mprotect.c:820
   __do_sys_mprotect mm/mprotect.c:841 [inline]
   __se_sys_mprotect mm/mprotect.c:838 [inline]
   __x64_sys_mprotect+0x80/0x90 mm/mprotect.c:838
   do_syscall_x64 arch/x86/entry/common.c:52 [inline]
   do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
   entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f45514bf429
Code: 48 83 c4 28 c3 e8 37 17 00 00 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffe52191598 EFLAGS: 00000246 ORIG_RAX: 000000000000000a
RAX: ffffffffffffffda RBX: 00007ffe52191768 RCX: 00007f45514bf429
RDX: 000000000000000f RSI: 0000000000004000 RDI: 0000000020ffc000
RBP: 00007f4551532610 R08: 00007ffe52191768 R09: 00007ffe52191768
R10: 00007ffe52191768 R11: 0000000000000246 R12: 0000000000000001
R13: 00007ffe52191758 R14: 0000000000000001 R15: 0000000000000001
   </TASK>

Did we find a real issue that involves mprotect()?

At least can_change_pte_writable() should always return "false" for
userfaultfd_pte_wp().

Do we maybe have a uffd-wp PTE outside of a UFFD_WP VMA?

Or was the PTE already writable, and we only detect it now because we call
mprotect()? (Did we fail to detect it earlier?)
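
To make the invariant behind the quoted check concrete, here is a toy
userspace model of it. This is only a sketch: the TOY_PTE_* bit positions
and the helper name are made up for illustration, and real PTE bit layouts
are architecture-specific.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits for a toy PTE; not the real x86 layout. */
#define TOY_PTE_PRESENT (UINT64_C(1) << 0)
#define TOY_PTE_WRITE   (UINT64_C(1) << 1)
#define TOY_PTE_UFFD_WP (UINT64_C(1) << 2)

/*
 * Mirrors the shape of the quoted check: returns true when a present,
 * uffd-wp marked PTE is also writable, i.e. when WARN_ON_ONCE() would fire.
 */
static bool toy_uffd_wp_check_would_warn(uint64_t pte)
{
    bool present = (pte & TOY_PTE_PRESENT) != 0;
    bool uffd_wp = (pte & TOY_PTE_UFFD_WP) != 0;
    bool write   = (pte & TOY_PTE_WRITE) != 0;

    return present && uffd_wp && write;
}
```

In this model, a present uffd-wp PTE is fine as long as it stays read-only;
only the combination of all three bits trips the warning.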

Staring at the reproducer, we do


  syscall(__NR_mmap, /*addr=*/0x1ffff000ul, /*len=*/0x1000ul, /*prot=*/0ul,
          /*flags=MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE*/ 0x32ul, /*fd=*/-1,
          /*offset=*/0ul);
  syscall(__NR_mmap, /*addr=*/0x20000000ul, /*len=*/0x1000000ul,
          /*prot=PROT_WRITE|PROT_READ|PROT_EXEC*/ 7ul,
          /*flags=MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE*/ 0x32ul, /*fd=*/-1,
          /*offset=*/0ul);

-> Writable anonymous memory

  syscall(__NR_mmap, /*addr=*/0x21000000ul, /*len=*/0x1000ul, /*prot=*/0ul,
          /*flags=MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE*/ 0x32ul, /*fd=*/-1,
          /*offset=*/0ul);
  intptr_t res = 0;
  res = syscall(__NR_userfaultfd,
                /*flags=UFFD_USER_MODE_ONLY|O_NONBLOCK*/ 0x801ul);
  if (res != -1)
    r[0] = res;
  *(uint64_t*)0x200004c0 = 0xaa;
  *(uint64_t*)0x200004c8 = 0;
  *(uint64_t*)0x200004d0 = 0;
  syscall(__NR_ioctl, /*fd=*/r[0], /*cmd=*/0xc018aa3f, /*arg=*/0x200004c0ul);

-> _UFFDIO_API handshake?
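
The raw ioctl numbers in the reproducer (0xc018aa3f here, 0xc020aa00 and
0xc028aa03 further down) can be sanity-checked by splitting them into
their encoded fields. A minimal sketch, assuming the asm-generic ioctl
encoding used on x86 (nr in bits 0-7, type in bits 8-15, size in bits
16-29, direction in bits 30-31); the struct and function names are
hypothetical:

```c
#include <stdint.h>

/* Fields of an ioctl command number (assumed asm-generic encoding). */
struct ioc_fields {
    unsigned dir;   /* bit 0 = write, bit 1 = read; 3 = both */
    unsigned type;  /* "magic" byte of the subsystem */
    unsigned nr;    /* command number within the subsystem */
    unsigned size;  /* size of the argument struct in bytes */
};

static struct ioc_fields ioc_decode(uint32_t cmd)
{
    struct ioc_fields f;

    f.nr   = cmd & 0xffu;
    f.type = (cmd >> 8) & 0xffu;
    f.size = (cmd >> 16) & 0x3fffu;
    f.dir  = (cmd >> 30) & 0x3u;
    return f;
}
```

Decoding 0xc018aa3f this way gives type 0xaa, nr 0x3f and a 24-byte
argument, which is consistent with the UFFDIO_API guess above.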

  syscall(__NR_mprotect, /*addr=*/0x20ffc000ul, /*len=*/0x3000ul,
          /*prot=PROT_SEM|PROT_EXEC*/ 0xcul);

-> Protect target range R/O. I assume: no page populated yet?
-> 3 pages starting at 0x20ffc000ul;

  *(uint64_t*)0x20000180 = 0x20ffc000;
  *(uint64_t*)0x20000188 = 0x3000;
  *(uint64_t*)0x20000190 = 3;
  *(uint64_t*)0x20000198 = 0;
  syscall(__NR_ioctl, /*fd=*/r[0], /*cmd=*/0xc020aa00, /*arg=*/0x20000180ul);

-> _UFFDIO_REGISTER (aa00)
-> _range = 3 pages starting at 0x20ffc000ul
-> _mode = 3 = UFFDIO_REGISTER_MODE_MISSING | UFFDIO_REGISTER_MODE_WP
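
The mode value 3 stored at 0x20000190 can be decoded bit by bit. A small
sketch, with the bit values copied from the uapi userfaultfd header as I
remember them (MISSING = bit 0, WP = bit 1, MINOR = bit 2); the REG_MODE_*
names and the helper are local stand-ins, not the kernel's identifiers:

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the UFFDIO_REGISTER_MODE_* bits (assumed values). */
#define REG_MODE_MISSING (UINT64_C(1) << 0)
#define REG_MODE_WP      (UINT64_C(1) << 1)
#define REG_MODE_MINOR   (UINT64_C(1) << 2)

struct reg_mode {
    bool missing;
    bool wp;
    bool minor;
    uint64_t unknown;  /* any bits this sketch does not know about */
};

static struct reg_mode decode_register_mode(uint64_t mode)
{
    struct reg_mode m;

    m.missing = (mode & REG_MODE_MISSING) != 0;
    m.wp      = (mode & REG_MODE_WP) != 0;
    m.minor   = (mode & REG_MODE_MINOR) != 0;
    m.unknown = mode & ~(REG_MODE_MISSING | REG_MODE_WP | REG_MODE_MINOR);
    return m;
}
```

With these values, mode 3 sets the MISSING and WP bits and leaves MINOR
clear.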

  *(uint64_t*)0x20000000 = 0x20ffd000;
  *(uint64_t*)0x20000008 = 0x20ffb000;
  *(uint64_t*)0x20000010 = 0x1000;
  *(uint64_t*)0x20000018 = 3;
  *(uint64_t*)0x20000020 = 0;
  syscall(__NR_ioctl, /*fd=*/r[0], /*cmd=*/0xc028aa03, /*arg=*/0x20000000ul);

-> _UFFDIO_COPY (aa03)
-> dst = 0x20ffd000
-> src = 0x20ffb000
-> len = 0x1000 (single page)
-> mode = UFFDIO_COPY_MODE_DONTWAKE|UFFDIO_COPY_MODE_WP
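
The five raw 64-bit stores at 0x20000000..0x20000020 line up with the
fields of the UFFDIO_COPY argument. The sketch below mirrors that layout
in a local struct; the struct, macro and function names are hypothetical,
and the field order and mode bit values are taken from the uapi header as
I remember it, so treat them as assumptions:

```c
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the UFFDIO_COPY argument layout (assumed). */
struct copy_args_mirror {
    uint64_t dst;   /* stored at 0x20000000 */
    uint64_t src;   /* stored at 0x20000008 */
    uint64_t len;   /* stored at 0x20000010 */
    uint64_t mode;  /* stored at 0x20000018 */
    int64_t  copy;  /* stored at 0x20000020; result written by the kernel */
};

/* 0x28 matches the size field encoded in ioctl number 0xc028aa03. */
_Static_assert(sizeof(struct copy_args_mirror) == 0x28,
               "mirror struct should be 40 bytes");

/* Local stand-ins for the UFFDIO_COPY_MODE_* bits (assumed values). */
#define COPY_MODE_DONTWAKE (UINT64_C(1) << 0)
#define COPY_MODE_WP       (UINT64_C(1) << 1)

/* The values the reproducer stores before issuing the ioctl. */
static struct copy_args_mirror reproducer_copy_args(void)
{
    struct copy_args_mirror a = {
        .dst  = 0x20ffd000,
        .src  = 0x20ffb000,
        .len  = 0x1000,
        .mode = 3,
        .copy = 0,
    };
    return a;
}
```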

-> We are copying into the R/O range. src should be R/W and trigger a page fault
   on access where we get a fresh page.
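
The address arithmetic behind that statement can be checked mechanically.
A minimal sketch with a hypothetical range type and helper; the constants
are the values from the reproducer:

```c
#include <stdbool.h>
#include <stdint.h>

struct range {
    uint64_t start;
    uint64_t len;
};

/* True if inner lies fully within outer (written to avoid overflow). */
static bool range_contains(struct range outer, struct range inner)
{
    return inner.start >= outer.start &&
           inner.len <= outer.len &&
           inner.start - outer.start <= outer.len - inner.len;
}

/* Ranges taken from the reproducer's syscalls. */
static const struct range rwx_mapping  = { 0x20000000, 0x1000000 };
static const struct range uffd_range   = { 0x20ffc000, 0x3000 };
static const struct range copy_dst     = { 0x20ffd000, 0x1000 };
static const struct range copy_src     = { 0x20ffb000, 0x1000 };
static const struct range last_protect = { 0x20ffc000, 0x4000 };
```

With these numbers, dst is the middle page of the registered range, src
is the page just below it (outside the registered and mprotect()ed range
but still inside the big RWX mapping), and the final mprotect() covers
the whole registered range up to the end of the mapping.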

  *(uint16_t*)0x200000c0 = 1;
  *(uint64_t*)0x200000c8 = 0x20000040;
  *(uint16_t*)0x20000040 = 6;
  *(uint8_t*)0x20000042 = 0;
  *(uint8_t*)0x20000043 = 0;
  *(uint32_t*)0x20000044 = 0x7fffffff;
  res = syscall(__NR_seccomp, /*op=*/1ul, /*flags=*/0ul, /*arg=*/0x200000c0ul);
  if (res != -1)
    r[1] = res;
  syscall(__NR_open_tree, /*dfd=*/-1, /*filename=*/0ul, /*flags=*/0ul);

-> No idea what happens here or whether it is relevant. If __NR_seccomp failed, we would
   not set r[1].
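
For what it is worth, the raw stores before the seccomp call look like a
one-instruction filter program: a length of 1 at 0x200000c0, a pointer to
0x20000040 at 0x200000c8, and there a single instruction with code 6 and
k = 0x7fffffff. The sketch below decodes that instruction. The struct is
a local mirror of the classic BPF instruction layout, and the TOY_*
constants are the BPF/seccomp values as I remember them, so treat all of
them as assumptions:

```c
#include <stdbool.h>
#include <stdint.h>

/* Local mirror of a classic BPF instruction (assumed layout). */
struct filter_insn_mirror {
    uint16_t code;  /* stored at 0x20000040 */
    uint8_t  jt;    /* stored at 0x20000042 */
    uint8_t  jf;    /* stored at 0x20000043 */
    uint32_t k;     /* stored at 0x20000044 */
};

/* Assumed values: BPF_RET | BPF_K, and the seccomp return-action masks. */
#define TOY_BPF_RET_K        0x06u
#define TOY_RET_ACTION_FULL  0xffff0000u
#define TOY_RET_ALLOW        0x7fff0000u

/* True if the instruction unconditionally returns an "allow" action. */
static bool insn_is_unconditional_allow(struct filter_insn_mirror insn)
{
    return insn.code == TOY_BPF_RET_K &&
           (insn.k & TOY_RET_ACTION_FULL) == TOY_RET_ALLOW;
}

/* The instruction as stored by the reproducer. */
static struct filter_insn_mirror reproducer_insn(void)
{
    struct filter_insn_mirror insn = { 6, 0, 0, 0x7fffffff };
    return insn;
}
```

If that reading is right, op=1 installs an allow-everything filter and
seccomp would return 0 on success, so r[1] would become 0. Whether the
call actually succeeds in the reproducer's environment is not something
this sketch can tell.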

  syscall(__NR_close_range, /*fd=*/r[1], /*max_fd=*/-1, /*flags=*/0ul);

-> Is that closing uffd as well, especially if __NR_seccomp failed?

  syscall(__NR_mprotect, /*addr=*/0x20ffc000ul, /*len=*/0x4000ul,
          /*prot=PROT_SEM|PROT_WRITE|PROT_READ|PROT_EXEC*/ 0xful);

-> Restore write permissions. I assume this is what fires the uffd-wp page table check.

--
Cheers,

David / dhildenb




