On 05/01/2024 02:20, Yin Fengwei wrote: > > > On 2024/1/5 05:36, David Hildenbrand wrote: >> On 03.01.24 15:16, Yin, Fengwei wrote: >>> >>> >>> On 1/3/2024 8:13 PM, David Hildenbrand wrote: >>>> On 03.01.24 12:48, syzbot wrote: >>>>> Hello, >>>>> >>>>> syzbot found the following issue on: >>>>> >>>>> HEAD commit: ab0b3e6ef50d Add linux-next specific files for 20240102 >>>>> git tree: linux-next >>>>> console+strace: https://syzkaller.appspot.com/x/log.txt?x=17be3e09e80000 >>>>> kernel config: >>>>> https://syzkaller.appspot.com/x/.config?x=a14a6350374945f9 >>>>> dashboard link: >>>>> https://syzkaller.appspot.com/bug?extid=50ef73537bbc393a25bb >>>>> compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils >>>>> for Debian) 2.40 >>>>> syz repro: >>>>> https://syzkaller.appspot.com/x/repro.syz?x=14e2256ee80000 >>>>> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=17b57db5e80000 >>>>> >>>>> Downloadable assets: >>>>> disk image: >>>>> https://storage.googleapis.com/syzbot-assets/4e6376fe5764/disk-ab0b3e6e.raw.xz >>>>> vmlinux: >>>>> https://storage.googleapis.com/syzbot-assets/7cb9ecbaf001/vmlinux-ab0b3e6e.xz >>>>> kernel image: >>>>> https://storage.googleapis.com/syzbot-assets/2c1a9a6d424f/bzImage-ab0b3e6e.xz >>>>> >>>>> The issue was bisected to: >>>>> >>>>> commit 68f0320824fa59c5429cbc811e6c46e7a30ea32c >>>>> Author: David Hildenbrand <david@xxxxxxxxxx> >>>>> Date: Wed Dec 20 22:44:31 2023 +0000 >>>>> >>>>> mm/rmap: convert folio_add_file_rmap_range() into >>>>> folio_add_file_rmap_[pte|ptes|pmd]() >>>>> >>>>> bisection log: >>>>> https://syzkaller.appspot.com/x/bisect.txt?x=10b9e1b1e80000 >>>>> final oops: >>>>> https://syzkaller.appspot.com/x/report.txt?x=12b9e1b1e80000 >>>>> console output: https://syzkaller.appspot.com/x/log.txt?x=14b9e1b1e80000 >>>>> >>>>> IMPORTANT: if you fix the issue, please add the following tag to the >>>>> commit: >>>>> Reported-by: syzbot+50ef73537bbc393a25bb@xxxxxxxxxxxxxxxxxxxxxxxxx >>>>> Fixes: 68f0320824fa ("mm/rmap: convert folio_add_file_rmap_range() >>>>> into folio_add_file_rmap_[pte|ptes|pmd]()") >>>>> >>>>> kasan_quarantine_reduce+0x18e/0x1d0 mm/kasan/quarantine.c:283 >>>>> __kasan_slab_alloc+0x65/0x90 mm/kasan/common.c:324 >>>>> kasan_slab_alloc include/linux/kasan.h:201 [inline] >>>>> slab_post_alloc_hook mm/slub.c:3813 [inline] >>>>> slab_alloc_node mm/slub.c:3860 [inline] >>>>> kmem_cache_alloc+0x136/0x320 mm/slub.c:3867 >>>>> vm_area_alloc+0x1f/0x220 kernel/fork.c:465 >>>>> mmap_region+0x3ae/0x2a90 mm/mmap.c:2804 >>>>> do_mmap+0x890/0xef0 mm/mmap.c:1379 >>>>> vm_mmap_pgoff+0x1a7/0x3c0 mm/util.c:573 >>>>> ksys_mmap_pgoff+0x421/0x5a0 mm/mmap.c:1425 >>>>> __do_sys_mmap arch/x86/kernel/sys_x86_64.c:93 [inline] >>>>> __se_sys_mmap arch/x86/kernel/sys_x86_64.c:86 [inline] >>>>> __x64_sys_mmap+0x125/0x190 arch/x86/kernel/sys_x86_64.c:86 >>>>> do_syscall_x64 arch/x86/entry/common.c:52 [inline] >>>>> do_syscall_64+0xd0/0x250 arch/x86/entry/common.c:83 >>>>> entry_SYSCALL_64_after_hwframe+0x62/0x6a >>>>> ------------[ cut here ]------------ >>>>> WARNING: CPU: 1 PID: 5059 at include/linux/rmap.h:202 >>>>> __folio_rmap_sanity_checks+0x4d5/0x630 include/linux/rmap.h:202 >>>>> Modules linked in: >>>>> CPU: 1 PID: 5059 Comm: syz-executor115 Not tainted >>>>> 6.7.0-rc8-next-20240102-syzkaller #0 >>>>> Hardware name: Google Google Compute Engine/Google Compute Engine, >>>>> BIOS Google 11/17/2023 >>>>> RIP: 0010:__folio_rmap_sanity_checks+0x4d5/0x630 include/linux/rmap.h:202 >>>>> Code: 41 83 e4 01 44 89 e6 e8 79 bc b7 ff 45 84 e4 0f 85 08 fc ff ff >>>>> e8 3b c1 b7 ff 48 c7 c6 e0 b5 d9 8a 48 89 df e8 5c 12 f7 ff 90 <0f> 0b >>>>> 90 e9 eb fb ff ff e8 1e c1 b7 ff be 01 00 00 00 48 89 df e8 >>>>> RSP: 0018:ffffc900038df978 EFLAGS: 00010293 >>>>> RAX: 0000000000000000 RBX: ffffea00008cde00 RCX: ffffffff81687419 >>>>> RDX: ffff88807becbb80 RSI: ffffffff81d06104 RDI: 0000000000000000 >>>>> RBP: ffffea00008cde00 R08: 0000000000000000 R09: fffffbfff1e75f6a >>>>> R10: ffffffff8f3afb57 R11: 0000000000000001 R12: 0000000000000000 >>>>> R13: 0000000000000001 R14: 0000000000000000 R15: dffffc0000000000 >>>>> FS: 0000555556508380(0000) GS:ffff8880b9900000(0000) >>>>> knlGS:0000000000000000 >>>>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>>> CR2: 00000000200000c0 CR3: 0000000079000000 CR4: 00000000003506f0 >>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >>>>> Call Trace: >>>>> <TASK> >>>>> __folio_add_rmap mm/rmap.c:1167 [inline] >>>>> __folio_add_file_rmap mm/rmap.c:1452 [inline] >>>>> folio_add_file_rmap_ptes+0x8e/0x2c0 mm/rmap.c:1478 >>>>> insert_page_into_pte_locked.isra.0+0x34d/0x960 mm/memory.c:1874 >>>>> insert_page mm/memory.c:1900 [inline] >>>>> vm_insert_page+0x62c/0x8c0 mm/memory.c:2053 >>>>> packet_mmap+0x314/0x570 net/packet/af_packet.c:4594 >>>>> call_mmap include/linux/fs.h:2090 [inline] >>>>> mmap_region+0x745/0x2a90 mm/mmap.c:2819 >>>>> do_mmap+0x890/0xef0 mm/mmap.c:1379 >>>>> vm_mmap_pgoff+0x1a7/0x3c0 mm/util.c:573 >>>>> ksys_mmap_pgoff+0x421/0x5a0 mm/mmap.c:1425 >>>>> __do_sys_mmap arch/x86/kernel/sys_x86_64.c:93 [inline] >>>>> __se_sys_mmap arch/x86/kernel/sys_x86_64.c:86 [inline] >>>>> __x64_sys_mmap+0x125/0x190 arch/x86/kernel/sys_x86_64.c:86 >>>>> do_syscall_x64 arch/x86/entry/common.c:52 [inline] >>>>> do_syscall_64+0xd0/0x250 arch/x86/entry/common.c:83 >>>>> entry_SYSCALL_64_after_hwframe+0x62/0x6a >>>> >>>> If I am not wrong, that triggers: >>>> >>>> VM_WARN_ON_FOLIO(folio_test_large(folio) && >>>> !folio_test_large_rmappable(folio), folio); >>>> >>>> So we are trying to rmap a large folio that did not go through >>>> folio_prep_large_rmappable(). Would someone mind explaining the rules to me for this? As far as I can see, folio_prep_large_rmappable() just inits the _deferred_list and sets a flag so we remember to deinit the list on destruction. Why can't we just init that list for all folios order-2 or greater? Then everything is rmappable? >>>> >>>> net/packet/af_packet.c calls vm_insert_page() on some pages/folios stoed >>>> in the "struct packet_ring_buffer". No idea where that comes from, but I >>>> suspect it's simply some compound allocation. >>> Looks like: >>> alloc_pg_vec >>> alloc_one_pg_vec_page >>> gfp_t gfp_flags = GFP_KERNEL | __GFP_COMP | >>> __GFP_ZERO | __GFP_NOWARN | __GFP_NORETRY; >>> >>> buffer = (char *) __get_free_pages(gfp_flags, order); >>> So you are right here... :). >> >> Hm, but I wonder if this something that's supposed to work or is this one of >> the cases where we should actually use a VM_PFN mapping? >> >> It's not a pagecache(file/shmem) page after all. >> >> We could relax that check and document why we expect something that is not >> marked rmappable. But it fells wrong. I suspect this should be a VM_PFNMAP >> instead (like recent udmabuf changes). > > VM_PFNMAP looks correct. And why is making the folio rmappable and mapping it the normal way not the right solution here? Because the folio could be order-1? Or something more profound? > > I do have another question: why do we just check the large folio > rmappable? Does that mean order0 folio is always rmappable? > > I ask this because vm_insert_pages() is called in net/ipv4/tcp.c > and drivers call vm_insert_page. I suppose they all need be VM_PFNMAP. > > There is not warning because we didn't check order0 folio rmappable. > > > Regards > Yin, Fengwei >