On Tue, Oct 25, 2022 at 11:13:37PM +0800, Chao Peng wrote:
> From: "Kirill A. Shutemov" <kirill.shutemov@xxxxxxxxxxxxxxx>
> <snip>
> +static struct file *restrictedmem_file_create(struct file *memfd)
> +{
> +	struct restrictedmem_data *data;
> +	struct address_space *mapping;
> +	struct inode *inode;
> +	struct file *file;
> +
> +	data = kzalloc(sizeof(*data), GFP_KERNEL);
> +	if (!data)
> +		return ERR_PTR(-ENOMEM);
> +
> +	data->memfd = memfd;
> +	mutex_init(&data->lock);
> +	INIT_LIST_HEAD(&data->notifiers);
> +
> +	inode = alloc_anon_inode(restrictedmem_mnt->mnt_sb);
> +	if (IS_ERR(inode)) {
> +		kfree(data);
> +		return ERR_CAST(inode);
> +	}
> +
> +	inode->i_mode |= S_IFREG;
> +	inode->i_op = &restrictedmem_iops;
> +	inode->i_mapping->private_data = data;
> +
> +	file = alloc_file_pseudo(inode, restrictedmem_mnt,
> +				 "restrictedmem", O_RDWR,
> +				 &restrictedmem_fops);
> +	if (IS_ERR(file)) {
> +		iput(inode);
> +		kfree(data);
> +		return ERR_CAST(file);
> +	}
> +
> +	file->f_flags |= O_LARGEFILE;
> +
> +	mapping = memfd->f_mapping;
> +	mapping_set_unevictable(mapping);
> +	mapping_set_gfp_mask(mapping,
> +			     mapping_gfp_mask(mapping) & ~__GFP_MOVABLE);

Is this supposed to prevent migration of pages being used for the
restrictedmem/shmem backend?

In my case I've been testing SNP support based on UPM v9, and for large
guests (128GB+), if I force 2M THPs via:

  echo always >/sys/kernel/mm/transparent_hugepage/shmem_enabled

it will in some cases trigger the trace below, which suggests that
kcompactd is trying to call migrate_folio() on a PFN that was/is still
allocated for guest private memory (and so has been removed from the
directmap as part of the shared->private conversion via the REG_REGION
kvm ioctl, leading to the crash). The trace seems to occur during early
OVMF boot while the guest is in the middle of pre-accepting private
memory (no lazy accept in this case).

Is this expected behavior? What else needs to be done to ensure
migrations aren't attempted in this case?

Thanks!

-Mike

# Host logs with debug info for crash during SNP boot

...
[ 904.373632] kvm_restricted_mem_get_pfn: GFN: 0x1caced1, PFN: 0x156b7f, page: ffffea0006b197b0, ref_count: 2
[ 904.373634] kvm_restricted_mem_get_pfn: GFN: 0x1caced2, PFN: 0x156840, page: ffffea0006b09400, ref_count: 2
[ 904.373637] kvm_restricted_mem_get_pfn: GFN: 0x1caced3, PFN: 0x156841, page: ffffea0006b09450, ref_count: 2
[ 904.373639] kvm_restricted_mem_get_pfn: GFN: 0x1caced4, PFN: 0x156842, page: ffffea0006b094a0, ref_count: 2
[ 904.373641] kvm_restricted_mem_get_pfn: GFN: 0x1caced5, PFN: 0x156843, page: ffffea0006b094f0, ref_count: 2
[ 904.373645] kvm_restricted_mem_get_pfn: GFN: 0x1caced6, PFN: 0x156844, page: ffffea0006b09540, ref_count: 2
[ 904.373647] kvm_restricted_mem_get_pfn: GFN: 0x1caced7, PFN: 0x156845, page: ffffea0006b09590, ref_count: 2
[ 904.373649] kvm_restricted_mem_get_pfn: GFN: 0x1caced8, PFN: 0x156846, page: ffffea0006b095e0, ref_count: 2
[ 904.373652] kvm_restricted_mem_get_pfn: GFN: 0x1caced9, PFN: 0x156847, page: ffffea0006b09630, ref_count: 2
[ 904.373654] kvm_restricted_mem_get_pfn: GFN: 0x1caceda, PFN: 0x156848, page: ffffea0006b09680, ref_count: 2
[ 904.373656] kvm_restricted_mem_get_pfn: GFN: 0x1cacedb, PFN: 0x156849, page: ffffea0006b096d0, ref_count: 2
[ 904.373661] kvm_restricted_mem_get_pfn: GFN: 0x1cacedc, PFN: 0x15684a, page: ffffea0006b09720, ref_count: 2
[ 904.373663] kvm_restricted_mem_get_pfn: GFN: 0x1cacedd, PFN: 0x15684b, page: ffffea0006b09770, ref_count: 2

# PFN 0x15684c is allocated for guest private memory, will have been removed from directmap as part of RMP requirements

[ 904.373665] kvm_restricted_mem_get_pfn: GFN: 0x1cacede, PFN: 0x15684c, page: ffffea0006b097c0, ref_count: 2
...

# kcompactd crashes trying to copy PFN 0x15684c to a new folio, crashes trying to access PFN via directmap

[ 904.470135] Migrating restricted page, SRC pfn: 0x15684c, folio_ref_count: 2, folio_order: 0
[ 904.470154] BUG: unable to handle page fault for address: ffff88815684c000
[ 904.470314] kvm_restricted_mem_get_pfn: GFN: 0x1cafe00, PFN: 0x19f6d0, page: ffffea00081d2100, ref_count: 2
[ 904.477828] #PF: supervisor read access in kernel mode
[ 904.477831] #PF: error_code(0x0000) - not-present page
[ 904.477833] PGD 6601067 P4D 6601067 PUD 1569ad063 PMD 1569af063 PTE 800ffffea97b3060
[ 904.508806] Oops: 0000 [#1] SMP NOPTI
[ 904.512892] CPU: 52 PID: 1563 Comm: kcompactd0 Tainted: G E 6.0.0-rc7-hsnp-v7pfdv9d+ #10
[ 904.523473] Hardware name: AMD Corporation ETHANOL_X/ETHANOL_X, BIOS RXM1006B 08/20/2021
[ 904.532499] RIP: 0010:copy_page+0x7/0x10
[ 904.536877] Code: 00 66 90 48 89 f8 48 89 d1 f3 a4 31 c0 c3 cc cc cc cc 48 89 c8 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc 66 90 b9 00 02 00 00 <f3> 48 a5 c3 cc cc cc cc 90 48 83 ec 10 48 89 1c 24 4c 89 64 24 08
[ 904.557831] RSP: 0018:ffffc900106dfb78 EFLAGS: 00010286
[ 904.563661] RAX: ffff888000000000 RBX: ffffea0006b09810 RCX: 0000000000000200
[ 904.571622] RDX: ffffea0000000000 RSI: ffff88815684c000 RDI: ffff88816bc5d000
[ 904.579581] RBP: ffffc900106dfba0 R08: 0000000000000001 R09: ffffea0006b097c0
[ 904.587541] R10: 0000000000000002 R11: ffffc900106dfb38 R12: ffffea00071add60
[ 904.595502] R13: cccccccccccccccd R14: ffffea0006b09810 R15: ffff888159c1e0f8
[ 904.603462] FS: 0000000000000000(0000) GS:ffff88a04df00000(0000) knlGS:0000000000000000
[ 904.612489] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 904.618897] CR2: ffff88815684c000 CR3: 00000020eae16002 CR4: 0000000000770ee0
[ 904.626855] PKRU: 55555554
[ 904.629870] Call Trace:
[ 904.632594] <TASK>
[ 904.634928] ? folio_copy+0x8c/0xe0
[ 904.638818] migrate_folio+0x5b/0x110
[ 904.642901] move_to_new_folio+0x5b/0x150
[ 904.647371] migrate_pages+0x11bb/0x1830
[ 904.651743] ? move_freelist_tail+0xc0/0xc0
[ 904.656406] ? isolate_freepages_block+0x470/0x470
[ 904.661749] compact_zone+0x681/0xda0
[ 904.665832] kcompactd_do_work+0x1b3/0x2c0
[ 904.670400] kcompactd+0x257/0x330
[ 904.674190] ? prepare_to_wait_event+0x120/0x120
[ 904.679338] ? kcompactd_do_work+0x2c0/0x2c0
[ 904.684098] kthread+0xcf/0xf0
[ 904.687501] ? kthread_complete_and_exit+0x20/0x20
[ 904.692844] ret_from_fork+0x22/0x30
[ 904.696830] </TASK>
[ 904.699262] Modules linked in: nf_conntrack_netlink(E) xfrm_user(E) xfrm_algo(E) xt_addrtype(E) br_netfilter(E) xt_CHECKSUM(E) xt_MASQUERADE(E) xt_conntrack(E) ipt_REJECT(E) nf_reject_ipv4(E) xt_tcpudp(E) ip6table_mangle(E) ip6table_nat(E) iptable_mangle(E) iptable_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) nf_tables(E) nfnetlink(E) ip6table_filter(E) ip6_tables(E) iptable_filter(E) bpfilter(E) intel_rapl_msr(E) intel_rapl_common(E) amd64_edac(E) bridge(E) stp(E) llc(E) kvm_amd(E) overlay(E) nls_iso8859_1(E) kvm(E) crct10dif_pclmul(E) ghash_clmulni_intel(E) aesni_intel(E) crypto_simd(E) cryptd(E) rapl(E) ipmi_si(E) ipmi_devintf(E) wmi_bmof(E) ipmi_msghandler(E) efi_pstore(E) binfmt_misc(E) ast(E) drm_vram_helper(E) joydev(E) drm_ttm_helper(E) ttm(E) drm_kms_helper(E) input_leds(E) i2c_algo_bit(E) fb_sys_fops(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) ccp(E) k10temp(E) mac_hid(E) sch_fq_codel(E) parport_pc(E) ppdev(E) lp(E) parport(E) drm(E) ip_tables(E)
[ 904.699316] x_tables(E) autofs4(E) btrfs(E) blake2b_generic(E) zstd_compress(E) raid10(E) raid456(E) async_raid6_recov(E) async_memcpy(E) async_pq(E) async_xor(E) async_tx(E) xor(E) raid6_pq(E) libcrc32c(E) raid1(E) raid0(E) multipath(E) linear(E) crc32_pclmul(E) hid_generic(E) usbhid(E) hid(E) e1000e(E) i2c_piix4(E) wmi(E)
[ 904.828498] CR2: ffff88815684c000
[ 904.832193] ---[ end trace 0000000000000000 ]---
[ 904.937159] RIP: 0010:copy_page+0x7/0x10
[ 904.941524] Code: 00 66 90 48 89 f8 48 89 d1 f3 a4 31 c0 c3 cc cc cc cc 48 89 c8 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc 66 90 b9 00 02 00 00 <f3> 48 a5 c3 cc cc cc cc 90 48 83 ec 10 48 89 1c 24 4c 89 64 24 08
[ 904.962478] RSP: 0018:ffffc900106dfb78 EFLAGS: 00010286
[ 904.968305] RAX: ffff888000000000 RBX: ffffea0006b09810 RCX: 0000000000000200
[ 904.976265] RDX: ffffea0000000000 RSI: ffff88815684c000 RDI: ffff88816bc5d000
[ 904.984227] RBP: ffffc900106dfba0 R08: 0000000000000001 R09: ffffea0006b097c0
[ 904.992187] R10: 0000000000000002 R11: ffffc900106dfb38 R12: ffffea00071add60
[ 905.000145] R13: cccccccccccccccd R14: ffffea0006b09810 R15: ffff888159c1e0f8
[ 905.008105] FS: 0000000000000000(0000) GS:ffff88a04df00000(0000) knlGS:0000000000000000
[ 905.017132] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 905.023540] CR2: ffff88815684c000 CR3: 00000020eae16002 CR4: 0000000000770ee0
[ 905.031501] PKRU: 55555554
[ 905.034558] kvm_restricted_mem_get_pfn: GFN: 0x1cafe01, PFN: 0x19f6d1, page: ffffea00081d2150, ref_count: 2
[ 905.045455] kvm_restricted_mem_get_pfn: GFN: 0x1cafe02, PFN: 0x19f6d2, page: ffffea00081d21a0, ref_count: 2
...