> From: Bjorn Helgaas <helgaas@xxxxxxxxxx> > Sent: Tuesday, June 6, 2023 11:36 PM > To: Ross Lagerwall <ross.lagerwall@xxxxxxxxxx> > Cc: linux-pci@xxxxxxxxxxxxxxx <linux-pci@xxxxxxxxxxxxxxx>; Bjorn Helgaas <bhelgaas@xxxxxxxxxx>; Kai-Heng Feng <kai.heng.feng@xxxxxxxxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx <linux-kernel@xxxxxxxxxxxxxxx> > Subject: Re: [PATCH] PCI: Release coalesced resource > > [CAUTION - EXTERNAL EMAIL] DO NOT reply, click links, or open attachments unless you have verified the sender and know the content is safe. > > On Thu, May 25, 2023 at 04:32:48PM +0100, Ross Lagerwall wrote: > > When contiguous windows are coalesced, the resource is invalidated and > > consequently not added to the bus. However, it remains in the resource > > hierarchy: > > > > ... > > ef2fff00-ef2fffff : 0000:00:13.2 > > ef2fff00-ef2fffff : ehci_hcd > > 00000000-00000000 : PCI Bus 0000:00 > > f0000000-f3ffffff : PCI MMCONFIG 0000 [bus 00-3f] > > f0000000-f3ffffff : Reserved > > ... > > I assume the "00000000-00000000 : PCI Bus 0000:00" is the problematic > part? Is there anything in dmesg that shows the resources before they > were coalesced? Yes, that is the problematic part which gets removed by this patch. dmesg doesn't show the resources before they were coalesced, but I captured the output of /proc/iomem with/without the coalesce patch to see what was being coalesced. Without coalescing, this region ... fec00000-fec7ffff : PCI Bus 0000:00 fec00000-fec003ff : IOAPIC 0 fec80000-fecbffff : PCI Bus 0000:00 fec80000-fec803ff : IOAPIC 1 fec90000-fec93fff : pnp 00:06 ... gets coalesced into: fec00000-fecbffff : PCI Bus 0000:00 fec00000-fec003ff : IOAPIC 0 fec80000-fec803ff : IOAPIC 1 fec90000-fec93fff : pnp 00:06 > > Is there an error message we could include here to link the problem > with the solution? The error shows two "clipped" messages followed by a BUG when starting a VM under Xen. Having said that, I don't think the error is specific to Xen - it just doesn't handle getting back an unexpected resource range. [ 2783.654292] clipped [mem 0x100000000-0x3fffffffffff] to [mem 0x230a07000-0x3fffffffffff] for e820 entry [mem 0x100000000-0x230a06fff] [ 2783.654311] clipped [mem 0x230a07000-0x3fffffffffff] to [mem 0x10000000000-0x3fffffffffff] for e820 entry [mem 0xfd00000000-0xffffffffff] [ 2783.710864] memmap_init_zone_device initialised 32768 pages in 0ms [ 2783.711124] ------------[ cut here ]------------ [ 2783.711127] kernel BUG at arch/x86/xen/p2m.c:542! [ 2783.711166] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI [ 2783.711177] CPU: 1 PID: 1795 Comm: xenconsoled Not tainted 6.1.27+0 #1 [ 2783.711189] Hardware name: Dell Inc. PowerEdge R815/0272WF, BIOS 2.8.2 05/21/2012 [ 2783.711200] RIP: e030:xen_alloc_p2m_entry+0x57d/0x930 [ 2783.711222] Code: 3d 90 41 41 01 73 5d 48 8b 05 8f 41 41 01 48 8b 04 f8 48 83 f8 ff 74 59 48 bf ff ff ff ff ff ff ff 3f 48 21 c7 e9 68 fb ff ff <0f> 0b 49 8d 7e 08 4c 89 f1 48 c7 c0 ff ff ff ff 49 c7 06 ff ff ff [ 2783.711286] RSP: e02b:ffffc90040d37d80 EFLAGS: 00010246 [ 2783.711297] RAX: 0000000000000000 RBX: 0000000010007fff RCX: fff0000000000fff [ 2783.711308] RDX: ffffc90040d37d98 RSI: ffffc9008003fff8 RDI: 0000007fc88de067 [ 2783.711318] RBP: ffffc90040d37e28 R08: 0000000000000000 R09: 000ffffffffff000 [ 2783.711328] R10: 0000000000000000 R11: 0000000000000000 R12: ffffc9008003fff8 [ 2783.711337] R13: 0000000000000000 R14: 0000000000000001 R15: 0000000010008000 [ 2783.711358] FS: 00007f295754f740(0000) GS:ffff888230640000(0000) knlGS:0000000000000000 [ 2783.711370] CS: e030 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 2783.711379] CR2: 00007f4c6344cff0 CR3: 0000000105e46000 CR4: 0000000000040660 [ 2783.711391] Call Trace: [ 2783.711401] <TASK> [ 2783.711412] xen_alloc_unpopulated_pages+0xa6/0x430 [ 2783.711429] gnttab_alloc_pages+0x11/0x50 [ 2783.711441] gntdev_alloc_map+0x1d2/0x2e0 [ 2783.711455] gntdev_ioctl+0x261/0x540 [ 2783.711466] __x64_sys_ioctl+0x8a/0xc0 [ 2783.711480] do_syscall_64+0x3b/0x90 [ 2783.711494] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 2783.711506] RIP: 0033:0x7f2956a875d7 [ 2783.711516] Code: 44 00 00 48 8b 05 b9 08 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 89 08 2d 00 f7 d8 64 89 01 48 [ 2783.711536] RSP: 002b:00007ffcd8292e88 EFLAGS: 00000206 ORIG_RAX: 0000000000000010 [ 2783.711549] RAX: ffffffffffffffda RBX: 0000000000001000 RCX: 00007f2956a875d7 [ 2783.711559] RDX: 00007ffcd8292e90 RSI: 0000000000184700 RDI: 000000000000000c [ 2783.711569] RBP: 00007ffcd8292f30 R08: 00007ffcd8292f5c R09: 00007ffcd8292f58 [ 2783.711579] R10: 00007ffcd82928e0 R11: 0000000000000206 R12: 00007ffcd8292e90 [ 2783.711589] R13: 0000000000000003 R14: 000000000000000c R15: 0000000000000001 [ 2783.711602] </TASK> [ 2783.711608] Modules linked in: arptable_filter arp_tables tcp_diag udp_diag raw_diag inet_diag netlink_diag ebtable_filter ebtables nfsv3 nfs_acl nfs lockd grace fscache netfs bnx2fc cnic uio fcoe libfcoe libfc scsi_transport_fc openvswitch nsh nf_conncount nf_nat 8021q garp mrp stp llc ipt_REJECT nf_reject_ipv4 xt_tcpudp xt_multiport xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c iptable_filter dm_multipath sunrpc dm_mod crct10dif_pclmul crc32_pclmul ghash_clmulni_intel sha512_ssse3 aesni_intel crypto_simd cryptd ipmi_si psmouse ipmi_devintf k10temp i2c_piix4 sg fam15h_power ipmi_msghandler acpi_power_meter xen_wdt ip_tables x_tables hid_generic usbhid hid sd_mod t10_pi sr_mod cdrom crc64_rocksoft crc64 ohci_pci ahci libahci ehci_pci serio_raw ixgbe ehci_hcd mdio libata ohci_hcd xfrm_algo megaraid_sas bnx2 scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua scsi_mod scsi_common ipv6 crc_ccitt [ 2783.711805] ---[ end trace 0000000000000000 ]--- [ 2783.716538] RIP: e030:xen_alloc_p2m_entry+0x57d/0x930 [ 2783.716553] Code: 3d 90 41 41 01 73 5d 48 8b 05 8f 41 41 01 48 8b 04 f8 48 83 f8 ff 74 59 48 bf ff ff ff ff ff ff ff 3f 48 21 c7 e9 68 fb ff ff <0f> 0b 49 8d 7e 08 4c 89 f1 48 c7 c0 ff ff ff ff 49 c7 06 ff ff ff [ 2783.716573] RSP: e02b:ffffc90040d37d80 EFLAGS: 00010246 [ 2783.716585] RAX: 0000000000000000 RBX: 0000000010007fff RCX: fff0000000000fff [ 2783.716596] RDX: ffffc90040d37d98 RSI: ffffc9008003fff8 RDI: 0000007fc88de067 [ 2783.716608] RBP: ffffc90040d37e28 R08: 0000000000000000 R09: 000ffffffffff000 [ 2783.716620] R10: 0000000000000000 R11: 0000000000000000 R12: ffffc9008003fff8 [ 2783.716631] R13: 0000000000000000 R14: 0000000000000001 R15: 0000000010008000 [ 2783.716648] FS: 00007f295754f740(0000) GS:ffff888230640000(0000) knlGS:0000000000000000 [ 2783.716661] CS: e030 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 2783.716671] CR2: 00007f4c6344cff0 CR3: 0000000105e46000 CR4: 0000000000040660 This was separately reported here: https://github.com/QubesOS/qubes-issues/issues/7918#issuecomment-1331763950 I have the dmesg and /proc/iomem logs here (somewhat older kernel): https://pastebin.com/raw/8TQUp2uG > > > In some cases (e.g. the Xen scratch region), this causes future calls to > > allocate_resource() to choose an inappropriate location which the caller > > cannot handle. Fix by releasing the resource and removing from the > > hierarchy. > > > > Fixes: 7c3855c423b1 ("PCI: Coalesce host bridge contiguous apertures") > > 7c3855c423b1 appeared in v5.16, so we may need a stable tag? Yes, I think so. Thanks, Ross