On Mon, Jan 06, 2025 at 10:38:45AM -0500, Yazen Ghannam wrote:
> On Fri, Jan 03, 2025 at 10:49:25PM +0100, Borislav Petkov wrote:
> > On Fri, Dec 06, 2024 at 04:11:53PM +0000, Yazen Ghannam wrote:
> > > Hi all,
> > >
> > > The theme of this set is decoupling the "AMD node" concept from the
> > > legacy northbridge support.
> > >
> > > Additionally, AMD System Management Network (SMN) access code is
> > > decoupled and expanded too.
> > >
> > > Patches 1-3 begin reducing the scope of AMD_NB.
> > >
> > > Patches 4-9 begin moving generic AMD node support out of AMD_NB.
> > >
> > > Patches 10-13 move SMN support out of AMD_NB and do some refactoring.
> > >
> > > Patch 14 has HSMP reuse SMN functionality.
> > >
> > > Patches 15-16 address userspace access to SMN.
> >
> > So I took the first patch and then booting the first 13 with the intention to
> > queue them while the remaining three are still being discussed, is causing the
> > below in my guest.
> >
> > .config is attached, I've pushed the branch here too, if you wanna test with
> > it:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git/log/?h=tip-x86-misc
> >
> > [    0.897060] cirrus 0000:00:01.0: [drm] fb0: cirrusdrmfb frame buffer device
> > [    0.900310] BUG: kernel NULL pointer dereference, address: 00000000000000c4
> > [    0.902551] #PF: supervisor read access in kernel mode
> > [    0.904096] #PF: error_code(0x0000) - not-present page
> > [    0.904268] PGD 0 P4D 0
> > [    0.904268] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
> > [    0.904268] CPU: 0 UID: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.13.0-rc1+ #1
> > [    0.904268] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 2023.11-8 02/21/2024
> > [    0.904268] RIP: 0010:pci_read_config_dword+0x9/0x40
> > [    0.904268] Code: 00 00 e9 8a f9 57 00 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 <8b> 87 c4 00 00 00 48 89 d1 83 f8 03 74 10 8b 47 38 48 8b 7f 10 89
> > [    0.904268] RSP: 0018:ffffc9000012fcd8 EFLAGS: 00010246
> > [    0.904268] RAX: 0000000000000000 RBX: ffff88800d296640 RCX: 000000000000003f
> > [    0.904268] RDX: ffffc9000012fce4 RSI: 00000000000001c4 RDI: 0000000000000000
> > [    0.904268] RBP: ffffc9000012fd60 R08: 0000000000000040 R09: 0000000000000010
> > [    0.904268] R10: ffff88800daa1eb0 R11: fffffffffff8dc6f R12: 0000000040000163
> > [    0.904268] R13: ffffc9000012fd60 R14: 0000000000000000 R15: ffff88807d62fc90
> > [    0.904268] FS:  0000000000000000(0000) GS:ffff88807d600000(0000) knlGS:0000000000000000
> > [    0.904268] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [    0.904268] CR2: 00000000000000c4 CR3: 0000000002c1a000 CR4: 00000000003506f0
> > [    0.904268] Call Trace:
> > [    0.904268]  <TASK>
> > [    0.904268]  ? __die+0x31/0x80
> > [    0.904268]  ? page_fault_oops+0x15d/0x4f0
> > [    0.904268]  ? srso_return_thunk+0x5/0x5f
> > [    0.904268]  ? ttwu_queue_wakelist+0xf7/0x100
> > [    0.904268]  ? exc_page_fault+0x78/0x150
> > [    0.904268]  ? asm_exc_page_fault+0x26/0x30
> > [    0.904268]  ? pci_read_config_dword+0x9/0x40
> > [    0.904268]  ? srso_return_thunk+0x5/0x5f
> > [    0.904268]  amd_init_l3_cache.part.0+0x6a/0x110
> > [    0.904268]  cpuid4_cache_lookup_regs+0xcf/0x2a0
> > [    0.904268]  populate_cache_leaves+0x6f/0x530
> > [    0.904268]  ? srso_return_thunk+0x5/0x5f
> > [    0.904268]  ? dl_server_stop+0x2f/0x40
> > [    0.904268]  ? srso_return_thunk+0x5/0x5f
> > [    0.904268]  detect_cache_attributes+0x97/0x330
> > [    0.904268]  ? __pfx_cacheinfo_cpu_online+0x10/0x10
> > [    0.904268]  cacheinfo_cpu_online+0x22/0x250
> > [    0.904268]  ? srso_return_thunk+0x5/0x5f
> > [    0.904268]  ? __pfx_cacheinfo_cpu_online+0x10/0x10
> > [    0.904268]  cpuhp_invoke_callback+0x10f/0x480
> > [    0.904268]  ? try_to_wake_up+0x23b/0x540
> > [    0.904268]  cpuhp_thread_fun+0xd4/0x160
> > [    0.904268]  smpboot_thread_fn+0xdd/0x1f0
> > [    0.904268]  ? __pfx_smpboot_thread_fn+0x10/0x10
> > [    0.904268]  kthread+0xca/0xf0
> > [    0.904268]  ? __pfx_kthread+0x10/0x10
> > [    0.904268]  ret_from_fork+0x50/0x60
> > [    0.904268]  ? __pfx_kthread+0x10/0x10
> > [    0.904268]  ret_from_fork_asm+0x1a/0x30
> > [    0.904268]  </TASK>
> > [    0.904268] Modules linked in:
> > [    0.904268] CR2: 00000000000000c4
> > [    0.904268] ---[ end trace 0000000000000000 ]---
> > [    0.904268] RIP: 0010:pci_read_config_dword+0x9/0x40
> > [    0.904268] Code: 00 00 e9 8a f9 57 00 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 <8b> 87 c4 00 00 00 48 89 d1 83 f8 03 74 10 8b 47 38 48 8b 7f 10 89
> > [    0.988792] RSP: 0018:ffffc9000012fcd8 EFLAGS: 00010246
> > [    0.988792] RAX: 0000000000000000 RBX: ffff88800d296640 RCX: 000000000000003f
> > [    0.988792] RDX: ffffc9000012fce4 RSI: 00000000000001c4 RDI: 0000000000000000
> > [    0.988792] RBP: ffffc9000012fd60 R08: 0000000000000040 R09: 0000000000000010
> > [    0.992761] R10: ffff88800daa1eb0 R11: fffffffffff8dc6f R12: 0000000040000163
> > [    0.992761] R13: ffffc9000012fd60 R14: 0000000000000000 R15: ffff88807d62fc90
> > [    0.992761] FS:  0000000000000000(0000) GS:ffff88807d600000(0000) knlGS:0000000000000000
> > [    0.996772] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [    0.996772] CR2: 00000000000000c4 CR3: 0000000002c1a000 CR4: 00000000003506f0
> > [    0.996772] note: cpuhp/0[20] exited with irqs disabled
> > [    1.680874] tsc: Refined TSC clocksource calibration: 3700.028 MHz
> > [    1.683128] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x6aaae08e541, max_idle_ns: 881590514464 ns
> > [    1.688137] clocksource: Switched to clocksource tsc
> >
> >
>
> Can you please share the guest parameters?
>

I was able to reproduce it. The patch below seems to fix the issue.

There's a comment in the function that this code is not for virtualized
environments. Also, the "L3 in Northbridge" design doesn't apply to Zen
systems.

I'll keep looking at this to get a better understanding.

My first thought is that this was silently handled before, because the
AMD_NB code operated on PCI IDs.
And these wouldn't be exposed to guests, so the northbridge data
structures wouldn't be initialized.

Specifically, I think we now have a non-zero number of northbridges,
since using the topology info rather than counting PCI devices.

In any case, I think it's better to have explicit checks.

Thanks,
Yazen

diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
index 392d09c936d6..93d993a6a1df 100644
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -595,6 +595,12 @@ static void amd_init_l3_cache(struct _cpuid4_info_regs *this_leaf, int index)
 	if (index < 3)
 		return;
 
+	if (cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
+		return;
+
+	if (cpu_feature_enabled(X86_FEATURE_ZEN))
+		return;
+
 	node = topology_amd_node_id(smp_processor_id());
 	this_leaf->nb = node_to_amd_nb(node);
 	if (this_leaf->nb && !this_leaf->nb->l3_cache.indices)
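
For illustration only, here is a small userspace sketch of the theory above
and of what the two new checks change. It is not kernel code: every mock_*
name, the result enum, and the nb_count parameter are made up for the
example, and the "non-zero northbridge count" behaviour is still just my
guess. The oops itself fits the picture: RDI is 0, CR2 is 0xc4, and the
faulting bytes <8b> 87 c4 00 00 00 decode to a 32-bit load from RDI+0xc4,
which is what a NULL struct pci_dev passed to pci_read_config_dword() would
look like.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real kernel structure layouts. */
struct mock_pci_dev { unsigned int cfg; };
struct mock_amd_nb  { struct mock_pci_dev *misc; unsigned int l3_indices; };
struct mock_cpu     { bool hypervisor; bool zen; };

/* One descriptor for node 0; misc stays NULL when no PCI device exists. */
static struct mock_amd_nb mock_nbs[1];

/* Assumed lookup: returns NULL when the node is outside the counted range. */
static struct mock_amd_nb *mock_node_to_nb(int node, int nb_count)
{
	return (node >= 0 && node < nb_count) ? &mock_nbs[node] : NULL;
}

enum l3_result { L3_SKIPPED, L3_NO_NB, L3_WOULD_DEREF_NULL, L3_OK };

/*
 * Models the control flow around the patched hunk. 'guarded' selects
 * whether the two explicit feature checks from the diff are present.
 */
static enum l3_result mock_init_l3(const struct mock_cpu *cpu, int index,
				   int nb_count, bool guarded)
{
	struct mock_amd_nb *nb;

	if (index < 3)
		return L3_SKIPPED;

	/* The explicit checks: bail out for guests and for Zen systems. */
	if (guarded && (cpu->hypervisor || cpu->zen))
		return L3_SKIPPED;

	nb = mock_node_to_nb(0, nb_count);
	if (!nb)
		return L3_NO_NB;		/* the old "silently handled" case */

	if (!nb->misc)
		return L3_WOULD_DEREF_NULL;	/* the kernel would oops here */

	nb->l3_indices = nb->misc->cfg;		/* stands in for the config read */
	return L3_OK;
}

void mock_demo(void)
{
	const struct mock_cpu guest = { .hypervisor = true, .zen = true };
	const struct mock_cpu legacy = { .hypervisor = false, .zen = false };
	struct mock_pci_dev dev = { .cfg = 0x3f };

	/* Counting PCI devices: a guest sees zero northbridges, nothing happens. */
	assert(mock_init_l3(&guest, 3, 0, false) == L3_NO_NB);
	/* Topology-based count: descriptor exists, but its device is NULL. */
	assert(mock_init_l3(&guest, 3, 1, false) == L3_WOULD_DEREF_NULL);
	/* With the explicit checks the guest never reaches the lookup. */
	assert(mock_init_l3(&guest, 3, 1, true) == L3_SKIPPED);

	/* Pre-Zen bare metal with a real device is unaffected by the checks. */
	mock_nbs[0].misc = &dev;
	assert(mock_init_l3(&legacy, 3, 1, true) == L3_OK);
	mock_nbs[0].misc = NULL;		/* do not leave a dangling pointer */
}
```

The point of the sketch is only that the early returns make the outcome
independent of how the northbridges were counted, so the function no longer
relies on the lookup happening to return NULL in a guest.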