On 9/16/24 11:54 AM, Maxim Levitsky wrote:
> Hi!
>
> We recently saw a failure in one of the aws VM instances that causes the following error during the guest boot:
>
> [    0.480051] unchecked MSR access error: WRMSR to 0xc0000302 (tried to write 0x040000000000001f) at rIP: 0xffffffff96c093e2 (amd_pmu_cpu_reset.constprop.0+0x42/0x80)
>
>
> I investigated the issue and I see that the hypervisor does expose PerfmonV2, but not the LBRv2 support:
>
> # cpuid -1 -l 0x80000022
> CPU:
>    Extended Performance Monitoring and Debugging (0x80000022):
>       AMD performance monitoring V2         = true
>       AMD LBR V2                            = false
>       AMD LBR stack & PMC freezing          = false
>       number of core perf ctrs              = 0x5 (5)
>       number of LBR stack entries           = 0x0 (0)
>       number of avail Northbridge perf ctrs = 0x0 (0)
>       number of available UMC PMCs          = 0x0 (0)
>       active UMCs bitmask                   = 0x0
>
> I also verified that I can write 0x1f to 0xc0000302 but not 0x040000000000001f:
>
> # wrmsr 0xc0000302 0x1f
> # wrmsr 0xc0000302 0x040000000000001f
> wrmsr: CPU 0 cannot set MSR 0xc0000302 to 0x040000000000001f
> #
>
> The AMD's APM is not clear on what should happen if unsupported bits are attempted to be cleared
> using this MSR.
>
> Also I noticed that amd_pmu_v2_handle_irq writes 0xffffffffffffffff to this msrs.
> It has the following code:
>
>
> 	WARN_ON(status > 0);
>
> 	/* Clear overflow and freeze bits */
> 	amd_pmu_ack_global_status(~status);
>
>
> This implies that it is OK to set all bits in this MSR.
>

To share my data point on QEMU+KVM:

I am not able to reproduce the error with the most recent QEMU (not AWS) plus the patch below.

[PATCH v2 2/4] i386/cpu: Add PerfMonV2 feature bit
https://lore.kernel.org/all/69905b486218f8287b9703d1a9001175d04c2f02.1723068946.git.babu.moger@xxxxxxx/

Both the VM and the KVM host are on 6.10.
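
To make the difference between the two values in the quoted report concrete, here is a small sketch that splits them into bit fields. It assumes that bit 58 is the LBR freeze bit and that the low bits are the per-counter overflow bits (5 core counters, per the cpuid output in the quoted report). The function and constant names are made up for illustration and are not kernel identifiers.

```python
# Decode the values from the quoted "unchecked MSR access error" report.
# Assumption: bit 58 is the LBR freeze bit and the low bits are the
# per-counter overflow bits, as the "Clear overflow and freeze bits"
# comment in the quoted code suggests. All names here are illustrative.

LBR_FREEZE_BIT = 58  # assumed position of the freeze bit


def decode_status_clr(value: int, num_counters: int) -> dict:
    """Split a value written to MSR 0xc0000302 into its assumed fields."""
    counter_mask = (1 << num_counters) - 1
    freeze_mask = 1 << LBR_FREEZE_BIT
    return {
        "counter_bits": value & counter_mask,
        "lbr_freeze": bool(value & freeze_mask),
        # Anything outside the two assumed fields.
        "other_bits": value & ~(counter_mask | freeze_mask),
    }


# The write rejected in the quoted report, and the one that was accepted.
failing = decode_status_clr(0x040000000000001F, num_counters=5)
accepted = decode_status_clr(0x1F, num_counters=5)

print(failing)
print(accepted)
```

Under that assumption, the only difference between the accepted and the rejected write is bit 58, which would be consistent with a hypervisor that reports "AMD LBR stack & PMC freezing = false" and refuses the freeze bit.
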

vm# cpuid -1 -l 0x80000022
CPU:
   Extended Performance Monitoring and Debugging (0x80000022):
      AMD performance monitoring V2         = true
      AMD LBR V2                            = false
      AMD LBR stack & PMC freezing          = false
      number of core perf ctrs              = 0x6 (6)
      number of LBR stack entries           = 0x0 (0)
      number of avail Northbridge perf ctrs = 0x0 (0)
      number of available UMC PMCs          = 0x0 (0)
      active UMCs bitmask                   = 0x0

Both writes succeed.

vm# wrmsr 0xc0000302 0x1f
vm# wrmsr 0xc0000302 0x040000000000001f

Here is the bcc output. The traced value for kvm_pmu_set_msr is 0 for both writes.

kvm# /usr/share/bcc/tools/trace -t -C 'kvm_pmu_set_msr "%x", retval'
... ...
4.748614 19  43545   43550   CPU 0/KVM       kvm_pmu_set_msr  0
10.97396 19  43545   43550   CPU 0/KVM       kvm_pmu_set_msr  0

Dongli Zhang