On Mon, Feb 28, 2022 at 12:27 AM Like Xu <like.xu.linux@xxxxxxxxx> wrote:
>
> On 27/2/2022 7:41 am, Jim Mattson wrote:
> > AMD EPYC CPUs never raise a #GP for a WRMSR to a PerfEvtSeln MSR. Some
> > reserved bits are cleared, and some are not. Specifically, on
> > Zen3/Milan, bits 19 and 42 are not cleared.
>
> Curiously, is there any additional documentation on what bits 19 and 42 are for?
> And we only need this part of logic specifically for at least (guest cpu model)
> Zen3.

With the help of an older revision of the APM I found at
https://www.ii.uib.no/~osvik/x86-64/24593.pdf, we can see that bit 19,
on AMD as well as Intel, is the deprecated "Pin Control" bit. I believe
bit 42 is new on Zen3/Milan, but aside from being useful for fixing
erratum #1292, I don't have any idea what it does.

Note that bits 40 and 41 were reserved bits before SVM was introduced,
and should be treated as such for VMs that do not support SVM. Hence,
the motivation for this change is still, as previously mentioned, the
egregious behavior of the Linux perf subsystem with respect to the
Host-Only bit. This is necessary for all AMD vCPUs that do not support
SVM, regardless of model.

> >
> > When emulating such a WRMSR, KVM should not synthesize a #GP,
> > regardless of which bits are set. However, undocumented bits should
>
> If KVM chooses to emulate different #GP behavior on AMD and Intel for
> "reserved bits without qualification"[0], there should be more code for almost
> all MSRs to be checked one by one.

I think you are manufacturing a problem that doesn't exist.

> [0] "If a field is marked reserved without qualification, software must not
> change the state of that field; it must reload that field with the same value
> returned from a prior read."

Unfortunately, some software (e.g. Linux perf) ignores this
restriction. If, in spite of its misbehavior, the software works fine
on bare metal, we should do whatever is necessary to make it work in a
VM as well.

> > not be passed through to the hardware MSR. So, rather than checking
> > for reserved bits and synthesizing a #GP, just clear the reserved
> > bits.
>
> wrmsr -a 0xc0010200 0xfffffcf000280000
> rdmsr -a 0xc0010200 | sort | uniq
> # 0x40000080000 (expected)
>
> According to the test, there will be memory bits somewhere on the host
> recording the bit status of bits 19 and 42.
>
> Shouldn't KVM emulate this bit-memory behavior as well ?

I'm happy to revert your change that added bit 19 to the reserved
bits. I can remove bit 42 as well, but I don't see the need. Bit 42,
unlike bit 19, has never been documented.

> >
> > This may seem pedantic, but since KVM currently does not support the
> > "Host/Guest Only" bits (41:40), it is necessary to clear these bits
>
> I would have thought you had code to emulate the "Host/Guest Only"
> bits for nested SVM PMU to fix this issue fundamentally.

GCP doesn't support nested SVM at all, so we have no such code.

Regardless, as you can see from the old APM referenced above, these
bits were reserved on AMD CPUs that don't support SVM. They should
also be reserved on virtual CPUs that don't support SVM. That much, at
least, KVM gets right today.

> > rather than synthesizing #GP, because some popular guests (e.g Linux)
> > will set the "Host Only" bit even on CPUs that don't support
> > EFER.SVME, and they don't expect a #GP.
>
> IMO, this fix is just a reprieve.
>
> The logic of special handling of #GP only for AMD PMU MSR's
> "reserved without qualification" bits is asymmetric in the KVM/svm
> context and will confuse users even more.

I'm happy to entertain alternative suggestions.
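
For reference, the "clear instead of #GP" behavior argued for above can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the actual KVM patch: sanitize_perfevtsel(), its base_reserved parameter, and the PERFEVTSEL_* macro names are hypothetical and chosen for this example. Only the bit positions (19, 40, 41, 42) come from the discussion.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions discussed in this thread; the macro names are hypothetical. */
#define PERFEVTSEL_PIN_CONTROL (1ULL << 19) /* deprecated "Pin Control" bit */
#define PERFEVTSEL_GUEST_ONLY  (1ULL << 40) /* "Guest Only", reserved without SVM */
#define PERFEVTSEL_HOST_ONLY   (1ULL << 41) /* "Host Only", reserved without SVM */
#define PERFEVTSEL_BIT_42      (1ULL << 42) /* undocumented */

/*
 * Hypothetical helper: rather than faulting on a guest write that sets
 * reserved bits, silently drop them before the value is used.
 *
 * base_reserved is an assumption: the caller supplies whatever mask it
 * already treats as reserved. The Host/Guest Only bits are added to that
 * mask when the vCPU does not support SVM.
 */
static uint64_t sanitize_perfevtsel(uint64_t data, uint64_t base_reserved,
                                    bool vcpu_has_svm)
{
    uint64_t reserved = base_reserved;

    if (!vcpu_has_svm)
        reserved |= PERFEVTSEL_GUEST_ONLY | PERFEVTSEL_HOST_ONLY;

    /* Clear the reserved bits; never signal a fault. */
    return data & ~reserved;
}
```

With an empty base mask and a vCPU without SVM, a guest write that sets the Host Only bit comes back with that bit cleared and every other bit intact. Note also that PERFEVTSEL_PIN_CONTROL | PERFEVTSEL_BIT_42 equals 0x40000080000, the value read back in the rdmsr test quoted above.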