On 19/1/2022 2:22 am, Jim Mattson wrote:
On Mon, Jan 17, 2022 at 10:25 PM Like Xu <like.xu.linux@xxxxxxxxx> wrote:
On 18/1/2022 12:08 pm, Jim Mattson wrote:
On Mon, Jan 17, 2022 at 12:57 PM Jim Mattson <jmattson@xxxxxxxxxx> wrote:
On Sun, Jan 16, 2022 at 8:26 PM Like Xu <like.xu.linux@xxxxxxxxx> wrote:
...
It's easy for KVM to clear the reserved bit PERF_CTL2[43]
only for AMD Family 19h Models 00h-0Fh guests.
KVM is currently *way* too aggressive about synthesizing #GP for
"reserved" bits on AMD hardware. Note that "reserved" generally has a
much weaker definition in AMD documentation than in Intel
documentation. When Intel says that an MSR bit is "reserved," it means
that an attempt to set the bit will raise #GP. When AMD says that an
MSR bit is "reserved," it does not necessarily mean the same thing.
I agree. And I'm curious as to why there are hardly any guest user complaints.
The term "reserved" is described in the AMD "Conventions and Definitions":
Fields marked as reserved may be used at some future time.
To preserve compatibility with future processors, reserved fields require
special handling when read or written by software. Software must not depend
on the state of a reserved field (unless qualified as RAZ), nor upon the
ability of such fields to return a previously written state.
If a field is marked reserved *without qualification*, software must not
change the state of that field; it must reload that field with the same
value returned from a prior read.
Reserved fields may be qualified as IGN, MBZ, RAZ, or SBZ.
For AMD, #GP comes from "Writing 1 to any bit that must be zero (MBZ) in the MSR."
(Usually, AMD will write MBZ to indicate that the bit must be zero.)
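To make the distinction concrete, here is a minimal Python sketch of an emulated MSR write that faults only on MBZ bits, plus a read that hides RAZ bits. It is a model, not KVM code: the names (MBZ_MASK, RAZ_MASK, emulate_wrmsr, emulate_rdmsr) and the bit positions in the masks are hypothetical and do not describe any real MSR.

```python
# Hypothetical model of AMD-style reserved-bit handling; not KVM code.
# Both masks are invented for illustration and describe no real MSR.
MBZ_MASK = 0xF << 60   # assumed "Reserved, MBZ" bits (63:60)
RAZ_MASK = 1 << 43     # assumed "Reserved, RAZ" bit


class GeneralProtectionFault(Exception):
    """Stands in for a #GP injected into the guest."""


def emulate_wrmsr(value: int) -> int:
    """Return the value to store, faulting only when an MBZ bit is set."""
    if value & MBZ_MASK:
        raise GeneralProtectionFault(hex(value & MBZ_MASK))
    # Any other reserved bit is accepted without a fault.
    return value


def emulate_rdmsr(stored: int) -> int:
    """Return the guest-visible value; RAZ bits always read as zero."""
    return stored & ~RAZ_MASK
```

With these assumed masks, writing 0x80000000000 succeeds and reads back as 0, while a write with bit 63 set raises the stand-in fault.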
On my Zen3 CPU, I can write 0xffffffffffffffff to MSR 0xc0010204,
without getting a #GP. Hence, KVM should not synthesize a #GP for any
writes to this MSR.
; storage behind bit 43 test
; CPU family: 25
; Model: 1
wrmsr -p 0 0xc0010204 0x80000000000
rdmsr -p 0 0xc0010204 # return 0x80000000000
Oops. You're right. The host that I thought was a Zen3 was actually a
Zen2. Switching to an actual Zen3, I find that there is storage behind
bits 42 and 43, both of which are indicated as reserved.
Note that the value I get back from rdmsr is 0x30fffdfffff, so there
appears to be no storage behind bit 43. If KVM allows this bit to be
set, it should ensure that reads of this bit always return 0, as they
do on hardware.
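A small Python sketch of how such a read-back value can be decoded, assuming 0x30fffdfffff is what rdmsr returned after writing all ones as described above. The helper names (bits_set, bits_without_storage) are hypothetical and only illustrate the arithmetic.

```python
def bits_set(value: int) -> list[int]:
    """Return the positions of the set bits in a 64-bit value."""
    return [b for b in range(64) if (value >> b) & 1]


def bits_without_storage(written: int, read_back: int) -> list[int]:
    """Bits that were written as 1 but read back as 0."""
    return bits_set(written & ~read_back)


# Assumption: the quoted value was read back after writing all ones.
missing = bits_without_storage(0xFFFFFFFFFFFFFFFF, 0x30FFFDFFFFF)
print(missing)
```

Under that assumption, the list includes bit 43 and omits bit 19, which matches the observations in this thread for that host.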
PERF_CTL2[43] is marked reserved without qualification in Figure 13-7.
I'm not sure we really need a cleanup storm of #GP for all SVM's non-MBZ
reserved bits.
OTOH, we wouldn't need to have this discussion if these MSRs had been
implemented correctly to begin with.
So should KVM remove all #GP for AMD's non-MBZ reserved bits?
That is not a small amount of work, and there are almost no guest user complaints.
Bit 19 (Intel's old Pin Control bit) seems to have storage behind it.
It is interesting that in Figure 13-7 "Core Performance Event-Select
Register (PerfEvtSeln)" of the APM volume 2, this "reserved" bit is
not marked in grey. The remaining "reserved" bits (which are marked in
grey) should probably be annotated with "RAZ."
In any diagram, we at least have three types of "reservation":
- Reserved + grey
- Reserved, MBZ + grey
- Reserved + no grey
So it is better not to think of "Reserved + grey" as "Reserved, MBZ + grey".
Right. None of these bits MBZ. I was observing that the grey fields
RAZ. However, that observation was on Zen2. Zen3 is different. Now,
it's not clear to me what the grey highlights mean. Perhaps nothing at
all.
Anyway, does this fix [0] help with this issue, assuming the AMD guys come
up with a workaround for the host perf scheduler as usual?
[0] https://lore.kernel.org/kvm/20220117055703.52020-1-likexu@xxxxxxxxxxx/