On 23/5/23 09:39, Sean Christopherson wrote:
On Tue, Apr 11, 2023, Alexey Kardashevskiy wrote:
Prior to SEV-ES, KVM saved/restored host debug registers upon switching
to/from a VM. Changing those registers inside a running SEV VM
triggered #VMEXIT to KVM.
Please, please don't make it sound like some behavior is *the* one and only behavior.
*KVM* *chooses* to intercept debug register accesses. Though I would omit this
paragraph (and largely the next) entirely, IMO it's safe to assume the reader has
a basic understanding of how KVM deals with DRs and #DBs.
Out of curiosity - is ARM similar in this regard, interceptions and stuff?
SEV-ES added encrypted state (ES) which uses an encrypted page
for the VM state (VMSA). The hardware saves/restores certain registers
on VMRUN/VMEXIT according to a swap type (A, B, C), see
"Table B-3. Swap Types" in the AMD Architecture Programmer’s Manual
volume 2.
The DR6 and DR7 registers have always been swapped as Type A for SEV-ES
Please rewrite this to just state what the behavior is instead of "Type A" vs.
"Type B". Outside of AMD, the "type a/b/c" stuff isn't anywhere near ubiquitous
enough to justify obfuscating super simple concepts.
Actually, this feature really has nothing to do with Type A vs. Type B, since
that's purely about host state.
Mmm. Nothing? If the feature is enabled and DR[0-3] are not saved in
HOSTSA, then the host looses debug state because of the working feature.
I assume the switch from Type A to Type B is a
side effect, or an opportunistic optimization?
There is no switch. DR[67] were and are one type, and the other DRs were
not swapped and now are, but with a different Swap Type.
And the patch saves DR[0-3] in HOSTSA but not DR[67] and this deserves
some explaining why is that.
Regardless, something like this is a lot easier for a human to read. It's easy
enough to find DebugSwap in the APM (literally took me longer to find my copy of
the PDF).
It is also easy to find "Swap Types" in the APM...
Add support for "DebugSwap for SEV-ES guests", which provides support
for swapping DR[0-3] and DR[0-3]_ADDR_MASK on VMRUN and VMEXIT, i.e.
allows KVM to expose debug capabilities to SEV-ES guests. Without
DebugSwap support, the CPU doesn't save/load _guest_ debug registers,
But it does save/load DR6 and DR7.
and KVM cannot manually context switch guest DRs due the VMSA being
encrypted.
Enable DebugSwap if and only if the CPU also supports NoNestedDataBp,
which causes the CPU to ignore nested #DBs, i.e. #DBs that occur when
vectoring a #DB.
(english question) What does "vectoring" mean here precisely? Handling?
Processing?
Without NoNestedDataBp, a malicious guest can DoS
the host by putting the CPU into an infinite loop of vectoring #DBs
(see https://bugzilla.redhat.com/show_bug.cgi?id=1278496)
Good one, thanks for the link.
Set the features bit in sev_es_sync_vmsa() which is the last point
when VMSA is not encrypted yet as sev_(es_)init_vmcb() (where the most
init happens) is called not only when VCPU is initialized but also on
intrahost migration when VMSA is encrypted.
guests, but a new feature is available, identified via
CPUID Fn8000001F_EAX[14] "DebugSwap for SEV-ES guests", that provides
support for swapping additional debug registers. DR[0-3] and
DR[0-3]_ADDR_MASK are swapped as Type B when SEV_FEATURES[5] (DebugSwap)
is set.
Enable DebugSwap for a VMSA but only do so if CPUID Fn80000021_EAX[0]
("NoNestedDataBp", "Processor ignores nested data breakpoints") is
supported by the SOC as otherwise a malicious SEV-ES guest can set up
data breakpoints on the #DB IDT entry/stack and cause an infinite loop.
Set the features bit in sev_es_sync_vmsa() which is the last point
when VMSA is not encrypted yet as sev_(es_)init_vmcb() (where the most
init happens) is called not only when VCPU is initialized but also on
intrahost migration when VMSA is encrypted.
Eliminate DR7 and #DB intercepts as:
- they are not needed when DebugSwap is supported;
"not needed" isn't sufficient justification. KVM doesn't strictly need to do a
lot of things, but does them anyways for a variety of reasons. E.g. #DB intercept
is also not needed when NoNestedDataBp is supported and vcpu->guest_debug==0, i.e.
this should clarify why KVM doesn't simply disable #DB intercepts for _all_ VMs
when NoNestedDataBp is support. Presumably the answer is "because it's simpler
than toggling #DB interception for guest_debug.
TBH I did not have a good answer but from the above I'd say we want to
disable #DB intercepts in a separate patch, just as you say below.
Actually, can't disabling #DB interception for DebugSwap SEV-ES guests be a
separate patch? KVM can still inject #DBs for SEV-ES guests, no?
Sorry for my ignorance but what is the point of injecting #DB if there
is no way of changing the guest's DR7?
As for DR7, the real justification is that, as above, KVM can't modify guest DR7,
and intercepting DR7 would completely defeat the purpose of enabling DebugSwap.
- #VC for these intercepts is most likely not supported anyway and
kills the VM.
I don't see how this is relevant or helpful. What the guest may or may not do in
response to a #VC on a DR7 write has no bearing on KVM's behavior.
Readers of the patch may wonder of the chances of breaks guests.
Definitely helps me to realize there is likely to be some sort of
panic() in the guest if such intercept happens.
Signed-off-by: Alexey Kardashevskiy <aik@xxxxxxx>
Reviewed-by: Tom Lendacky <thomas.lendacky@xxxxxxx>
---
...
@@ -3048,6 +3066,18 @@ void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa)
/* MSR_IA32_XSS is restored on VMEXIT, save the currnet host value */
hostsa->xss = host_xss;
+
+ /* The DebugSwap SEV feature does Type B swaps of DR[0-3] */
Since dangling a carrot didn't work[*], I'm going to resort to extortion :-)
Can you fold the below somewhere before this patch, and then tweak this comment
to say something like:
/*
* If DebugSwap is enabled, debug registers are loaded but NOT saved by
* the CPU (Type-B). If DebugSwap is disabled/unsupported, the CPU both
* saves and loads debug registers (Type-A).
*/
Sure but...
[*] https://lore.kernel.org/all/YzOTOzUzKPQSqURo@xxxxxxxxxx/
From: Sean Christopherson <seanjc@xxxxxxxxxx>
Date: Mon, 22 May 2023 16:29:52 -0700
Subject: [PATCH] KVM: SVM: Rewrite sev_es_prepare_switch_to_guest()'s comment
about swap types
... I am missing the point here. You already wrote the patch below which
1) you are happy about 2) you can push out right away to the upstream.
Are you suggesting that I repost it?
Rewrite the comment(s) in sev_es_prepare_switch_to_guest() to explain the
swap types employed by the CPU for SEV-ES guests, i.e. to explain why KVM
needs to save a seemingly random subset of host state, and to provide a
decoder for the APM's Type-A/B/C terminology.
Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
---
arch/x86/kvm/svm/sev.c | 25 +++++++++++++++----------
1 file changed, 15 insertions(+), 10 deletions(-)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 69ae5e1b3120..1e0e9b08e491 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3017,19 +3017,24 @@ void sev_es_vcpu_reset(struct vcpu_svm *svm)
void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa)
{
/*
- * As an SEV-ES guest, hardware will restore the host state on VMEXIT,
- * of which one step is to perform a VMLOAD. KVM performs the
- * corresponding VMSAVE in svm_prepare_guest_switch for both
- * traditional and SEV-ES guests.
When I see "traditional", I assume there was one single x86 KVM before
say 2010 which was slow, emulated a lot but worked exactly the same on
Intel and AMD. Which does not seem to ever be the case. May be say "SVM"
here?
+ * All host state for SEV-ES guests is categorized into three swap types
+ * based on how it is handled by hardware during a world switch:
+ *
+ * A: VMRUN: Host state saved in host save area
+ * VMEXIT: Host state loaded from host save area
+ *
+ * B: VMRUN: Host state _NOT_ saved in host save area
+ * VMEXIT: Host state loaded from host save area
+ *
+ * C: VMRUN: Host state _NOT_ saved in host save area
+ * VMEXIT: Host state initialized to default(reset) values
+ *
+ * Manually save type-B state, i.e. state that is loaded by VMEXIT but
+ * isn't saved by VMRUN, that isn't already saved by VMSAVE (performed
+ * by common SVM code).
Like here - "common SVM"?
Thanks for the comments!
*/
-
- /* XCR0 is restored on VMEXIT, save the current host value */
hostsa->xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);
-
- /* PKRU is restored on VMEXIT, save the current host value */
hostsa->pkru = read_pkru();
-
- /* MSR_IA32_XSS is restored on VMEXIT, save the currnet host value */
hostsa->xss = host_xss;
}
base-commit: 39428f6ea9eace95011681628717062ff7f5eb5f
--
Alexey