On 30/04/21 22:10, Sean Christopherson wrote:
On Thu, Apr 29, 2021, Paolo Bonzini wrote:
diff --git a/Documentation/virt/kvm/msr.rst b/Documentation/virt/kvm/msr.rst
index 57fc4090031a..cf1b0b2099b0 100644
--- a/Documentation/virt/kvm/msr.rst
+++ b/Documentation/virt/kvm/msr.rst
@@ -383,5 +383,10 @@ MSR_KVM_MIGRATION_CONTROL:
data:
This MSR is available if KVM_FEATURE_MIGRATION_CONTROL is present in
CPUID. Bit 0 represents whether live migration of the guest is allowed.
+
When a guest is started, bit 0 will be 1 if the guest has encrypted
- memory and 0 if the guest does not have encrypted memory.
+ memory and 0 if the guest does not have encrypted memory. If the
+ guest is communicating page encryption status to the host using the
+ ``KVM_HC_PAGE_ENC_STATUS`` hypercall, it can set bit 0 in this MSR to
+ allow live migration of the guest. The MSR is read-only if
+ ``KVM_FEATURE_HC_PAGE_STATUS`` is not advertised to the guest.
I still don't get the desire to tie MSR_KVM_MIGRATION_CONTROL to PAGE_ENC_STATUS
in any way, shape, or form. I can understand making it read-only or dropping
writes if it's not intercepted by userspace, but making it read-only for
non-encrypted guests makes it useful only for encrypted guests, which defeats
the purpose of genericizing the MSR.
Yeah, I see your point. On the other hand, by making it unconditionally
writable we must implement the writability in KVM, because a read-only
implementation would not comply with the spec.
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e9c40be9235c..0c2524bbaa84 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3279,6 +3279,12 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
if (!guest_pv_has(vcpu, KVM_FEATURE_MIGRATION_CONTROL))
return 1;
+ /*
+ * This implementation is only good if userspace has *not*
+ * enabled KVM_FEATURE_HC_PAGE_ENC_STATUS. If userspace
+ * enables KVM_FEATURE_HC_PAGE_ENC_STATUS it must set up an
+ * MSR filter in order to accept writes that change bit 0.
+ */
if (data != !static_call(kvm_x86_has_encrypted_memory)(vcpu->kvm))
return 1;
This behavior doesn't match the documentation.
a. The MSR is not read-only for legacy guests since they can write '0'.
b. The MSR is not read-only if KVM_FEATURE_HC_PAGE_STATUS isn't advertised;
a guest with encrypted memory can write '1' regardless of whether userspace
has enabled KVM_FEATURE_HC_PAGE_STATUS.
Right, I should have said "not changeable" rather than "read-only".
c. The MSR is never fully writable, e.g. a guest with encrypted memory can set
bit 0, but not clear it. This doesn't seem intentional?
It is intentional; clearing it would mean preserving the value in the
kernel so that userspace can read it.
So... I don't know, all in all having both the separate CPUID and the
userspace implementation reeks of overengineering. It should be either
of these:
- separate CPUID bit, MSR unconditionally writable and implemented in
KVM. Userspace is expected to ignore the MSR value for encrypted guests
unless KVM_FEATURE_HC_PAGE_STATUS is exposed. Userspace should respect
it even for unencrypted guests (not a migration-DoS vector, because
userspace can just not expose the feature).
- make it completely independent from migration, i.e. it's just a facet
of MSR_KVM_PAGE_ENC_STATUS saying whether the bitmap is up-to-date. It
would use the same CPUID bit as the encryption status bitmap and have no code at
all in KVM (userspace needs to set up the filter and implement everything).
At this point I very much prefer the latter, which is basically Ashish's
earlier patch.
Paolo
Why not simply drop writes? E.g.
	if (data & ~KVM_MIGRATION_READY)
		return 1;
	break;
And then do "msr->data = 0;" in the read path. That's just as effective as
making the MSR read-only to force userspace to intercept the MSR if it wants to
do anything useful with the information, and it's easy to document.
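
For illustration, here is a minimal standalone sketch of the drop-writes idea
above. It is a userspace model, not actual KVM code. The function names are
hypothetical, and the bit-0 value given to KVM_MIGRATION_READY is an assumption
based on the "bit 0" wording in the documentation hunk.

```c
#include <stdint.h>

/* Assumption: bit 0 is the only defined bit in the MSR. */
#define KVM_MIGRATION_READY (1ULL << 0)

/*
 * Hypothetical model of the write path: reject writes that set reserved
 * bits (a nonzero return stands in for injecting #GP), otherwise accept
 * the write and discard the value.
 */
static int model_migration_control_write(uint64_t data)
{
	if (data & ~KVM_MIGRATION_READY)
		return 1;
	return 0;	/* value intentionally not stored */
}

/* Hypothetical model of the read path: the MSR always reads back as zero. */
static int model_migration_control_read(uint64_t *data)
{
	*data = 0;
	return 0;
}
```

In this model, writes of 0 and 1 both succeed and are discarded, so the guest
never reads back what it wrote. Any state would have to be tracked by
userspace after intercepting the MSR.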