On 06/13/2010 03:25 PM, Nadav Har'El wrote:
An implementation of VMX needs to define a VMCS structure. This structure is kept in guest memory, but is opaque to the guest (who can only read or write it with VMX instructions). This patch starts to define the VMCS structure which our nested VMX implementation will present to L1. We call it "vmcs12", as it is the VMCS that L1 keeps for its L2 guests. This patch also adds the notion (as required by the VMX spec) of the "current VMCS", and finally includes utility functions for mapping the guest-allocated VMCSs in host memory.

+#define VMCS12_REVISION 0x11e57ed0
Where did this number come from? It's not from real hardware, yes?
+
+/*
+ * struct vmcs12 describes the state that our guest hypervisor (L1) keeps for a
+ * single nested guest (L2), hence the name vmcs12. Any VMX implementation has
+ * a VMCS structure (which is opaque to the guest), and vmcs12 is our emulated
+ * VMX's VMCS. This structure is stored in guest memory specified by VMPTRLD,
+ * and accessed by the guest using VMREAD/VMWRITE/VMCLEAR instructions. More
+ * than one of these structures may exist, if L1 runs multiple L2 guests.
+ * nested_vmx_run() will use the data here to build a VMCS for the underlying
+ * hardware which will be used to run L2.
+ * This structure is packed in order to preserve the binary content after live
+ * migration. If there are changes in the content or layout, VMCS12_REVISION
+ * must be changed.
+ */
+struct __attribute__ ((__packed__)) vmcs12 {
__packed is a convenient define for this.
+	/* According to the Intel spec, a VMCS region must start with the
+	 * following two fields. Then follow implementation-specific data.
+	 */
+	u32 revision_id;
+	u32 abort;
+};
Note that this structure becomes an ABI, it cannot change except in a backward compatible way due to the need for live migration. So I'd like a documentation patch that adds a description of the content to Documentation/kvm/. It can be as simple as listing the structure definition.
+static struct page *nested_get_page(struct kvm_vcpu *vcpu, u64 vmcs_addr)
+{
+	struct page *vmcs_page =
+		gfn_to_page(vcpu->kvm, vmcs_addr >> PAGE_SHIFT);
+
+	if (is_error_page(vmcs_page)) {
+		printk(KERN_ERR "%s error allocating page 0x%llx\n",
+			__func__, vmcs_addr);
Those printks can be used by a malicious guest to spam the host logs. Please wrap them with something that is conditional on a debug flag.
I'm not sure what we need to do with vmcs that is not in RAM. It may simplify things to return the error_page to the caller and set KVM_REQ_TRIPLE_FAULT, so we don't have to deal with error handling later on.
+		kvm_release_page_clean(vmcs_page);
+		return NULL;
+	}
+	return vmcs_page;
+}
+
+static void nested_unmap_current(struct kvm_vcpu *vcpu)
+{
+	struct page *page;
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+	if (!vmx->nested.current_l2_page) {
+		printk(KERN_INFO "Shadow vmcs already unmapped\n");
+		BUG_ON(1);
+		return;
+	}
+
+	page = kmap_atomic_to_page(vmx->nested.current_l2_page);
+
+	kunmap_atomic(vmx->nested.current_l2_page, KM_USER0);
+
+	kvm_release_page_dirty(page);
Do we always dirty the page? I guess it is no big deal even if we don't.

--
error compiling committee.c: too many arguments to function