Re: [PATCH v8 11/15] ARM: KVM: World-switch implementation

On Mon, Jun 18, 2012 at 9:41 AM, Avi Kivity <avi@xxxxxxxxxx> wrote:
> On 06/15/2012 10:08 PM, Christoffer Dall wrote:
>> Provides a complete world-switch implementation to switch to other guests
>> running in non-secure modes. Includes Hyp exception handlers that capture
>> the necessary exception information and store it in the VCPU and KVM
>> structures.
>>
>> Switching to Hyp mode is done through a simple HVC instruction. The
>> exception vector code will check that the HVC comes from VMID==0 and if
>> so will store the necessary state on the Hyp stack, which will look like
>> this (growing downwards, see the hyp_hvc handler):
>>   ...
>>   Hyp_Sp + 4: spsr (Host-SVC cpsr)
>>   Hyp_Sp    : lr_usr
>>
>> When returning from Hyp mode to SVC mode, another HVC instruction is
>> executed from Hyp mode, which is taken in the Hyp_Svc handler. The Hyp
>> stack pointer should be where it was left from the above initial call,
>> since the values on the stack will be used to restore state (see
>> hyp_svc).
>>
>> Otherwise, the world-switch is pretty straightforward. All state that
>> can be modified by the guest is first backed up on the Hyp stack and the
>> VCPU values are loaded onto the hardware. State which is not loaded, but
>> is theoretically modifiable by the guest, is protected through the
>> virtualization features to generate a trap and cause software emulation.
>> Upon guest return, all state is restored from the hardware onto the VCPU
>> struct and the original state is restored from the Hyp stack onto the
>> hardware.
>>
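(To make the ordering easier to follow, here is a rough C-level sketch of
what the switch does; the helper names below are invented for illustration
and do not exist in the patch, the real world-switch is Hyp-mode assembly:)

	/* Illustrative sketch only -- the real code is assembly */
	static int world_switch_sketch(struct kvm_vcpu *vcpu)
	{
		struct hyp_host_state host_state; /* kept on the Hyp stack in practice */
		int exit_code;

		save_host_state(&host_state);	/* back up guest-modifiable host state */
		load_guest_state(vcpu);		/* load VCPU registers/CP15 onto the hardware */

		exit_code = enter_guest(vcpu);	/* eret to the guest, return on trap or IRQ */

		save_guest_state(vcpu);		/* hardware state back into the VCPU struct */
		restore_host_state(&host_state); /* Hyp stack back onto the hardware */

		return exit_code;
	}
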
>> One controversy may be the back-door call to __irq_svc (the host
>> kernel's own physical IRQ handler) which is called when a physical IRQ
>> exception is taken in Hyp mode while running in the guest.
>>
>> SMP support, using the VMPIDR calculated from the host MPIDR with the low
>> bits overridden by the KVM vcpu_id, was contributed by Marc Zyngier.
>>
>> Reuse of VMIDs has been implemented by Antonios Motakis and adapted from
>> a separate patch into the appropriate patches introducing the
>> functionality. Note that the VMIDs are stored per VM as required by the ARM
>> architecture reference manual.
>>
>> +
>> +/**
>> + * update_vttbr - Update the VTTBR with a valid VMID before the guest runs
>> + * @kvm      The guest that we are about to run
>> + *
>> + * Called from kvm_arch_vcpu_ioctl_run before entering the guest to ensure the
>> + * VM has a valid VMID, otherwise assigns a new one and flushes corresponding
>> + * caches and TLBs.
>> + */
>> +static void update_vttbr(struct kvm *kvm)
>> +{
>> +     phys_addr_t pgd_phys;
>> +
>> +     spin_lock(&kvm_vmid_lock);
>
> Premature, but this is sad.  I suggest you split vmid generation from
> next available vmid.  This allows you to make the generation counter
> atomic so it may be read outside the lock.
>
> You can do
>
>    if (likely(kvm->arch.vmid_gen == atomic_read(&kvm_vmid_gen)))
>           return;
>
>    spin_lock(...
>

I knew you were going to say something here :), please take a look at
this and see if you agree:

+struct kvm_arch {
+	/* The VMID generation used for the virt. memory system */
+	u64    vmid_gen;
+	u32    vmid;
+
+	/* 1-level 2nd stage table and lock */
+	struct mutex pgd_mutex;
+	pgd_t *pgd;
+
+	/* VTTBR value associated with above pgd and vmid */
+	u64    vttbr;
+};

[snip]

+/* The VMID used in the VTTBR */
+static atomic64_t kvm_vmid_gen = ATOMIC64_INIT(1);
+static u8 kvm_next_vmid;
+DEFINE_SPINLOCK(kvm_vmid_lock);

[snip]

+/**
+ * update_vttbr - Update the VTTBR with a valid VMID before the guest runs
+ * @kvm:	The guest that we are about to run
+ *
+ * Called from kvm_arch_vcpu_ioctl_run before entering the guest to ensure the
+ * VM has a valid VMID, otherwise assigns a new one and flushes corresponding
+ * caches and TLBs.
+ */
+static void update_vttbr(struct kvm *kvm)
+{
+	phys_addr_t pgd_phys;
+
+	/*
+	 *  Check that the VMID is still valid.
+	 *  (The hardware supports only 256 values with the value zero
+	 *   reserved for the host, so we check if an assigned value belongs to
+	 *   a previous generation, which requires us to assign a new
+	 *   value. If we're the first to use a VMID for the new generation,
+	 *   we must flush necessary caches and TLBs on all CPUs.)
+	 */
+	if (likely(kvm->arch.vmid_gen == atomic64_read(&kvm_vmid_gen)))
+		return;
+
+	spin_lock(&kvm_vmid_lock);
+
+	/* First user of a new VMID generation? */
+	if (unlikely(kvm_next_vmid == 0)) {
+		atomic64_inc(&kvm_vmid_gen);
+		kvm_next_vmid = 1;
+
+		/* This does nothing on UP */
+		smp_call_function(reset_vm_context, NULL, 1);
+
+		/*
+		 * On SMP we know no other CPUs can use this CPU's or
+		 * each other's VMID since the kvm_vmid_lock blocks
+		 * them from reentry to the guest.
+		 */
+
+		reset_vm_context(NULL);
+	}
+
+	kvm->arch.vmid = kvm_next_vmid;
+	kvm_next_vmid++;
+
+	/* update vttbr to be used with the new vmid */
+	pgd_phys = virt_to_phys(kvm->arch.pgd);
+	kvm->arch.vttbr = pgd_phys & ((1LLU << 40) - 1)
+			  & ~((2 << VTTBR_X) - 1);
+	kvm->arch.vttbr |= (u64)(kvm->arch.vmid) << 48;
+
+	spin_unlock(&kvm_vmid_lock);
+}
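
(For reference, the magic numbers come from the VTTBR layout on ARMv7 with
the Virtualization Extensions:)

	/*
	 * VTTBR, roughly:
	 *   bits [55:48]  VMID
	 *   bits [39:X]   BADDR, the stage-2 pgd base address, where X
	 *                 depends on the stage-2 translation setup (hence
	 *                 VTTBR_X)
	 * so pgd_phys is masked down to 40 bits, the low alignment bits are
	 * cleared, and the VMID is OR'ed in at bit 48.
	 */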


>> +
>> +     /*
>> +      *  Check that the VMID is still valid.
>> +      *  (The hardware supports only 256 values with the value zero
>> +      *   reserved for the host, so we check if an assigned value has rolled
>> +      *   over a sequence, which requires us to assign a new value and flush
>> +      *   necessary caches and TLBs on all CPUs.)
>> +      */
>> +     if (unlikely((kvm->arch.vmid ^ next_vmid) >> VMID_BITS)) {
>> +             /* Check for a new VMID generation */
>> +             if (unlikely((next_vmid & VMID_MASK) == 0)) {
>> +                     /* Check for the (very unlikely) 64-bit wrap around */
>
> Unlikely?  it's impossible.
>

well, if a new generation were started every microsecond on a really fast
processor, the 64-bit counter would wrap roughly every 585,000 years :)  Ok,
I'll remove the check.
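
For anyone checking the back-of-the-envelope:

	2^64 us ~= 1.8 * 10^19 us ~= 1.8 * 10^13 s ~= 585,000 years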

>> +
>> +/**
>> + * kvm_arch_vcpu_ioctl_run - the main VCPU run function to execute guest code
>> + * @vcpu:    The VCPU pointer
>> + * @run:     The kvm_run structure pointer used for userspace state exchange
>> + *
>> + * This function is called through the KVM_RUN ioctl from user space. It
>> + * will execute VM code in a loop until the time slice for the process is
>> + * used up or some emulation is needed from user space, in which case the
>> + * function will return with return value 0 and with the kvm_run structure
>> + * filled in with the required data for the requested emulation.
>> + */
>>  int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>  {
>> -     return -EINVAL;
>> +     int ret = 0;
>> +     sigset_t sigsaved;
>> +
>> +     if (vcpu->sigset_active)
>> +             sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved);
>> +
>> +     run->exit_reason = KVM_EXIT_UNKNOWN;
>> +     while (run->exit_reason == KVM_EXIT_UNKNOWN) {
>
> It's not a good idea to read stuff from run unless it's part of the ABI,
> since userspace can play with it while you're reading it.  It's harmless
> here but in other places it can lead to a vulnerability.
>

ok, so in this case, userspace can 'suppress' an MMIO or interrupt
window operation.

I can change it to keep a local variable to maintain the state and adopt
an API convention for the emulation functions, if you feel strongly about
it, but otherwise it feels to me like the code will become more
complicated. Do you have other ideas?
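
To make the alternative concrete, something along these lines is what I
have in mind -- just a sketch, the handle_exit_sketch() helper and its
"return > 0 means run the guest again" convention are invented, and the
resched/signal/sigmask handling is omitted:

	int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
	{
		int ret;

		for (;;) {
			/* resched and signal checks as before, omitted here */

			update_vttbr(vcpu->kvm);

			local_irq_disable();
			kvm_guest_enter();
			ret = __kvm_vcpu_run(vcpu);
			kvm_guest_exit();
			local_irq_enable();

			/*
			 * Hypothetical helper: decides in the kernel what to
			 * do with the exit and returns 0 (go back to
			 * userspace), < 0 (error) or > 0 (handled, enter the
			 * guest again), so the loop never re-reads
			 * run->exit_reason.
			 */
			ret = handle_exit_sketch(vcpu, run, ret);
			if (ret <= 0)
				break;
		}

		return ret;
	}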


>> +             /*
>> +              * Check conditions before entering the guest
>> +              */
>> +             if (need_resched())
>> +                     kvm_resched(vcpu);
>
> I think cond_resched() is all that's needed these days (kvm_resched()
> predates preempt notifiers).
>

ok
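
So roughly:

	-		if (need_resched())
	-			kvm_resched(vcpu);
	+		cond_resched();

since cond_resched() checks need_resched() internally.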

>> +
>> +             if (signal_pending(current)) {
>> +                     ret = -EINTR;
>> +                     run->exit_reason = KVM_EXIT_INTR;
>> +                     break;
>> +             }
>> +
>> +             /*
>> +              * Enter the guest
>> +              */
>> +             trace_kvm_entry(vcpu->arch.regs.pc);
>> +
>> +             update_vttbr(vcpu->kvm);
>> +
>> +             local_irq_disable();
>> +             kvm_guest_enter();
>> +             vcpu->mode = IN_GUEST_MODE;
>> +
>> +             ret = __kvm_vcpu_run(vcpu);
>> +
>> +             vcpu->mode = OUTSIDE_GUEST_MODE;
>> +             kvm_guest_exit();
>> +             local_irq_enable();
>> +
>> +             trace_kvm_exit(vcpu->arch.regs.pc);
>> +     }
>> +
>> +     if (vcpu->sigset_active)
>> +             sigprocmask(SIG_SETMASK, &sigsaved, NULL);
>> +
>> +     return ret;
>>  }
>>


Thanks,
Christoffer

