Re: [PATCH v8 11/15] ARM: KVM: World-switch implementation

Christoffer Dall <c.dall@xxxxxxxxxxxxxxxxxxxxxx> · Wed, 20 Jun 2012 00:40:07 -0400

On Tue, Jun 19, 2012 at 11:27 PM, Christoffer Dall
<c.dall@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> On Tue, Jun 19, 2012 at 5:16 AM, Avi Kivity <avi@xxxxxxxxxx> wrote:
>> On 06/19/2012 01:05 AM, Christoffer Dall wrote:
>>>> Premature, but this is sad.  I suggest you split vmid generation from
>>>> next available vmid.  This allows you to make the generation counter
>>>> atomic so it may be read outside the lock.
>>>>
>>>> You can do
>>>>
>>>>    if (likely(kvm->arch.vmd_gen) == atomic_read(&kvm_vmid_gen)))
>>>>           return;
>>>>
>>>>    spin_lock(...
>>>>
>>>
>>> I knew you were going to say something here :), please take a look at
>>> this and see if you agree:
>>
>> It looks reasonable wrt locking.
>>
>>> +
>>> +     /* First user of a new VMID generation? */
>>> +     if (unlikely(kvm_next_vmid == 0)) {
>>> +             atomic64_inc(&kvm_vmid_gen);
>>> +             kvm_next_vmid = 1;
>>> +
>>> +             /* This does nothing on UP */
>>> +             smp_call_function(reset_vm_context, NULL, 1);
>>
>> Does this imply a memory barrier?  If not, smp_mb__after_atomic_inc().
>>
>
> yes, it implies a memory barrier.
>
>>> +
>>> +             /*
>>> +              * On SMP we know no other CPUs can use this CPU's or
>>> +              * each other's VMID since the kvm_vmid_lock blocks
>>> +              * them from reentry to the guest.
>>> +              */
>>> +
>>> +             reset_vm_context(NULL);
>>
>> These two lines can be folded as on_each_cpu().
>>
>> Don't you have a race here where you can increment the generation just
>> before guest entry?
>
> I don't think I do.
>

uh, strike that, I do.

>>
>> cpu0                       cpu1
>> (vmid=0, gen=1)            (gen=0)
>> -------------------------- ----------------------
>> gen == global_gen, return
>>
>>                           gen != global_gen
>>                           increment global_gen
>>                           smp_call_function
>> reset_vm_context
>>                           vmid=0
>>
>> enter with vmid=0          enter with vmid=0
>
> I can't see how the scenario above can happen. First, no-one can run
> with vmid 0 - it is reserved for the host. If we bump global_gen on
> cpuN, then since we do smp_call_function(x, x, wait=1) we are now sure
> that after this call, all cpus(N-1) potentially being inside a VM will
> have exited, called reset_vm_context, but before they can re-enter
> into the guest, they will call update_vttbr, and if their generation
> counter differs from global_gen, they will try to grab that spinlock
> and everything should happen in order.
>

the whole vmid=0 confused me a bit. The point is since we moved the
generation check outside the spin_lock we have to re-check, I see:

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 19fe3b0..74760af 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -313,6 +313,24 @@ static void reset_vm_context(void *info)
 }

 /**
+ * check_new_vmid_gen - check that the VMID is still valid
+ * @kvm: The VM's VMID to checkt
+ *
+ * return true if there is a new generation of VMIDs being used
+ *
+ * The hardware supports only 256 values with the value zero reserved for the
+ * host, so we check if an assigned value belongs to a previous generation,
+ * which which requires us to assign a new value. If we're the first to use a
+ * VMID for the new generation, we must flush necessary caches and TLBs on all
+ * CPUs.
+ */
+static bool check_new_vmid_gen(struct kvm *kvm)
+{
+	if (likely(kvm->arch.vmid_gen == atomic64_read(&kvm_vmid_gen)))
+		return;
+}
+
+/**
  * update_vttbr - Update the VTTBR with a valid VMID before the guest runs
  * @kvm	The guest that we are about to run
  *
@@ -324,15 +342,7 @@ static void update_vttbr(struct kvm *kvm)
 {
 	phys_addr_t pgd_phys;

-	/*
-	 *  Check that the VMID is still valid.
-	 *  (The hardware supports only 256 values with the value zero
-	 *   reserved for the host, so we check if an assigned value belongs to
-	 *   a previous generation, which which requires us to assign a new
-	 *   value. If we're the first to use a VMID for the new generation,
-	 *   we must flush necessary caches and TLBs on all CPUs.)
-	 */
-	if (likely(kvm->arch.vmid_gen == atomic64_read(&kvm_vmid_gen)))
+	if (!check_new_vmid_gen(kvm))
 		return;

 	spin_lock(&kvm_vmid_lock);
@@ -504,6 +514,13 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu
*vcpu, struct kvm_run *run)
 		 */
 		preempt_disable();
 		local_irq_disable();
+
+		if (check_new_vmid_gen(kvm)) {
+			local_irq_enable();
+			preempt_enable();
+			continue;
+		}
+
 		kvm_guest_enter();
 		vcpu->mode = IN_GUEST_MODE;

>>
>> You must recheck gen after disabling interrupts to ensure global_gen
>> didn't bump after update_vttbr but before entry.  x86 has a lot of this,
>> see vcpu_enter_guest() and vcpu->requests (doesn't apply directly to
>> your case but may come in useful later).
>>
>>>
>>>>> +
>>>>> +/**
>>>>> + * kvm_arch_vcpu_ioctl_run - the main VCPU run function to execute guest code
>>>>> + * @vcpu:    The VCPU pointer
>>>>> + * @run:     The kvm_run structure pointer used for userspace state exchange
>>>>> + *
>>>>> + * This function is called through the VCPU_RUN ioctl called from user space. It
>>>>> + * will execute VM code in a loop until the time slice for the process is used
>>>>> + * or some emulation is needed from user space in which case the function will
>>>>> + * return with return value 0 and with the kvm_run structure filled in with the
>>>>> + * required data for the requested emulation.
>>>>> + */
>>>>>  int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>>>>  {
>>>>> -     return -EINVAL;
>>>>> +     int ret = 0;
>>>>> +     sigset_t sigsaved;
>>>>> +
>>>>> +     if (vcpu->sigset_active)
>>>>> +             sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved);
>>>>> +
>>>>> +     run->exit_reason = KVM_EXIT_UNKNOWN;
>>>>> +     while (run->exit_reason == KVM_EXIT_UNKNOWN) {
>>>>
>>>> It's not a good idea to read stuff from run unless it's part of the ABI,
>>>> since userspace can play with it while you're reading it.  It's harmless
>>>> here but in other places it can lead to a vulnerability.
>>>>
>>>
>>> ok, so in this case, userspace can 'suppress' an MMIO or interrupt
>>> window operation.
>>>
>>> I can change to keep some local variable to maintain the state and
>>> have some API convention for emulation functions, if you feel strongly
>>> about it, but otherwise it feels to me like the code will be more
>>> complicated. Do you have other ideas?
>>
>> x86 uses:
>>
>>  0 - return to userspace (run prepared)
>>  1 - return to guest (run untouched)
>>  -ESOMETHING - return to userspace
>>
>
> yeah, we can do that I guess, that's fair.
>
>> as return values from handlers and for locals (usually named 'r').
>>
>>
>> --
>> error compiling committee.c: too many arguments to function
>>
>>
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html