On Mon, Nov 19, 2012 at 9:57 AM, Will Deacon <will.deacon@xxxxxxx> wrote:
> On Sat, Nov 10, 2012 at 03:43:06PM +0000, Christoffer Dall wrote:
>> Provides complete world-switch implementation to switch to other guests
>> running in non-secure modes. Includes Hyp exception handlers that
>> capture necessary exception information and stores the information on
>> the VCPU and KVM structures.
>
> [...]
>
>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>> index 15e2ab1..d8f8c60 100644
>> --- a/arch/arm/kvm/arm.c
>> +++ b/arch/arm/kvm/arm.c
>> @@ -40,6 +40,7 @@
>>  #include <asm/kvm_arm.h>
>>  #include <asm/kvm_asm.h>
>>  #include <asm/kvm_mmu.h>
>> +#include <asm/kvm_emulate.h>
>>
>>  #ifdef REQUIRES_VIRT
>>  __asm__(".arch_extension virt");
>> @@ -49,6 +50,10 @@ static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
>>  static struct vfp_hard_struct __percpu *kvm_host_vfp_state;
>>  static unsigned long hyp_default_vectors;
>>
>> +/* The VMID used in the VTTBR */
>> +static atomic64_t kvm_vmid_gen = ATOMIC64_INIT(1);
>> +static u8 kvm_next_vmid;
>> +static DEFINE_SPINLOCK(kvm_vmid_lock);
>>
>>  int kvm_arch_hardware_enable(void *garbage)
>>  {
>> @@ -264,6 +269,8 @@ int __attribute_const__ kvm_target_cpu(void)
>>
>>  int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
>>  {
>> +	/* Force users to call KVM_ARM_VCPU_INIT */
>> +	vcpu->arch.target = -1;
>>  	return 0;
>>  }
>>
>> @@ -274,6 +281,7 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
>>  void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>>  {
>>  	vcpu->cpu = cpu;
>> +	vcpu->arch.vfp_host = this_cpu_ptr(kvm_host_vfp_state);
>>  }
>>
>>  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>> @@ -306,12 +314,168 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
>>
>>  int kvm_arch_vcpu_in_guest_mode(struct kvm_vcpu *v)
>>  {
>> +	return v->mode == IN_GUEST_MODE;
>> +}
>> +
>> +static void reset_vm_context(void *info)
>> +{
>> +	kvm_call_hyp(__kvm_flush_vm_context);
>> +}
>> +
>> +/**
>> + * need_new_vmid_gen - check that the VMID is still valid
>> + * @kvm: The VM's VMID to check
>> + *
>> + * return true if there is a new generation of VMIDs being used
>> + *
>> + * The hardware supports only 256 values with the value zero reserved for the
>> + * host, so we check if an assigned value belongs to a previous generation,
>> + * which requires us to assign a new value. If we're the first to use a
>> + * VMID for the new generation, we must flush necessary caches and TLBs on all
>> + * CPUs.
>> + */
>> +static bool need_new_vmid_gen(struct kvm *kvm)
>> +{
>> +	return unlikely(kvm->arch.vmid_gen != atomic64_read(&kvm_vmid_gen));
>> +}
>> +
>> +/**
>> + * update_vttbr - Update the VTTBR with a valid VMID before the guest runs
>> + * @kvm	The guest that we are about to run
>> + *
>> + * Called from kvm_arch_vcpu_ioctl_run before entering the guest to ensure the
>> + * VM has a valid VMID, otherwise assigns a new one and flushes corresponding
>> + * caches and TLBs.
>> + */
>> +static void update_vttbr(struct kvm *kvm)
>> +{
>> +	phys_addr_t pgd_phys;
>> +	u64 vmid;
>> +
>> +	if (!need_new_vmid_gen(kvm))
>> +		return;
>> +
>> +	spin_lock(&kvm_vmid_lock);
>> +
>> +	/* First user of a new VMID generation? */
>> +	if (unlikely(kvm_next_vmid == 0)) {
>> +		atomic64_inc(&kvm_vmid_gen);
>> +		kvm_next_vmid = 1;
>> +
>> +		/*
>> +		 * On SMP we know no other CPUs can use this CPU's or
>> +		 * each other's VMID since the kvm_vmid_lock blocks
>> +		 * them from reentry to the guest.
>> +		 */
>> +		on_each_cpu(reset_vm_context, NULL, 1);
>> +	}
>> +
>> +	kvm->arch.vmid_gen = atomic64_read(&kvm_vmid_gen);
>> +	kvm->arch.vmid = kvm_next_vmid;
>> +	kvm_next_vmid++;
>> +
>> +	/* update vttbr to be used with the new vmid */
>> +	pgd_phys = virt_to_phys(kvm->arch.pgd);
>> +	vmid = ((u64)(kvm->arch.vmid) << VTTBR_VMID_SHIFT) & VTTBR_VMID_MASK;
>> +	kvm->arch.vttbr = pgd_phys & VTTBR_BADDR_MASK;
>> +	kvm->arch.vttbr |= vmid;
>> +
>> +	spin_unlock(&kvm_vmid_lock);
>> +}
>
> I must be missing something here: how do you ensure that a guest running
> on multiple CPUs continues to have the same VMID across them after a
> rollover?
>

When a rollover occurs, there's no problem until someone comes along who
doesn't have a valid VMID (need_new_vmid_gen() will return true). In that
case, to assign a VMID we have to start a new generation of IDs, and we
must make sure that none of the old VMIDs are still in use.

So how do we ensure that? We increment kvm_vmid_gen, which makes every
other CPU that tries to (re)enter a VM see need_new_vmid_gen() return
true and block on the spin_lock as soon as it exits its VM. We reserve
VMID 1 for the VM that kicked off the new generation, and we call
on_each_cpu(), which sends an IPI to all other physical CPUs and waits
until they have actually completed reset_vm_context(). Once
on_each_cpu(reset_vm_context) returns, all other physical CPUs have
cleared their data structures of any occurrences of old VMIDs, and since
kvm_vmid_gen has already been incremented, no other VCPU can come in and
claim a VMID until we unlock the spinlock, and everything starts over.

Makes sense?
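If it helps, here is a tiny standalone toy program (plain userspace C,
all names made up, not the kernel code) that mimics just the allocation
scheme: an 8-bit VMID counter with 0 reserved for the host, plus a 64-bit
generation number that invalidates every previously handed-out VMID when
the counter wraps:

/*
 * Hypothetical userspace toy, not kernel code: it only mimics the
 * generation/VMID bookkeeping, to show why a stale vmid_gen forces a
 * reassignment after the 8-bit VMID space wraps.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

static uint64_t vmid_gen = 1;	/* stands in for kvm_vmid_gen */
static uint8_t next_vmid = 1;	/* stands in for kvm_next_vmid; 0 is the host */

struct vm {
	uint64_t gen;		/* generation the vmid below was handed out in */
	uint8_t vmid;
};

static bool need_new_vmid_gen(const struct vm *vm)
{
	return vm->gen != vmid_gen;
}

static void update_vmid(struct vm *vm)
{
	if (!need_new_vmid_gen(vm))
		return;

	/* kvm_vmid_lock would be held from here on in the real code */
	if (next_vmid == 0) {	/* the 8-bit counter wrapped: rollover */
		vmid_gen++;	/* invalidates every VMID handed out so far */
		next_vmid = 1;	/* keep 0 reserved for the host */
		/* the real code does on_each_cpu(reset_vm_context, ...) here */
	}
	vm->gen = vmid_gen;
	vm->vmid = next_vmid++;
}

int main(void)
{
	struct vm a = { 0, 0 };
	int i;

	update_vmid(&a);	/* a gets VMID 1 in generation 1 */

	/* burn through the rest of the 8-bit space to force a rollover */
	for (i = 0; i < 300; i++) {
		struct vm tmp = { 0, 0 };
		update_vmid(&tmp);
	}

	printf("a stale? %s\n", need_new_vmid_gen(&a) ? "yes" : "no");
	update_vmid(&a);	/* must be refreshed before a can run again */
	printf("a: gen=%llu vmid=%u\n",
	       (unsigned long long)a.gen, (unsigned)a.vmid);
	return 0;
}

Run it and 'a' comes back stale after the wrap and gets a fresh VMID in
generation 2; need_new_vmid_gen() in the patch is doing exactly that
comparison against kvm_vmid_gen.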
>> +
>> +/*
>> + * Return > 0 to return to guest, < 0 on error, 0 (and set exit_reason) on
>> + * proper exit to QEMU.
>> + */
>> +static int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
>> +		       int exception_index)
>> +{
>> +	run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
>>  	return 0;
>>  }
>>
>> +/**
>> + * kvm_arch_vcpu_ioctl_run - the main VCPU run function to execute guest code
>> + * @vcpu:	The VCPU pointer
>> + * @run:	The kvm_run structure pointer used for userspace state exchange
>> + *
>> + * This function is called through the VCPU_RUN ioctl called from user space. It
>> + * will execute VM code in a loop until the time slice for the process is used
>> + * or some emulation is needed from user space, in which case the function will
>> + * return with return value 0 and with the kvm_run structure filled in with the
>> + * required data for the requested emulation.
>> + */
>>  int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>  {
>> -	return -EINVAL;
>> +	int ret;
>> +	sigset_t sigsaved;
>> +
>> +	/* Make sure they initialize the vcpu with KVM_ARM_VCPU_INIT */
>> +	if (unlikely(vcpu->arch.target < 0))
>> +		return -ENOEXEC;
>> +
>> +	if (vcpu->sigset_active)
>> +		sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved);
>> +
>> +	ret = 1;
>> +	run->exit_reason = KVM_EXIT_UNKNOWN;
>> +	while (ret > 0) {
>> +		/*
>> +		 * Check conditions before entering the guest
>> +		 */
>> +		cond_resched();
>> +
>> +		update_vttbr(vcpu->kvm);
>> +
>> +		local_irq_disable();
>> +
>> +		/*
>> +		 * Re-check atomic conditions
>> +		 */
>> +		if (signal_pending(current)) {
>> +			ret = -EINTR;
>> +			run->exit_reason = KVM_EXIT_INTR;
>> +		}
>> +
>> +		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
>> +			local_irq_enable();
>> +			continue;
>> +		}
>
> I don't see what the VMID generation check buys you here as you're not
> holding the vmid lock, so rollover can occur at any time. If rollover
> *does* occur, you'll get signalled when re-enabling interrupts during
> guest entry, right?
>
> Is it because your rollover handling doesn't actually update the vttbr
> directly and relies on guest sched in to do it? If so, this all feels
> pretty racy to me.
>

The point is that, without this re-check after local_irq_disable(), we
could receive the IPI from another CPU's on_each_cpu() call in
update_vttbr() after our own call to update_vttbr() but before
local_irq_disable(). That IPI clears this CPU's data structures, but
nothing would then stop us from entering the guest with the old VMID,
which the new generation may meanwhile have handed out to a different VM.

(If the rollover only happens after local_irq_disable(), we're fine: the
pending IPI forces a complete round-trip through the host as soon as we
enter the guest, and thus a second call to update_vttbr(), which assigns
us a new VMID in the new generation.)

We close the remaining hole by explicitly re-checking the VMID generation
with interrupts disabled. In other words, there's a hole between the
spin_unlock() and the local_irq_disable(). See this previous discussion
of the matter:

https://lists.cs.columbia.edu/pipermail/kvmarm/2012-June/002452.html
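To make that window concrete, here's a rough standalone model (userspace
stubs, made-up names, emphatically not the actual entry path) of why the
generation has to be looked at again once interrupts are off:

/*
 * Hypothetical toy model of the run-loop ordering, not kernel code.  The
 * stubs stand in for the real primitives; the only point is where the
 * race window sits and what the re-check does about it.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

static uint64_t kvm_vmid_gen = 1;	/* bumped by whoever starts a rollover */

struct kvm { uint64_t vmid_gen; };	/* generation this VM's VMID belongs to */

static bool need_new_vmid_gen(struct kvm *kvm)
{
	return kvm->vmid_gen != kvm_vmid_gen;
}

static void update_vttbr(struct kvm *kvm)
{
	if (!need_new_vmid_gen(kvm))
		return;
	/* ...spin_lock, maybe start a new generation, assign a fresh VMID,
	 * program the VTTBR, spin_unlock... */
	kvm->vmid_gen = kvm_vmid_gen;
	puts("update_vttbr: assigned VMID in current generation");
}

/* Stubs for the real primitives. */
static void local_irq_disable(void) { puts("irqs off"); }
static void local_irq_enable(void)  { puts("irqs on"); }
static void enter_guest(void)       { puts("entering guest"); }

static void run_iteration(struct kvm *kvm, void (*in_window)(void))
{
	update_vttbr(kvm);

	/*
	 * WINDOW: between the spin_unlock() inside update_vttbr() and
	 * local_irq_disable() below, another CPU may start a new generation
	 * and its IPI may already have been handled here.
	 */
	if (in_window)
		in_window();

	local_irq_disable();
	if (need_new_vmid_gen(kvm)) {
		/* stale VMID: bail out and go around the loop again */
		puts("generation changed under us, retrying");
		local_irq_enable();
		return;
	}
	enter_guest();
	local_irq_enable();
}

static void rollover_on_other_cpu(void)
{
	kvm_vmid_gen++;			/* simulates the racing rollover + IPI */
}

int main(void)
{
	struct kvm vm = { 0 };

	run_iteration(&vm, NULL);			/* clean entry */
	run_iteration(&vm, rollover_on_other_cpu);	/* race hits the window */
	run_iteration(&vm, NULL);			/* retry picks up new gen */
	return 0;
}

Without the need_new_vmid_gen() check after local_irq_disable(), the
second iteration above would have entered the guest with the stale VMID.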
-Christoffer
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html