On Thu, Aug 23, 2012 at 3:20 AM, Rusty Russell <rusty@xxxxxxxxxxxxxxx> wrote:
> Ben Herrenschmidt pointed out a nasty race:
> (1) Guest complicated mmio instruction traps.
> (2) The hardware doesn't tell us enough, so we need to read the actual
>     instruction which was being executed.
> (3) KVM maps the instruction virtual address to a physical address.
> (4) The guest (SMP) swaps out that page, and fills it with something else.
> (5) We read the physical address, but now that's the wrong thing.
>
> This is so rare that we can stop all the vcpus from running when it
> happens (if no VCPUS are running, this just means getting and
> releasing a spinlock).
>
> Signed-off-by: Rusty Russell <rusty.russell@xxxxxxxxxx>
>
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 79b5318..1c0fa75 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -144,6 +144,9 @@ struct kvm_vcpu_arch {
>  	int last_pcpu;
>  	cpumask_t require_dcache_flush;
>
> +	/* Don't run the guest: see copy_current_insn() */
> +	bool pause;
> +
>  	/* IO related fields */
>  	bool mmio_sign_extend;	/* for byte/halfword loads */
>  	u32 mmio_rd;
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 09a6800..404f4d4 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -614,7 +614,13 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		kvm_guest_enter();
>  		vcpu->mode = IN_GUEST_MODE;
>
> -		ret = __kvm_vcpu_run(vcpu);
> +		smp_mb(); /* set mode before reading vcpu->arch.pause */
> +		if (unlikely(vcpu->arch.pause)) {
> +			/* This means ignore, try again. */
> +			ret = ARM_EXCEPTION_IRQ;
> +		} else {
> +			ret = __kvm_vcpu_run(vcpu);
> +		}

It makes me a little sad to wrap this important "I'm going to change the
world and run a VM" function call in the else-clause of a super rare
corner case, but I see it gets the job done nicely.
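As an aside on the ordering: the smp_mb() above pairs with the one in
copy_current_insn() further down.  Each side stores its own flag, does a
full barrier, then reads the other side's flag, so at least one of the two
must observe the other's store.  A small standalone sketch of that pattern
in C11 follows (the names are made up for illustration; this is not the
patch code):

/*
 * Illustration only: the same store / full barrier / load pairing as
 * vcpu->mode vs. vcpu->arch.pause.  The fences make it impossible for
 * both threads to miss the other's store, so the copier can never
 * believe a vcpu is outside guest mode while that vcpu has also failed
 * to see the pause request.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static atomic_bool in_guest;	/* stands in for vcpu->mode == IN_GUEST_MODE */
static atomic_bool pause_req;	/* stands in for vcpu->arch.pause */

static void *vcpu_entry(void *unused)
{
	(void)unused;
	atomic_store_explicit(&in_guest, true, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* the smp_mb() in arm.c */
	if (atomic_load_explicit(&pause_req, memory_order_relaxed))
		puts("vcpu: pause request seen, don't enter the guest, retry");
	else
		puts("vcpu: entering guest");
	return NULL;
}

static void *insn_copier(void *unused)
{
	(void)unused;
	atomic_store_explicit(&pause_req, true, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* the smp_mb() in mmu.c */
	if (atomic_load_explicit(&in_guest, memory_order_relaxed))
		puts("copier: vcpu may be running, kick it with an IPI");
	else
		puts("copier: vcpu cannot enter the guest, copy is safe");
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, vcpu_entry, NULL);
	pthread_create(&b, NULL, insn_copier, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return 0;
}

Drop either barrier and a vcpu could slip into the guest after
copy_current_insn() has already decided nobody needs an IPI.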
>
>  		vcpu->mode = OUTSIDE_GUEST_MODE;
>  		vcpu->arch.last_pcpu = smp_processor_id();
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 0043dbb..16256e9 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -636,6 +636,58 @@ static bool copy_from_guest_va(struct kvm_vcpu *vcpu,
>  	return true;
>  }
>
> +/* Just ensure we're not running the guest. */
> +static void do_nothing(void *info)
> +{
> +}
> +
> +/*
> + * We have to be very careful copying memory from a running (ie. SMP) guest.
> + * Another CPU may remap the page (eg. swap out a userspace text page) as we
> + * read the instruction.  Unlike normal hardware operation, to emulate an
> + * instruction we map the virtual to physical address then read that memory
> + * as separate steps, thus not atomic.
> + *
> + * Fortunately this is so rare (we don't usually need the instruction), we
> + * can go very slowly and no one will mind.
> + */
> +static bool copy_current_insn(struct kvm_vcpu *vcpu,
> +			      unsigned long *instr, size_t instr_len)
> +{
> +	int i;
> +	bool ret;
> +	struct kvm_vcpu *v;
> +
> +	/* Don't cross with IPIs in kvm_main.c */
> +	spin_lock(&vcpu->kvm->mmu_lock);
> +
> +	/* Tell them all to pause, so no more will enter guest. */
> +	kvm_for_each_vcpu(i, v, vcpu->kvm)
> +		v->arch.pause = true;
> +
> +	/* Set ->pause before we read ->mode */
> +	smp_mb();
> +
> +	/* Kick out any which are still running. */
> +	kvm_for_each_vcpu(i, v, vcpu->kvm) {
> +		/* Guest could exit now, making cpu wrong. That's OK. */
> +		if (kvm_vcpu_exiting_guest_mode(v) == IN_GUEST_MODE)
> +			smp_call_function_single(v->cpu, do_nothing, NULL, 1);
> +	}
> +
> +	/* Now guest isn't running, we can va->pa map and copy atomically. */
> +	ret = copy_from_guest_va(vcpu, instr, vcpu->arch.regs.pc, instr_len,
> +				 vcpu_mode_priv(vcpu));
> +
> +	/* Release them all. */
> +	kvm_for_each_vcpu(i, v, vcpu->kvm)
> +		v->arch.pause = false;
> +
> +	spin_unlock(&vcpu->kvm->mmu_lock);
> +
> +	return ret;
> +}
> +
>  /**
>   * invalid_io_mem_abort -- Handle I/O aborts ISV bit is clear
>   *
> @@ -660,8 +712,7 @@ static int invalid_io_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
>  	}
>
>  	/* If it fails (SMP race?), we reenter guest for it to retry. */
> -	if (!copy_from_guest_va(vcpu, &instr, vcpu->arch.regs.pc, sizeof(instr),
> -				vcpu_mode_priv(vcpu)))
> +	if (!copy_current_insn(vcpu, &instr, sizeof(instr)))
>  		return 1;
>
>  	return kvm_emulate_mmio_ls(vcpu, fault_ipa, instr);

Otherwise looks good.  We should have a trace event for when this race
ever happens, and have a big party when it does!

-Christoffer
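For reference, a minimal sketch of what such a trace event could look like,
using the standard TRACE_EVENT() machinery.  The event name, the pc argument
and the placement are illustrative only; none of this is part of Rusty's
patch:

TRACE_EVENT(kvm_insn_copy_slowpath,
	TP_PROTO(unsigned long pc),
	TP_ARGS(pc),

	TP_STRUCT__entry(
		__field(unsigned long, pc)
	),

	TP_fast_assign(
		__entry->pc = pc;
	),

	TP_printk("paused all vcpus to copy instruction at pc 0x%08lx",
		  __entry->pc)
);

with a matching trace_kvm_insn_copy_slowpath(vcpu->arch.regs.pc) call in
copy_current_insn() once the other vcpus have been paused.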