Re: [PATCH v2] KVM: Mark a vCPU as preempted/ready iff it's scheduled out while running

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Apr 26, 2024 at 2:01 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> On Thu, Mar 07, 2024, David Matlack wrote:
> >
> > -     if (current->on_rq) {
> > +     if (current->on_rq && vcpu->wants_to_run) {
> >               WRITE_ONCE(vcpu->preempted, true);
> >               WRITE_ONCE(vcpu->ready, true);
> >       }
>
> Long story short, I was playing around with wants_to_run for a few hairbrained
> ideas, and realized that there's a TOCTOU bug here.  Userspace can toggle
> run->immediate_exit at will, e.g. can clear it after the kernel loads it to
> compute vcpu->wants_to_run.
>
> That's not fatal for this use case, since userspace would only be shooting itself
> in the foot, but it leaves a very dangerous landmine, e.g. if something else in
> KVM keys off of vcpu->wants_to_run to detect that KVM is in its run loop, i.e.
> relies on wants_to_run being set if KVM is in its core run loop.
>
> To address that, I think we should have all architectures check wants_to_run, not
> immediate_exit.

Rephrasing to make sure I understand you correctly: It's possible for
KVM to see conflicting values of wants_to_run and immediate_exit,
since userspace can change immediate_exit at any point. e.g. It's
possible for KVM to see wants_to_run=true and immediate_exit=true,
which wouldn't make sense. This wouldn't cause any bugs today, but
could result in buggy behavior down the road so we might as well clean
it up now.

> And loading immediate_exit needs to be done with READ_ONCE().

+1, good point

>
> E.g. for x86 (every other arch has similar code)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index e9ef1fa4b90b..1a2f6bf14fb2 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -11396,7 +11396,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
>
>         kvm_vcpu_srcu_read_lock(vcpu);
>         if (unlikely(vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED)) {
> -               if (kvm_run->immediate_exit) {
> +               if (!vcpu->wants_to_run) {
>                         r = -EINTR;
>                         goto out;
>                 }
> @@ -11474,7 +11474,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
>                 WARN_ON_ONCE(vcpu->mmio_needed);
>         }
>
> -       if (kvm_run->immediate_exit) {
> +       if (!vcpu->wants_to_run) {
>                 r = -EINTR;
>                 goto out;
>         }
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index f9b9ce0c3cd9..0c0aae224000 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1497,9 +1497,7 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
>                                         struct kvm_guest_debug *dbg);
>  int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu);
>
> -void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu);
> -
> -void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
> +void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu, bool sched_in);
>  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu);
>  int kvm_arch_vcpu_precreate(struct kvm *kvm, unsigned int id);
>  int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu);
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 9501fbd5dfd2..4384bbdba65c 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -4410,7 +4410,7 @@ static long kvm_vcpu_ioctl(struct file *filp,
>                                 synchronize_rcu();
>                         put_pid(oldpid);
>                 }
> -               vcpu->wants_to_run = !vcpu->run->immediate_exit;
> +               vcpu->wants_to_run = !READ_ONCE(vcpu->run->immediate_exit);
>                 r = kvm_arch_vcpu_ioctl_run(vcpu);
>                 vcpu->wants_to_run = false;
>
>
> ---
>
> Hmm, and we should probably go a step further and actively prevent using
> immediate_exit from the kernel, e.g. rename it to something scary like:
>
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 2190adbe3002..9c5fe1dae744 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -196,7 +196,11 @@ struct kvm_xen_exit {
>  struct kvm_run {
>         /* in */
>         __u8 request_interrupt_window;
> +#ifndef __KERNEL__
>         __u8 immediate_exit;
> +#else
> +       __u8 hidden_do_not_touch;
> +#endif

This would result in:

  vcpu->wants_to_run = !READ_ONCE(vcpu->run->hidden_do_not_touch);

:)

Of course we could pick a better name... but isn't every field in
kvm_run open to TOCTOU issues? (Is immediate_exit really special
enough to need this protection?)





[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux