Re: [PATCH v4 03/16] KVM: Add KVM_CAP_MEMORY_FAULT_INFO

> > > +   preempt_disable();
> > > +   /*
> > > +    * Ensure this vCPU isn't modifying another vCPU's run struct, which
> > > +    * would open the door for races between concurrent calls to this
> > > +    * function.
> > > +    */
> > > +   if (WARN_ON_ONCE(vcpu != __this_cpu_read(kvm_running_vcpu)))
> > > +           goto out;
> >
> > Meh, this is overkill IMO.  The check in mark_page_dirty_in_slot() is an
> > abomination that I wish didn't exist, not a pattern that should be copied.  If
> > we do keep this sanity check, it can simply be
> >
> >       if (WARN_ON_ONCE(vcpu != kvm_get_running_vcpu()))
> >               return;
> >
> > because as the comment for kvm_get_running_vcpu() explains, the returned vCPU
> > pointer won't change even if this task gets migrated to a different pCPU.  If
> > this code were doing something with vcpu->cpu then preemption would need to be
> > disabled throughout, but that's not the case.
> >
> I think that this check is needed but without the WARN_ON_ONCE as per my
> other comment.
> Reason is that we really need to insulate the code against preemption
> kicking in before the call to preempt_disable() as the logic seems to
> need this check to avoid concurrency problems.
> (I don't think Anish simply copied this if-check from mark_page_dirty_in_slot)

Heh, you're giving me too much credit here. I did copy-paste this
check, not from mark_page_dirty_in_slot() but from one of Sean's
old responses [1].

> That said, I agree that there's a risk that KVM could clobber vcpu->run by
> hitting an -EFAULT without the vCPU loaded, but that's a solvable problem, e.g.
> the helper to fill KVM_EXIT_MEMORY_FAULT could be hardened to yell if called
> without the target vCPU being loaded:
>
>     int kvm_handle_efault(struct kvm_vcpu *vcpu, ...)
>     {
>         preempt_disable();
>         if (WARN_ON_ONCE(vcpu != __this_cpu_read(kvm_running_vcpu)))
>             goto out;
>
>         vcpu->run->exit_reason = KVM_EXIT_MEMORY_FAULT;
>         ...
>     out:
>         preempt_enable();
>         return -EFAULT;
>     }

Ancient history aside, let's figure out what's really needed here.

> Why use WARN_ON_ONCE when there is a clear possibility of preemption
> kicking in (with the possibility of vcpu_load/vcpu_put being called
> in the new task) before preempt_disable() is called in this function ?
> I think you should use WARN_ON_ONCE only where there is some impossible
> or unhandled situation happening, not when there is a possibility of that
> situation clearly happening as per the kernel code.

I did some mucking around to try to understand the kvm_running_vcpu
variable, and I don't think preemption/rescheduling actually trips the
WARN here. From my (limited) understanding, the thread being preempted
causes a vcpu_put() via kvm_sched_out(), but when the thread is
eventually scheduled back in on whatever core, it does a vcpu_load()
via kvm_sched_in(), and the docstring for kvm_get_running_vcpu() seems
to imply that what vcpu_load() stores into the per-cpu
"kvm_running_vcpu" variable will be the same thing that would have
been observed before preemption.

All that's to say: I wouldn't expect the value of
"__this_cpu_read(kvm_running_vcpu)" to change in any given thread. If
that's true, then the things I would expect this WARN to catch are (a)
bugs where somehow the thread gets scheduled without calling
vcpu_load() or (b) bizarre situations (probably all bugs?) where some
vcpu thread has a hold of some _other_ kvm_vcpu* and is trying to do
something with it.
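
FWIW, if the simpler check is all that's needed, then I'd expect
Sean's earlier sketch to boil down to something like this (the
memory_fault field names here are just placeholders on my end, not
necessarily what this series will end up exposing):

    int kvm_handle_efault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 len)
    {
            /*
             * Per its documentation, kvm_get_running_vcpu() returns the same
             * vCPU for this task even if it migrates pCPUs, so there's no
             * need to keep preemption disabled while filling the run struct.
             */
            if (WARN_ON_ONCE(vcpu != kvm_get_running_vcpu()))
                    return -EFAULT;

            vcpu->run->exit_reason = KVM_EXIT_MEMORY_FAULT;
            vcpu->run->memory_fault.gpa = gpa;
            vcpu->run->memory_fault.len = len;

            return -EFAULT;
    }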


On Wed, Jun 14, 2023 at 10:35 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> Meh, this is overkill IMO.  The check in mark_page_dirty_in_slot() is an
> abomination that I wish didn't exist, not a pattern that should be copied.  If
> we do keep this sanity check, it can simply be
>
>         if (WARN_ON_ONCE(vcpu != kvm_get_running_vcpu()))
>                 return;
>
> because as the comment for kvm_get_running_vcpu() explains, the returned vCPU
> pointer won't change even if this task gets migrated to a different pCPU.  If
> this code were doing something with vcpu->cpu then preemption would need to be
> disabled throughout, but that's not the case.

Oh, looks like Sean said the same thing. Guess I probably should have
read that reply more closely first :/


[1] https://lore.kernel.org/kvm/ZBnLaidtZEM20jMp@xxxxxxxxxx/



