Re: [patch 2/2] x86/kvm: Sanitize kvm_async_pf_task_wait()

> On Mar 6, 2020, at 4:12 PM, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>
> While working on the entry consolidation I stumbled over the KVM async page
> fault handler and kvm_async_pf_task_wait() in particular. It took me a
> while to realize that the randomly sprinkled around rcu_irq_enter()/exit()
> invocations are just cargo cult programming. Several patches "fixed" RCU
> splats by curing the symptoms without noticing that the code is flawed
> from a design perspective.
>
> The main problem is that this async injection is not based on a proper
> handshake mechanism and only respects the minimal requirement, i.e. the
> guest is not in a state where it has interrupts disabled.
>
> Aside from that, the actual code is a convoluted one-size-fits-all Swiss
> army knife. It is invoked from different places with different RCU
> constraints:
>
> 1) Host side:
>
>   vcpu_enter_guest()
>     kvm_x86_ops->handle_exit()
>       kvm_handle_page_fault()
>         kvm_async_pf_task_wait()
>
>   The invocation happens from fully preemptible context.

I'm a bit baffled as to why the host uses this code at all instead of
just sleeping the old-fashioned way when the guest takes a (host) page
fault.  Oh well.
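
(To illustrate what I mean by the old-fashioned way -- a rough sketch
only: get_user_pages_unlocked() is real, but all the plumbing around it
here is invented for illustration:)

/*
 * Hypothetical host-side variant: vcpu_enter_guest() runs in fully
 * preemptible task context, so the host could simply fault the page
 * in synchronously and block until it is resident.
 */
static int host_fault_in_guest_page(unsigned long hva)
{
        struct page *page;

        /* May sleep until the page is resident. */
        if (get_user_pages_unlocked(hva, 1, &page, FOLL_WRITE) != 1)
                return -EFAULT;

        put_page(page);
        return 0;
}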

> +
> +/*
> + * kvm_async_pf_task_wait_schedule - Wait for pagefault to be handled
> + * @token:    Token to identify the sleep node entry
> + *
> + * Invoked from the async pagefault handling code or from the VM exit page
> + * fault handler. In both cases RCU is watching.
> + */
> +void kvm_async_pf_task_wait_schedule(u32 token)
> +{
> +    struct kvm_task_sleep_node n;
> +    DECLARE_SWAITQUEUE(wait);
> +
> +    lockdep_assert_irqs_disabled();
> +
> +    if (!kvm_async_pf_queue_task(token, false, &n))
>      return;
> +
> +    for (;;) {
> +        prepare_to_swait_exclusive(&n.wq, &wait, TASK_UNINTERRUPTIBLE);
> +        if (hlist_unhashed(&n.link))
> +            break;
> +
> +        local_irq_enable();
> +        schedule();
> +        local_irq_disable();
>  }
> +    finish_swait(&n.wq, &wait);
> +}
> +EXPORT_SYMBOL_GPL(kvm_async_pf_task_wait_schedule);
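
(Side note: the loop above is the standard swait queue-then-recheck
idiom.  A minimal self-contained version, with a plain 'done' flag
standing in for the hlist_unhashed() check and the IRQ toggling left
out -- the demo_* names are made up:)

#include <linux/sched.h>
#include <linux/swait.h>

struct demo_waiter {
        struct swait_queue_head wq;     /* set up with init_swait_queue_head() */
        bool done;
};

static void demo_wait(struct demo_waiter *w)
{
        DECLARE_SWAITQUEUE(wait);

        for (;;) {
                /* Queue first, then re-check, so a wakeup cannot be lost. */
                prepare_to_swait_exclusive(&w->wq, &wait, TASK_UNINTERRUPTIBLE);
                if (READ_ONCE(w->done))
                        break;
                schedule();
        }
        finish_swait(&w->wq, &wait);
}

static void demo_wake(struct demo_waiter *w)
{
        smp_store_release(&w->done, true);
        swake_up_one(&w->wq);
}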
>
> -    n.token = token;
> -    n.cpu = smp_processor_id();
> -    n.halted = is_idle_task(current) ||
> -           (IS_ENABLED(CONFIG_PREEMPT_COUNT)
> -            ? preempt_count() > 1 || rcu_preempt_depth()
> -            : interrupt_kernel);
> -    init_swait_queue_head(&n.wq);
> -    hlist_add_head(&n.link, &b->list);
> -    raw_spin_unlock(&b->lock);
> +/*
> + * Invoked from the async page fault handler.
> + */
> +static void kvm_async_pf_task_wait_halt(u32 token)
> +{
> +    struct kvm_task_sleep_node n;
> +
> +    if (!kvm_async_pf_queue_task(token, true, &n))
> +        return;
>
>  for (;;) {
> -        if (!n.halted)
> -            prepare_to_swait_exclusive(&n.wq, &wait, TASK_UNINTERRUPTIBLE);
>      if (hlist_unhashed(&n.link))
>          break;
> +        /*
> +         * No point in doing anything about RCU here. Any RCU read
> +         * side critical section or RCU watching section can be
> +         * interrupted by VMEXITs and the host is free to keep the
> +         * vCPU scheduled out as long as it sees fit. This is not
> +         * any different just because of the halt induced voluntary
> +         * VMEXIT.
> +         *
> +         * Also the async page fault could have interrupted any RCU
> +         * watching context, so invoking rcu_irq_exit()/enter()
> +         * around this is not gaining anything.
> +         */
> +        native_safe_halt();
> +        local_irq_disable();

What's the local_irq_disable() here for? I'd have thought a
lockdep_assert_irqs_disabled() somewhere in here would make more sense.
(Yes, I see you copied this from the old code. It's still nonsense.)

I also find it truly bizarre that hlt actually works in this context.
Does KVM in fact wake a HLTed guest that HLTed with IRQs off when a
pending async pf is satisfied?  This would make sense if the wake
event were an interrupt, but according to Paolo it's not.

All this being said, the only remotely sane case is when regs->flags
has IF==1. Perhaps this code should actually do:

WARN_ON(!(regs->flags & X86_EFLAGS_IF));

while (the page isn't ready) {
        local_irq_enable();
        native_safe_halt();
        local_irq_disable();
}

with some provision to survive the case where the warning fires so we
at least get logs.
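
Spelled out (a sketch, not a tested patch: assume regs is the pt_regs
of the interrupted context and n is the sleep node queued earlier; the
explicit enable is folded away because native_safe_halt() does the sti
itself), the dumbest provision I can think of is to just spin so the
splat reaches the logs:

if (WARN_ON_ONCE(!(regs->flags & X86_EFLAGS_IF))) {
        /* Can't safely sti;hlt here; spin so we at least get logs. */
        while (!hlist_unhashed(&n.link))
                cpu_relax();
        return;
}

while (!hlist_unhashed(&n.link)) {
        native_safe_halt();     /* does sti;hlt itself */
        local_irq_disable();    /* safe_halt() returns with IRQs on */
}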

In any event, I just sent a patch to disable async page faults that
happen in kernel mode.  Perhaps this patch should actually just have
some warnings to sanity check the async page faults and not even try
to handle all these nasty sub-cases.
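
(e.g. something like this hypothetical check at the top of the guest
async-#PF entry -- the exact policy and return convention are up for
debate:)

/*
 * Sanity check: only play the async game for faults that hit user
 * mode with IRQs on; otherwise warn once and bail so we at least
 * get a report instead of trying to cope with the nasty cases.
 */
if (WARN_ON_ONCE(!user_mode(regs) || !(regs->flags & X86_EFLAGS_IF)))
        return false;   /* hypothetical: fall back to a normal #PF */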

Paolo, is there any way to test these async page faults?



