On Tue, Jan 17, 2017 at 12:08 PM, Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:
>
>
> On 17/01/2017 10:56, Dmitry Vyukov wrote:
>>> I am seeing use-after-frees in process_srcu as struct srcu_struct is
>>> already freed. Before freeing struct srcu_struct, the code does
>>> cleanup_srcu_struct(&kvm->irq_srcu). We also tried to do:
>>>
>>> +       srcu_barrier(&kvm->irq_srcu);
>>>         cleanup_srcu_struct(&kvm->irq_srcu);
>>>
>>> It reduced the rate of use-after-frees but did not eliminate them
>>> completely. The full thread is here:
>>> https://groups.google.com/forum/#!msg/syzkaller/i48YZ8mwePY/0PQ8GkQTBwAJ
>>>
>>> Does Paolo's fix above make sense to you? Namely, adding
>>> flush_delayed_work(&sp->work) to cleanup_srcu_struct()?
>>
>> I am not sure about the interaction of flush_delayed_work and
>> srcu_reschedule... flush_delayed_work probably assumes that no work is
>> queued concurrently, but what if srcu_reschedule queues another work
>> concurrently... can't it happen that flush_delayed_work misses that
>> newly scheduled work?
>
> Newly scheduled callbacks would be a bug in SRCU usage, but my patch is

I mean not SRCU callbacks, but the sp->work item itself being
rescheduled. Consider that callbacks are already scheduled. We call
flush_delayed_work, and it waits for the completion of process_srcu.
But that process_srcu schedules sp->work again in srcu_reschedule.
(A minimal userspace model of this race is at the end of this mail.)

> indeed insufficient. Because of SRCU's two-phase algorithm, it's possible
> that the first flush_delayed_work doesn't invoke all callbacks. Instead I
> would propose this (still untested, but this time with a commit message):
>
> ---------------- 8< --------------
> From: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> Subject: [PATCH] srcu: wait for all callbacks before deeming SRCU "cleaned up"
>
> Even though there are no concurrent readers, it is possible that the
> work item is queued for delayed processing when cleanup_srcu_struct is
> called. The work item needs to be flushed before returning, or a
> use-after-free can ensue.
>
> Furthermore, because of SRCU's two-phase algorithm it may take up to
> two executions of srcu_advance_batches before all callbacks are invoked.
> This can happen if the first flush_delayed_work happens as follows:
>
>     srcu_read_lock
>                            process_srcu
>                                srcu_advance_batches
>                                    ...
>                                    if (!try_check_zero(sp, idx^1, trycount))
>                                        // there is a reader
>                                        return;
>                                srcu_invoke_callbacks
>                                    ...
>     srcu_read_unlock
>     cleanup_srcu_struct
>         flush_delayed_work
>                            srcu_reschedule
>                                queue_delayed_work
>
> Now flush_delayed_work returns, but srcu_reschedule will *not* have
> cleared sp->running to false.
>
> Not-tested-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
>
> diff --git a/kernel/rcu/srcu.c b/kernel/rcu/srcu.c
> index 9b9cdd549caa..9470f1ba2ef2 100644
> --- a/kernel/rcu/srcu.c
> +++ b/kernel/rcu/srcu.c
> @@ -283,6 +283,14 @@ void cleanup_srcu_struct(struct srcu_struct *sp)
>  {
>  	if (WARN_ON(srcu_readers_active(sp)))
>  		return; /* Leakage unless caller handles error. */
> +
> +	/*
> +	 * No readers active, so any pending callbacks will rush through the two
> +	 * batches before sp->running becomes false. No risk of busy-waiting.
> +	 */
> +	while (sp->running)
> +		flush_delayed_work(&sp->work);

Unsynchronized accesses to shared state make me nervous. sp->running is
meant to be protected by sp->queue_lock, so at the very least we will
get back to you with a KTSAN report.
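
Something like the following untested sketch would at least keep the
read of sp->running under the lock (it assumes cleanup_srcu_struct is
only ever called from process context, as today, so spin_lock_irq is
fine):

	/*
	 * Untested sketch: sample sp->running under sp->queue_lock so
	 * the read is ordered against srcu_reschedule(), which clears
	 * it under the same lock.  This still relies on the caller
	 * guaranteeing that no new call_srcu() can race with
	 * cleanup_srcu_struct().
	 */
	for (;;) {
		bool busy;

		spin_lock_irq(&sp->queue_lock);
		busy = sp->running;
		spin_unlock_irq(&sp->queue_lock);

		if (!busy)
			break;
		flush_delayed_work(&sp->work);
	}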

>  	free_percpu(sp->per_cpu_ref);
>  	sp->per_cpu_ref = NULL;
>  }
>
> Thanks,
>
> Paolo
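
To make the requeue race above concrete, here is a minimal userspace
model of it (pthreads and POSIX semaphores, NOT kernel code; all names
are made up). "flush" waits for the one execution of the handler it
found queued, exactly like flush_delayed_work, while the handler
re-queues itself the way srcu_reschedule does:

#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

static sem_t work_queued;	/* a work instance is pending             */
static sem_t work_done;		/* one handler execution has finished     */
static sem_t resume;		/* gates the second execution (demo only) */

static void *workqueue_thread(void *arg)
{
	(void)arg;

	/* First execution: the one flush will wait for. */
	sem_wait(&work_queued);
	sem_post(&work_queued);		/* srcu_reschedule(): re-queue self */
	sem_post(&work_done);

	/* Second execution: runs after flush has already returned. */
	sem_wait(&resume);
	sem_wait(&work_queued);
	sem_post(&work_done);
	return NULL;
}

int main(void)
{
	pthread_t t;

	sem_init(&work_queued, 0, 1);	/* sp->work is already queued */
	sem_init(&work_done, 0, 0);
	sem_init(&resume, 0, 0);
	pthread_create(&t, NULL, workqueue_thread, NULL);

	/* flush_delayed_work(): wait for the execution we found queued. */
	sem_wait(&work_done);

	/* Flush returned, yet the handler has re-queued itself.  Freeing
	 * the srcu_struct at this point is the reported use-after-free. */
	if (sem_trywait(&work_queued) == 0) {
		printf("after flush: work is queued again\n");
		sem_post(&work_queued);
	}

	sem_post(&resume);		/* let the second execution run */
	pthread_join(t, NULL);
	return 0;
}

Build with "gcc -pthread"; it deterministically prints that the work is
pending again after the flush, which is the window the patch above
closes by looping until sp->running is false.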