On 17/01/2017 21:34, Paul E. McKenney wrote: > Do any of your callback functions invoke call_srcu()? (Hey, I have to ask!) No, we only use synchronize_srcu and synchronize_srcu_expedited, so our only callback comes from there. >>>> From: Paolo Bonzini <pbonzini@xxxxxxxxxx> >>>> Subject: [PATCH] srcu: wait for all callbacks before deeming SRCU "cleaned up" >>>> >>>> Even though there are no concurrent readers, it is possible that the >>>> work item is queued for delayed processing when cleanup_srcu_struct is >>>> called. The work item needs to be flushed before returning, or a >>>> use-after-free can ensue. >>>> >>>> Furthermore, because of SRCU's two-phase algorithm it may take up to >>>> two executions of srcu_advance_batches before all callbacks are invoked. >>>> This can happen if the first flush_delayed_work happens as follows >>>> >>>> srcu_read_lock >>>> process_srcu >>>> srcu_advance_batches >>>> ... >>>> if (!try_check_zero(sp, idx^1, trycount)) >>>> // there is a reader >>>> return; >>>> srcu_invoke_callbacks >>>> ... >>>> srcu_read_unlock >>>> cleanup_srcu_struct >>>> flush_delayed_work >>>> srcu_reschedule >>>> queue_delayed_work >>>> >>>> Now flush_delayed_work returns but srcu_reschedule will *not* have cleared >>>> sp->running to false. > > But srcu_reschedule() sets sp->running to false if there are no callbacks. > And at that point, there had better be no callbacks. There must be no callbacks in batch_queue and in batch_check0, and of course srcu_invoke_callbacks will have emptied batch_done as well. However, I'm not sure that batch_check1 is always empty after the first flush_delayed_work *if there's no srcu_barrier* in the caller. srcu_advance_batches's second call to try_check_zero could have failed, and then srcu_reschedule will requeue the work item to advance batch_check1 into batch_done. If this is incorrect, then one flush_delayed_work is enough. If it is correct, the possible alternatives are: * srcu_barrier in the caller, flush_delayed_work+WARN_ON(sp->running) in cleanup_srcu_struct. I strongly dislike this one---because we don't use call_srcu at all, there should be no reason to use srcu_barrier in KVM code. Plus I think all other users have the same issue. * srcu_barrier+flush_delayed_work+WARN_ON(sp->running) in cleanup_srcu_struct * flush_delayed_work+flush_delayed_work+WARN_ON(sp->running) in cleanup_srcu_struct * while(flush_delayed_work) in cleanup_srcu_struct * "while(sp->running) flush_delayed_work" in cleanup_srcu_struct Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html