On 02/05/2015 01:56 PM, Paul E. McKenney wrote: > On Thu, Feb 05, 2015 at 01:09:19PM -0500, Rik van Riel wrote: >> On 02/05/2015 12:50 PM, Paul E. McKenney wrote: >>> On Thu, Feb 05, 2015 at 11:52:37AM -0500, Rik van Riel wrote: >>>> On 02/05/2015 11:44 AM, Christian Borntraeger wrote: >>>>> Am 05.02.2015 um 17:35 schrieb riel@xxxxxxxxxx: >>>>>> From: Rik van Riel <riel@xxxxxxxxxx> >>>>>> >>>>>> The host kernel is not doing anything while the CPU is executing >>>>>> a KVM guest VCPU, so it can be marked as being in an extended >>>>>> quiescent state, identical to that used when running user space >>>>>> code. >>>>>> >>>>>> The only exception to that rule is when the host handles an >>>>>> interrupt, which is already handled by the irq code, which >>>>>> calls rcu_irq_enter and rcu_irq_exit. >>>>>> >>>>>> The guest_enter and guest_exit functions already switch vtime >>>>>> accounting independent of context tracking, so leave those calls >>>>>> where they are, instead of moving them into the context tracking >>>>>> code. >>>>>> >>>>>> Signed-off-by: Rik van Riel <riel@xxxxxxxxxx> >>>>>> --- >>>>>> include/linux/context_tracking.h | 8 +++++++- >>>>>> include/linux/context_tracking_state.h | 1 + >>>>>> 2 files changed, 8 insertions(+), 1 deletion(-) >>>>>> >>>>>> diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h >>>>>> index bd9f000fc98d..a5d3bb44b897 100644 >>>>>> --- a/include/linux/context_tracking.h >>>>>> +++ b/include/linux/context_tracking.h >>>>>> @@ -43,7 +43,7 @@ static inline enum ctx_state exception_enter(void) >>>>>> static inline void exception_exit(enum ctx_state prev_ctx) >>>>>> { >>>>>> if (context_tracking_is_enabled()) { >>>>>> - if (prev_ctx == IN_USER) >>>>>> + if (prev_ctx == IN_USER || prev_ctx == IN_GUEST) >>>>>> context_tracking_user_enter(prev_ctx); >>>>>> } >>>>>> } >>>>>> @@ -78,6 +78,9 @@ static inline void guest_enter(void) >>>>>> vtime_guest_enter(current); >>>>>> else >>>>>> current->flags |= PF_VCPU; >>>>>> + >>>>>> + if (context_tracking_is_enabled()) >>>>>> + context_tracking_user_enter(IN_GUEST); >>>>>> } >>>>> >>>>> >>>>> Couldnt we make >>>>> rcu_virt_note_context_switch(smp_processor_id()); >>>>> conditional in include/linux/kvm_host.h (kvm_guest_enter) >>>>> >>>>> e.g. something like >>>>> if (!context_tracking_is_enabled()) >>>>> rcu_virt_note_context_switch(smp_processor_id()); >>>> >>>> Possibly. I considered the same, but I do not know whether >>>> or not just rcu_user_enter / rcu_user_exit is enough. >>>> >>>> I could certainly try it out and see whether anything >>>> explodes, but I am not convinced that is careful enough >>>> when it comes to handling RCU code... >>>> >>>> Paul? :) >>> >>> That can fail for some odd combinations of Kconfig and boot parameters. >>> As I understand it, you instead want the following: >>> >>> if (!tick_nohz_full_cpu(smp_processor_id())) >>> rcu_virt_note_context_switch(smp_processor_id()); >>> >>> Frederic might know of a better approach. >> >> Interesting, in what cases would that happen? > > My concern, perhaps misplaced, is that context_tracking_is_enabled(), > but that the current CPU is not a nohz_full= CPU. In that case, the > context-tracking code would be within its rights to not tell RCU about > the transition to the guest, and thus RCU would be unaware that the > CPU was in a quiescent state. > > In most workloads, you would expect the CPU to get interrupted or > preempted or something at some point, which would eventually inform > RCU, but too much delay would result in the infamous RCU CPU stall > warning message. So let's code a bit more defensively. > >> If context_tracking_is_enabled() we end up eventually >> calling into rcu_user_enter & rcu_user_exit. >> >> If it is not enabled, we call rcu_virt_note_context_switch. >> >> In what cases would we need to call both? >> >> I don't see exit-to-userspace do the things that >> rcu_virt_note_context_switch does, and do not understand >> why userspace is fine with that... > > The real danger is doing neither. > > On tick_nohz_full_cpu() CPUs, the exit-to-userspace code should invoke > rcu_user_enter(), which sets some per-CPU state telling RCU to ignore > that CPU, since it cannot possibly do host RCU read-side critical sections > while running a guest. Ahhh, I understand now. Thank you for your explanation, Paul. I will make sure your suggestion is in version 2 of this series. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html