On Wed, 20 Oct 2021 at 01:34, Sean Christopherson <seanjc@xxxxxxxxxx> wrote: > > On Tue, Oct 19, 2021, Paolo Bonzini wrote: > > On 19/10/21 10:12, Wanpeng Li wrote: > > > - if (kvm_vcpu_wake_up(vcpu)) > > > - return; > > > + me = get_cpu(); > > > + > > > + if (rcuwait_active(kvm_arch_vcpu_get_wait(vcpu)) && kvm_vcpu_wake_up(vcpu)) > > > + goto out; > > > > This is racy. You are basically doing the same check that rcuwait_wake_up > > does, but without the memory barrier before. > > I was worried that was the case[*], but I didn't have the two hours it would have > taken me to verify there was indeed a problem :-) > > The intent of the extra check was to avoid the locked instruction that comes with > disabling preemption via rcu_read_lock(). But thinking more, the extra op should > be little more than a basic arithmetic operation in the grand scheme on modern x86 > since the cache line is going to be locked and written no matter what, either > immediately before or immediately after. I observe the main overhead of rcuwait_wake_up() is from rcu operations, especially rcu_read_lock/unlock(). > > So with Paolo's other comment, maybe just this? And if this doesn't provide the > desired performance boost, changes to the rcuwait behavior should go in separate > patch. Ok. Wanpeng