Re: [RFC 1/1] KVM: selftests: rseq_test: use vdso_getcpu() instead of syscall()

Sean Christopherson <seanjc@xxxxxxxxxx> · Fri, 4 Nov 2022 02:07:08 +0000

On Thu, Nov 03, 2022, Robert Hoo wrote:
> On Thu, 2022-11-03 at 00:46 +0000, Sean Christopherson wrote:
> > On Wed, Nov 02, 2022, Robert Hoo wrote:
> > > vDSO getcpu() has been in Kernel since 2.6.19, which we can assume
> > > generally available.
> > > Use vDSO getcpu() to reduce the overhead, so that vcpu thread stalls less
> > > therefore can have more odds to hit the race condition.
> > > 
> > > Fixes: 0fcc102923de ("KVM: selftests: Use getcpu() instead of sched_getcpu() in rseq_test")
> > > Signed-off-by: Robert Hoo <robert.hu@xxxxxxxxxxxxxxx>
> > > ---
> > 
> > ...
> > 
> > > @@ -253,7 +269,7 @@ int main(int argc, char *argv[])
> > >  			 * across the seq_cnt reads.
> > >  			 */
> > >  			smp_rmb();
> > > -			sys_getcpu(&cpu);
> > > +			vdso_getcpu(&cpu, NULL, NULL);
> > >  			rseq_cpu = rseq_current_cpu_raw();
> > >  			smp_rmb();
> > >  		} while (snapshot != atomic_read(&seq_cnt));
> > 
> > Something seems off here.  Half of the iterations in the migration thread have a
> > delay of 5+us, which should be more than enough time to complete a few getcpu()
> > syscalls to stabilize the CPU.
> > 
> The migration thread delay time is for the whole vcpu thread loop, not
> just vcpu_run(), I think.

Yes, but if switching to vdso_getcpu() makes the issues go away, that suggests
that the task migration is causing the tight do-while loop to get stuck.

> 	for (i = 0; !done; i++) {
> 		vcpu_run(vcpu);
> 		TEST_ASSERT(get_ucall(vcpu, NULL) == UCALL_SYNC,
> 			    "Guest failed?");
> 		...
> 		do {
> 			...
> 			vdso_getcpu(&cpu, NULL, NULL);
> 			rseq_cpu = rseq_current_cpu_raw();
> 			...
> 		} while (snapshot != atomic_read(&seq_cnt));
> 
> 		...
> 	}
> 
> > Has anyone tried to figure out why the vCPU thread is apparently running
> > slow?  E.g. is KVM_RUN itself taking a long time, is the task not getting
> > scheduled in, etc...  I can see how using vDSO would make the vCPU more
> > efficient, but I'm curious as to why that's a problem in the first place.
> 
> Yes, it should be the first-place problem.
> But firstly, it's the whole for(){} loop taking more time than before,

Do you have actual performance numbers?  If so, can you share them?
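
For reference, a minimal sketch of how such numbers might be gathered in
userspace.  This is not part of rseq_test; the helper names (now_ns(),
getcpu_via_syscall(), avg_ns_per_call()) are made up for illustration, and it
assumes a Linux libc that exposes SYS_getcpu.  It only times the raw syscall
path; a vDSO-based wrapper such as the patch's vdso_getcpu() would need a
small adapter with the same one-argument signature to be timed the same way.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical callback type: any "give me the current CPU" helper. */
typedef void (*getcpu_fn)(unsigned int *cpu);

/* Monotonic timestamp in nanoseconds. */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Raw getcpu() syscall, bypassing libc wrappers and the vDSO. */
static void getcpu_via_syscall(unsigned int *cpu)
{
	syscall(SYS_getcpu, cpu, NULL, NULL);
}

/* Average wall-clock cost, in nanoseconds, of one call to @fn. */
static uint64_t avg_ns_per_call(getcpu_fn fn, unsigned long iters)
{
	unsigned int cpu = 0;
	uint64_t start;
	unsigned long i;

	if (!iters)
		return 0;

	start = now_ns();
	for (i = 0; i < iters; i++)
		fn(&cpu);

	return (now_ns() - start) / iters;
}
```

Calling avg_ns_per_call(getcpu_via_syscall, 1000000) from main() and printing
the result, once for each getcpu flavor, would give a rough per-call cost to
compare against the 5+us migration delay.  It says nothing about KVM_RUN or
scheduling latency, which would need to be measured separately.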