On Wed, Nov 02, 2022, Robert Hoo wrote:
> vDSO getcpu() has been in Kernel since 2.6.19, which we can assume
> generally available.
> Use vDSO getcpu() to reduce the overhead, so that vcpu thread stalls less
> therefore can have more odds to hit the race condition.
>
> Fixes: 0fcc102923de ("KVM: selftests: Use getcpu() instead of sched_getcpu() in rseq_test")
> Signed-off-by: Robert Hoo <robert.hu@xxxxxxxxxxxxxxx>
> ---

...

> @@ -253,7 +269,7 @@ int main(int argc, char *argv[])
>  		 * across the seq_cnt reads.
>  		 */
>  		smp_rmb();
> -		sys_getcpu(&cpu);
> +		vdso_getcpu(&cpu, NULL, NULL);
>  		rseq_cpu = rseq_current_cpu_raw();
>  		smp_rmb();
>  	} while (snapshot != atomic_read(&seq_cnt));

Something seems off here.  Half of the iterations in the migration thread
have a delay of 5+us, which should be more than enough time to complete a
few getcpu() syscalls and stabilize the reported CPU.

Has anyone tried to figure out why the vCPU thread is apparently running
slow?  E.g. is KVM_RUN itself taking a long time, is the task not getting
scheduled in, etc...  I can see how using the vDSO would make the vCPU more
efficient, but I'm curious as to why that's a problem in the first place.

Anyways, assuming there's no underlying problem that can be solved, the
easier solution is to just bump the delay in the migration thread.  As per
its gigantic comment, the original bug reproduced with delays of up to
500us, so bumping the minimum delay to e.g. 5us is acceptable.  If that
doesn't guarantee the vCPU meets its quota, then something else is
definitely going on.
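
If someone does want to chase down where the vCPU's time is actually going,
a crude (completely untested) starting point would be to time KVM_RUN from
the vCPU worker, e.g. something like the below.  The vcpu_run() helper and
struct kvm_vcpu are assumptions about what the worker currently uses;
adjust to whatever it actually calls.

	#include <stdint.h>
	#include <time.h>

	static uint64_t worst_run_ns;

	/*
	 * Untested sketch: wrap the guest entry and track the worst-case
	 * latency, to see whether KVM_RUN itself is eating the 1-10us
	 * budget or the task simply isn't getting scheduled in.
	 */
	static void timed_vcpu_run(struct kvm_vcpu *vcpu)
	{
		struct timespec start, end;
		uint64_t ns;

		clock_gettime(CLOCK_MONOTONIC, &start);
		vcpu_run(vcpu);		/* assumed selftest helper */
		clock_gettime(CLOCK_MONOTONIC, &end);

		ns = (end.tv_sec - start.tv_sec) * 1000000000ull +
		     (end.tv_nsec - start.tv_nsec);
		if (ns > worst_run_ns)
			worst_run_ns = ns;
	}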
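
And if bumping the delay ends up being the fix, it should be a one-liner.
Completely untested, and the "(i % 10) + 1" form is an assumption about how
the migration worker currently computes its 1-10us delay:

	#include <unistd.h>

	/*
	 * Untested: raise the floor of the per-migration delay from 1us to
	 * 5us while keeping the same spread, so the vCPU gets more runtime
	 * between migrations.
	 */
	static void migration_delay(int i)
	{
		usleep((i % 10) + 5);	/* was (i % 10) + 1 */
	}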