Re: [RFC 1/1] KVM: selftests: rseq_test: use vdso_getcpu() instead of syscall()

Sean Christopherson <seanjc@xxxxxxxxxx> · Fri, 4 Nov 2022 20:27:28 +0000

On Fri, Nov 04, 2022, Sean Christopherson wrote:
> On Thu, Nov 03, 2022, Gavin Shan wrote:
> > On 11/3/22 8:46 AM, Sean Christopherson wrote:
> > > On Wed, Nov 02, 2022, Robert Hoo wrote:
> > > > @@ -253,7 +269,7 @@ int main(int argc, char *argv[])
> > > >   			 * across the seq_cnt reads.
> > > >   			 */
> > > >   			smp_rmb();
> > > > -			sys_getcpu(&cpu);
> > > > +			vdso_getcpu(&cpu, NULL, NULL);
> > > >   			rseq_cpu = rseq_current_cpu_raw();
> > > >   			smp_rmb();
> > > >   		} while (snapshot != atomic_read(&seq_cnt));
> > > 
> > > Something seems off here.  Half of the iterations in the migration thread have a
> > > delay of 5+us, which should be more than enough time to complete a few getcpu()
> > > syscalls to stabilize the CPU.
> > > 
> > > Has anyone tried to figure out why the vCPU thread is apparently running slow?
> > > E.g. is KVM_RUN itself taking a long time, is the task not getting scheduled in,
> > > etc...  I can see how using vDSO would make the vCPU more efficient, but I'm
> > > curious as to why that's a problem in the first place.
> > > 
> > > Anyways, assuming there's no underlying problem that can be solved, the easier
> > > solution is to just bump the delay in the migration thread.  As per its gigantic
> > > comment, the original bug reproduced with up to 500us delays, so bumping the min
> > > delay to e.g. 5us is acceptable.  If that doesn't guarantee the vCPU meets its
> > > quota, then something else is definitely going on.
> > > 
> > 
> > I doubt if it's still caused by a busy system as mentioned previously [1]. At least,
> > I failed to reproduce the issue on my ARM64 system until some workloads were forced
> > to hog CPUs.
> 
> Yeah, I suspect something else as well.  My best guess at this point is mitigations;
> I'll test that tomorrow to see if it makes any difference.

So much for the mitigations theory: the migration thread gets slowed down more than
the vCPU thread.
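
For reference, the "bump the min delay" idea quoted above could look roughly
like the sketch below.  This is not code from rseq_test.c: the helper name
migration_delay_us() and its min_us/spread parameters are made up for
illustration, and the assumption that the delay cycles through a small range
of values per iteration is mine.

```c
#include <assert.h>

/*
 * Hypothetical helper, not taken from rseq_test.c: compute the delay, in
 * microseconds, for iteration i of the migration loop.  The delay cycles
 * through `spread` consecutive values starting at `min_us`, so raising
 * min_us raises the floor without changing the variation between
 * iterations.
 */
unsigned int migration_delay_us(unsigned int i, unsigned int min_us,
				unsigned int spread)
{
	assert(spread > 0);	/* guard against modulo by zero */
	return min_us + (i % spread);
}
```

The migration thread would then do something like
usleep(migration_delay_us(i, 5, 10)), so that every iteration sleeps for at
least 5us while staying far below the 500us at which the original bug still
reproduced.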