On Fri, Apr 12, 2024, Zide Chen wrote: > Currently, the migration worker delays 1-10 us, assuming that one > KVM_RUN iteration only takes a few microseconds. But if C-state exit > latencies are large enough, for example, hundreds or even thousands > of microseconds on server CPUs, it may happen that it's not able to > bring the target CPU out of C-state before the migration worker starts > to migrate it to the next CPU. > > If the system workload is light, most CPUs could be at a certain level > of C-state, which may result in less successful migrations and fail the > migration/KVM_RUN ratio sanity check. > > This patch adds a command line option to skip the sanity check in > this case. > > Additionally, seems it's reasonable to randomize the length of usleep(), > other than delay in a fixed pattern. This belongs in a separate patch. And while it's reasonable on the surface, I doubt think it buys us anything, and it makes an already non-deterministic test even less deterministic. In other words, unless a random sleep time helps find more bugs or finds the original bug faster, just drop the randomization. > V2: > - removed the busy loop implementation > - add the new "-s" option This belongs in the ignored part of the patch... > > Signed-off-by: Zide Chen <zide.chen@xxxxxxxxx> ...down here. > --- > tools/testing/selftests/kvm/rseq_test.c | 37 +++++++++++++++++++++++-- > 1 file changed, 34 insertions(+), 3 deletions(-) > > diff --git a/tools/testing/selftests/kvm/rseq_test.c b/tools/testing/selftests/kvm/rseq_test.c > index 28f97fb52044..515cfa32a925 100644 > --- a/tools/testing/selftests/kvm/rseq_test.c > +++ b/tools/testing/selftests/kvm/rseq_test.c > @@ -150,7 +150,7 @@ static void *migration_worker(void *__rseq_tid) > * Use usleep() for simplicity and to avoid unnecessary kernel > * dependencies. > */ > - usleep((i % 10) + 1); > + usleep((rand() % 10) + 1); > } > done = true; > return NULL; > @@ -186,12 +186,35 @@ static void calc_min_max_cpu(void) > "Only one usable CPU, task migration not possible"); > } > > +static void usage(const char *name) Uber nit, "help()" is more common than "usage()". > @@ -254,9 +279,15 @@ int main(int argc, char *argv[]) > * getcpu() to stabilize. A 2:1 migration:KVM_RUN ratio is a fairly > * conservative ratio on x86-64, which can do _more_ KVM_RUNs than > * migrations given the 1us+ delay in the migration task. > + * > + * Another reason why it may have small migration:KVM_RUN ratio is that, > + * on systems with large C-state exit latency, it may happen quite often > + * that the scheduler is not able to wake up the target CPU before the > + * vCPU thread is scheduled to another CPU. > */ > - TEST_ASSERT(i > (NR_TASK_MIGRATIONS / 2), > - "Only performed %d KVM_RUNs, task stalled too much?", i); > + TEST_ASSERT(skip_sanity_check || i > (NR_TASK_MIGRATIONS / 2), > + "Only performed %d KVM_RUNs, task stalled too much? " > + "Try to turn off C-states or run it with the -s option", i); I think it's worth explicitly telling the user how to reduce CPU wakeup latency. Also, are C-states called that on other architectures? E.g. maybe this to avoid confusing the user? Not a big deal, e.g. I've no objection whatsoever to the comment, but it seems easy enough to avoid confusing the user. "Try setting /dev/cpu_dma_latency to reduce CPU wakeup latency, " "or run with -s to skip this sanity check", i);