On Thu, Nov 11, 2021 at 10:17 AM Ben Gardon <bgardon@xxxxxxxxxx> wrote:
>
> On Wed, Nov 10, 2021 at 4:13 PM David Matlack <dmatlack@xxxxxxxxxx> wrote:
> >
> > Thread creation requires taking the mmap_sem in write mode, which causes
> > vCPU threads running in guest mode to block while they are populating
> > memory. Fix this by waiting for all vCPU threads to be created and start
> > running before entering guest mode on any one vCPU thread.
> >
> > This substantially improves the "Populate memory time" when using 1GiB
> > pages since it allows all vCPUs to zero pages in parallel rather than
> > blocking because a writer is waiting (which is waiting for another vCPU
> > that is busy zeroing a 1GiB page).
> >
> > Before:
> >
> >   $ ./dirty_log_perf_test -v256 -s anonymous_hugetlb_1gb
> >   ...
> >   Populate memory time: 52.811184013s
> >
> > After:
> >
> >   $ ./dirty_log_perf_test -v256 -s anonymous_hugetlb_1gb
> >   ...
> >   Populate memory time: 10.204573342s
> >
> > Signed-off-by: David Matlack <dmatlack@xxxxxxxxxx>
> > ---
> >  .../selftests/kvm/lib/perf_test_util.c        | 26 +++++++++++++++++++
> >  1 file changed, 26 insertions(+)
> >
> > diff --git a/tools/testing/selftests/kvm/lib/perf_test_util.c b/tools/testing/selftests/kvm/lib/perf_test_util.c
> > index d646477ed16a..722df3a28791 100644
> > --- a/tools/testing/selftests/kvm/lib/perf_test_util.c
> > +++ b/tools/testing/selftests/kvm/lib/perf_test_util.c
> > @@ -22,6 +22,9 @@ struct vcpu_thread {
> >
> >  	/* The pthread backing the vCPU. */
> >  	pthread_t thread;
> > +
> > +	/* Set to true once the vCPU thread is up and running. */
> > +	bool running;
> >  };
> >
> >  /* The vCPU threads involved in this test. */
> > @@ -30,6 +33,9 @@ static struct vcpu_thread vcpu_threads[KVM_MAX_VCPUS];
> >  /* The function run by each vCPU thread, as provided by the test. */
> >  static void (*vcpu_thread_fn)(struct perf_test_vcpu_args *);
> >
> > +/* Set to true once all vCPU threads are up and running. */
> > +static bool all_vcpu_threads_running;
> > +
> >  /*
> >   * Continuously write to the first 8 bytes of each page in the
> >   * specified region.
> > @@ -196,6 +202,17 @@ static void *vcpu_thread_main(void *data)
> >  {
> >  	struct vcpu_thread *vcpu = data;
> >
> > +	WRITE_ONCE(vcpu->running, true);
> > +
> > +	/*
> > +	 * Wait for all vCPU threads to be up and running before calling the test-
> > +	 * provided vCPU thread function. This prevents thread creation (which
> > +	 * requires taking the mmap_sem in write mode) from interfering with the
> > +	 * guest faulting in its memory.
> > +	 */
> > +	while (!READ_ONCE(all_vcpu_threads_running))
> > +		;
> > +
>
> I can never remember the rules on this so I could be wrong, but you
> may want a cpu_relax() in that loop to prevent it from being optimized
> out. Maybe the READ_ONCE is sufficient though.

READ_ONCE is sufficient to prevent the loop from being optimized out
but cpu_relax() is nice to have to play nice with our hyperthread
buddy.

On that note there are a lot of spin waits in the KVM selftests and
none of the ones I've seen use cpu_relax().

I'll take a look at adding cpu_relax() throughout the selftests in v2.

>
> >  	vcpu_thread_fn(&perf_test_args.vcpu_args[vcpu->vcpu_id]);
> >
> >  	return NULL;
> > @@ -206,14 +223,23 @@ void perf_test_start_vcpu_threads(int vcpus, void (*vcpu_fn)(struct perf_test_vc
> >  	int vcpu_id;
> >
> >  	vcpu_thread_fn = vcpu_fn;
> > +	WRITE_ONCE(all_vcpu_threads_running, false);
> >
> >  	for (vcpu_id = 0; vcpu_id < vcpus; vcpu_id++) {
> >  		struct vcpu_thread *vcpu = &vcpu_threads[vcpu_id];
> >
> >  		vcpu->vcpu_id = vcpu_id;
> > +		WRITE_ONCE(vcpu->running, false);
>
> Do these need to be WRITE_ONCE? I don't think WRITE_ONCE provides any
> extra memory ordering guarantees and I don't know why the compiler
> would optimize these out. If they do need to be WRITE_ONCE, they
> probably merit comments.

To be completely honest I'm not sure. I included WRITE_ONCE out of
caution to ensure the compiler does not reorder the writes with
respect to the READ_ONCE. I'll need to do a bit more research to
confirm if it's really necessary.

>
> >
> >  		pthread_create(&vcpu->thread, NULL, vcpu_thread_main, vcpu);
> >  	}
> > +
> > +	for (vcpu_id = 0; vcpu_id < vcpus; vcpu_id++) {
> > +		while (!READ_ONCE(vcpu_threads[vcpu_id].running))
> > +			;
> > +	}
> > +
> > +	WRITE_ONCE(all_vcpu_threads_running, true);
> >  }
> >
> >  void perf_test_join_vcpu_threads(int vcpus)
> > --
> > 2.34.0.rc1.387.gb447b232ab-goog
> >
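
For reference, here is a rough standalone sketch of the start barrier
discussed in this thread, with a relax hint in each spin loop. It is not
the selftest code and makes several assumptions: cpu_relax_hint(),
struct worker, MAX_THREADS, and run_start_barrier_demo() are hypothetical
names invented for illustration, and the GCC/Clang __atomic builtins with
acquire/release ordering are swapped in for READ_ONCE/WRITE_ONCE so the
example does not depend on kernel headers or on the ordering question
raised above.

```c
#include <pthread.h>
#include <stdbool.h>

#define MAX_THREADS 64	/* hypothetical limit for this sketch */

/* Hypothetical stand-in for cpu_relax(); assumes GCC or Clang inline asm. */
static inline void cpu_relax_hint(void)
{
#if defined(__x86_64__) || defined(__i386__)
	__asm__ __volatile__("pause" ::: "memory");
#elif defined(__aarch64__)
	__asm__ __volatile__("yield" ::: "memory");
#else
	__asm__ __volatile__("" ::: "memory");	/* compiler barrier only */
#endif
}

struct worker {
	pthread_t thread;
	bool running;	/* set by the worker once it has started */
};

static struct worker workers[MAX_THREADS];
static bool all_workers_running;
static int workers_past_barrier;

static void *worker_main(void *data)
{
	struct worker *w = data;

	__atomic_store_n(&w->running, true, __ATOMIC_RELEASE);

	/* Spin until the creator has seen every worker start. */
	while (!__atomic_load_n(&all_workers_running, __ATOMIC_ACQUIRE))
		cpu_relax_hint();

	/* Real work (e.g. entering guest mode) would begin here. */
	__atomic_fetch_add(&workers_past_barrier, 1, __ATOMIC_RELAXED);
	return NULL;
}

/* Returns the number of workers that passed the barrier, or -1 on error. */
int run_start_barrier_demo(int nr_threads)
{
	int created = 0;
	int i;

	if (nr_threads < 1 || nr_threads > MAX_THREADS)
		return -1;

	__atomic_store_n(&all_workers_running, false, __ATOMIC_RELAXED);
	__atomic_store_n(&workers_past_barrier, 0, __ATOMIC_RELAXED);

	for (i = 0; i < nr_threads; i++) {
		__atomic_store_n(&workers[i].running, false, __ATOMIC_RELAXED);
		if (pthread_create(&workers[i].thread, NULL, worker_main,
				   &workers[i]))
			break;
		created++;
	}

	/* Only wait for the start flags if every thread was created. */
	if (created == nr_threads) {
		for (i = 0; i < created; i++) {
			while (!__atomic_load_n(&workers[i].running,
						__ATOMIC_ACQUIRE))
				cpu_relax_hint();
		}
	}

	/* Release the workers (also on the error path, so they can exit). */
	__atomic_store_n(&all_workers_running, true, __ATOMIC_RELEASE);

	for (i = 0; i < created; i++)
		pthread_join(workers[i].thread, NULL);

	if (created != nr_threads)
		return -1;

	return __atomic_load_n(&workers_past_barrier, __ATOMIC_RELAXED);
}
```

Calling run_start_barrier_demo(4) from a test program built with
-pthread should return 4 once all four workers have been released. The
pause/yield hint does not change correctness; it only makes the spin
friendlier to a sibling hyperthread, which is the point made above.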