Re: [PATCH 3/4] KVM: selftests: Wait for all vCPU to be created before entering guest mode

On Thu, Nov 11, 2021 at 10:17 AM Ben Gardon <bgardon@xxxxxxxxxx> wrote:
>
> On Wed, Nov 10, 2021 at 4:13 PM David Matlack <dmatlack@xxxxxxxxxx> wrote:
> >
> > Thread creation requires taking the mmap_sem in write mode, which causes
> > vCPU threads running in guest mode to block while they are populating
> > memory. Fix this by waiting for all vCPU threads to be created and start
> > running before entering guest mode on any one vCPU thread.
> >
> > This substantially improves the "Populate memory time" when using 1GiB
> > pages since it allows all vCPUs to zero pages in parallel rather than
> > blocking because a writer is waiting (which is waiting for another vCPU
> > that is busy zeroing a 1GiB page).
> >
> > Before:
> >
> >   $ ./dirty_log_perf_test -v256 -s anonymous_hugetlb_1gb
> >   ...
> >   Populate memory time: 52.811184013s
> >
> > After:
> >
> >   $ ./dirty_log_perf_test -v256 -s anonymous_hugetlb_1gb
> >   ...
> >   Populate memory time: 10.204573342s
> >
> > Signed-off-by: David Matlack <dmatlack@xxxxxxxxxx>
> > ---
> >  .../selftests/kvm/lib/perf_test_util.c        | 26 +++++++++++++++++++
> >  1 file changed, 26 insertions(+)
> >
> > diff --git a/tools/testing/selftests/kvm/lib/perf_test_util.c b/tools/testing/selftests/kvm/lib/perf_test_util.c
> > index d646477ed16a..722df3a28791 100644
> > --- a/tools/testing/selftests/kvm/lib/perf_test_util.c
> > +++ b/tools/testing/selftests/kvm/lib/perf_test_util.c
> > @@ -22,6 +22,9 @@ struct vcpu_thread {
> >
> >         /* The pthread backing the vCPU. */
> >         pthread_t thread;
> > +
> > +       /* Set to true once the vCPU thread is up and running. */
> > +       bool running;
> >  };
> >
> >  /* The vCPU threads involved in this test. */
> > @@ -30,6 +33,9 @@ static struct vcpu_thread vcpu_threads[KVM_MAX_VCPUS];
> >  /* The function run by each vCPU thread, as provided by the test. */
> >  static void (*vcpu_thread_fn)(struct perf_test_vcpu_args *);
> >
> > +/* Set to true once all vCPU threads are up and running. */
> > +static bool all_vcpu_threads_running;
> > +
> >  /*
> >   * Continuously write to the first 8 bytes of each page in the
> >   * specified region.
> > @@ -196,6 +202,17 @@ static void *vcpu_thread_main(void *data)
> >  {
> >         struct vcpu_thread *vcpu = data;
> >
> > +       WRITE_ONCE(vcpu->running, true);
> > +
> > +       /*
> > +        * Wait for all vCPU threads to be up and running before calling the test-
> > +        * provided vCPU thread function. This prevents thread creation (which
> > +        * requires taking the mmap_sem in write mode) from interfering with the
> > +        * guest faulting in its memory.
> > +        */
> > +       while (!READ_ONCE(all_vcpu_threads_running))
> > +               ;
> > +
>
> I can never remember the rules on this so I could be wrong, but you
> may want a cpu_relax() in that loop to prevent it from being optimized
> out. Maybe the READ_ONCE is sufficient though.

READ_ONCE is sufficient to prevent the loop from being optimized out,
but cpu_relax() is nice to have so that the spinning thread plays nice
with its hyperthread sibling.

On that note, there are a lot of spin waits in the KVM selftests, and
none of the ones I've seen use cpu_relax().

I'll take a look at adding cpu_relax() throughout the selftests in v2.
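
For illustration only, here is a minimal user-space sketch of the same
handshake with cpu_relax() in both spin loops. The READ_ONCE(),
WRITE_ONCE() and cpu_relax() definitions below are stand-ins written for
this example (assuming GCC or Clang), and every demo_* name is
hypothetical; none of this is taken from the selftests.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Stand-in macros (assumption): force exactly one volatile access. */
#define READ_ONCE(x)		(*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val)	(*(volatile __typeof__(x) *)&(x) = (val))

/* Stand-in cpu_relax() (assumption): a spin-wait hint where available. */
static inline void cpu_relax(void)
{
#if defined(__x86_64__) || defined(__i386__)
	__asm__ __volatile__("pause" ::: "memory");
#elif defined(__aarch64__)
	__asm__ __volatile__("yield" ::: "memory");
#else
	__asm__ __volatile__("" ::: "memory");	/* compiler barrier only */
#endif
}

#define DEMO_MAX_THREADS 16

struct demo_thread {
	pthread_t thread;
	bool running;	/* set by the thread once it has started */
	bool released;	/* set by the thread once it leaves the spin loop */
};

static struct demo_thread demo_threads[DEMO_MAX_THREADS];
static bool demo_all_threads_running;

static void *demo_thread_main(void *data)
{
	struct demo_thread *t = data;

	WRITE_ONCE(t->running, true);

	/* Spin until every thread exists, yielding pipeline resources. */
	while (!READ_ONCE(demo_all_threads_running))
		cpu_relax();

	t->released = true;	/* read by the parent only after pthread_join() */
	return NULL;
}

/* Start nr_threads threads, release them together, return how many ran. */
static int demo_start_and_join(int nr_threads)
{
	int i, ret, released = 0;

	assert(nr_threads >= 0 && nr_threads <= DEMO_MAX_THREADS);
	WRITE_ONCE(demo_all_threads_running, false);

	for (i = 0; i < nr_threads; i++) {
		struct demo_thread *t = &demo_threads[i];

		t->released = false;
		WRITE_ONCE(t->running, false);
		ret = pthread_create(&t->thread, NULL, demo_thread_main, t);
		assert(ret == 0);
		(void)ret;
	}

	/* Wait for each thread to report in before releasing any of them. */
	for (i = 0; i < nr_threads; i++) {
		while (!READ_ONCE(demo_threads[i].running))
			cpu_relax();
	}

	WRITE_ONCE(demo_all_threads_running, true);

	for (i = 0; i < nr_threads; i++) {
		pthread_join(demo_threads[i].thread, NULL);
		released += demo_threads[i].released;
	}

	return released;
}
```

Calling demo_start_and_join(4) should return 4 once all four threads
have passed the barrier. On architectures without a known hint the
fallback is just a compiler barrier, so the loop still terminates but
gains nothing for the sibling hyperthread.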

>
> >         vcpu_thread_fn(&perf_test_args.vcpu_args[vcpu->vcpu_id]);
> >
> >         return NULL;
> > @@ -206,14 +223,23 @@ void perf_test_start_vcpu_threads(int vcpus, void (*vcpu_fn)(struct perf_test_vc
> >         int vcpu_id;
> >
> >         vcpu_thread_fn = vcpu_fn;
> > +       WRITE_ONCE(all_vcpu_threads_running, false);
> >
> >         for (vcpu_id = 0; vcpu_id < vcpus; vcpu_id++) {
> >                 struct vcpu_thread *vcpu = &vcpu_threads[vcpu_id];
> >
> >                 vcpu->vcpu_id = vcpu_id;
> > +               WRITE_ONCE(vcpu->running, false);
>
> Do these need to be WRITE_ONCE? I don't think WRITE_ONCE provides any
> extra memory ordering guarantees and I don't know why the compiler
> would optimize these out. If they do need to be WRITE_ONCE, they
> probably merit comments.

To be completely honest, I'm not sure. I included WRITE_ONCE out of
caution, to ensure the compiler does not reorder the writes with
respect to the READ_ONCE in the spin loops. I'll need to do a bit more
research to confirm whether it's really necessary.
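
As a point of comparison while I look into it: WRITE_ONCE/READ_ONCE only
constrain the compiler and add no CPU memory barriers. If explicit
ordering were wanted, one alternative (not what the patch does) would be
C11 release/acquire atomics. A minimal sketch, with all names
hypothetical:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

static int demo_payload;	/* plain data published by the parent */
static atomic_bool demo_ready;	/* flag the child spins on */

static void *demo_consumer(void *arg)
{
	int *seen = arg;

	/* Acquire pairs with the release store in the parent. */
	while (!atomic_load_explicit(&demo_ready, memory_order_acquire))
		;

	/* The payload write is guaranteed to be visible here. */
	*seen = demo_payload;
	return NULL;
}

/* Publish value to a freshly created thread; return what it observed. */
static int demo_publish_and_consume(int value)
{
	pthread_t thread;
	int seen = -1;
	int ret;

	atomic_store_explicit(&demo_ready, false, memory_order_relaxed);

	ret = pthread_create(&thread, NULL, demo_consumer, &seen);
	assert(ret == 0);
	(void)ret;

	demo_payload = value;
	/* Release: orders the payload write before the flag write. */
	atomic_store_explicit(&demo_ready, true, memory_order_release);

	pthread_join(thread, NULL);
	return seen;
}
```

In the patch itself the flags carry no payload, and pthread_create()
already synchronizes the writes made before it with the new thread, so
this extra ordering may well be unnecessary there.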

>
> >
> >                 pthread_create(&vcpu->thread, NULL, vcpu_thread_main, vcpu);
> >         }
> > +
> > +       for (vcpu_id = 0; vcpu_id < vcpus; vcpu_id++) {
> > +               while (!READ_ONCE(vcpu_threads[vcpu_id].running))
> > +                       ;
> > +       }
> > +
> > +       WRITE_ONCE(all_vcpu_threads_running, true);
> >  }
> >
> >  void perf_test_join_vcpu_threads(int vcpus)
> > --
> > 2.34.0.rc1.387.gb447b232ab-goog
> >


