Re: [PATCH 3/4] KVM: selftests: Wait for all vCPU to be created before entering guest mode

Ben Gardon <bgardon@xxxxxxxxxx> · Thu, 11 Nov 2021 10:17:28 -0800

On Wed, Nov 10, 2021 at 4:13 PM David Matlack <dmatlack@xxxxxxxxxx> wrote:
>
> Thread creation requires taking the mmap_sem in write mode, which causes
> vCPU threads running in guest mode to block while they are populating
> memory. Fix this by waiting for all vCPU threads to be created and start
> running before entering guest mode on any one vCPU thread.
>
> This substantially improves the "Populate memory time" when using 1GiB
> pages since it allows all vCPUs to zero pages in parallel rather than
> blocking because a writer is waiting (which is waiting for another vCPU
> that is busy zeroing a 1GiB page).
>
> Before:
>
>   $ ./dirty_log_perf_test -v256 -s anonymous_hugetlb_1gb
>   ...
>   Populate memory time: 52.811184013s
>
> After:
>
>   $ ./dirty_log_perf_test -v256 -s anonymous_hugetlb_1gb
>   ...
>   Populate memory time: 10.204573342s
>
> Signed-off-by: David Matlack <dmatlack@xxxxxxxxxx>
> ---
>  .../selftests/kvm/lib/perf_test_util.c        | 26 +++++++++++++++++++
>  1 file changed, 26 insertions(+)
>
> diff --git a/tools/testing/selftests/kvm/lib/perf_test_util.c b/tools/testing/selftests/kvm/lib/perf_test_util.c
> index d646477ed16a..722df3a28791 100644
> --- a/tools/testing/selftests/kvm/lib/perf_test_util.c
> +++ b/tools/testing/selftests/kvm/lib/perf_test_util.c
> @@ -22,6 +22,9 @@ struct vcpu_thread {
>
>         /* The pthread backing the vCPU. */
>         pthread_t thread;
> +
> +       /* Set to true once the vCPU thread is up and running. */
> +       bool running;
>  };
>
>  /* The vCPU threads involved in this test. */
> @@ -30,6 +33,9 @@ static struct vcpu_thread vcpu_threads[KVM_MAX_VCPUS];
>  /* The function run by each vCPU thread, as provided by the test. */
>  static void (*vcpu_thread_fn)(struct perf_test_vcpu_args *);
>
> +/* Set to true once all vCPU threads are up and running. */
> +static bool all_vcpu_threads_running;
> +
>  /*
>   * Continuously write to the first 8 bytes of each page in the
>   * specified region.
> @@ -196,6 +202,17 @@ static void *vcpu_thread_main(void *data)
>  {
>         struct vcpu_thread *vcpu = data;
>
> +       WRITE_ONCE(vcpu->running, true);
> +
> +       /*
> +        * Wait for all vCPU threads to be up and running before calling the test-
> +        * provided vCPU thread function. This prevents thread creation (which
> +        * requires taking the mmap_sem in write mode) from interfering with the
> +        * guest faulting in its memory.
> +        */
> +       while (!READ_ONCE(all_vcpu_threads_running))
> +               ;
> +

I can never remember the rules here, so I could be wrong, but you
may want a cpu_relax() in that loop to keep the compiler from
optimizing it out. The READ_ONCE may already be sufficient, though.
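
To make the question concrete, here is a minimal userspace sketch of the
same handshake. It is only an illustration under stated assumptions:
READ_ONCE/WRITE_ONCE are simplified volatile-cast stand-ins rather than
the selftests' real definitions, and spin_hint(), worker_main() and
run_handshake() are hypothetical names that are not part of the patch.
The volatile access forces the compiler to reload the flag on every
iteration, so the hint is about being polite to the CPU rather than
about keeping the loop alive.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Simplified stand-ins for the kernel-style macros (assumption). */
#define READ_ONCE(x)     (*(volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical spin hint: PAUSE on x86, a compiler barrier elsewhere. */
static inline void spin_hint(void)
{
#if defined(__x86_64__) || defined(__i386__)
	__builtin_ia32_pause();
#else
	__asm__ __volatile__("" ::: "memory");
#endif
}

#define MAX_WORKERS 16

static bool worker_running[MAX_WORKERS];
static bool all_workers_running;
static int worker_released[MAX_WORKERS];

static void *worker_main(void *data)
{
	int id = (int)(long)data;

	/* Announce that this thread exists, then wait for everyone else. */
	WRITE_ONCE(worker_running[id], true);
	while (!READ_ONCE(all_workers_running))
		spin_hint();

	/* Stands in for the test-provided vCPU function. */
	worker_released[id] = 1;
	return NULL;
}

/* Returns how many workers got past the barrier. */
static int run_handshake(int nworkers)
{
	pthread_t threads[MAX_WORKERS];
	int created = 0, released = 0, i;

	assert(nworkers >= 1 && nworkers <= MAX_WORKERS);

	WRITE_ONCE(all_workers_running, false);

	for (i = 0; i < nworkers; i++) {
		/* Plain stores: the thread that reads them does not exist yet. */
		worker_running[i] = false;
		worker_released[i] = 0;
		if (pthread_create(&threads[i], NULL, worker_main,
				   (void *)(long)i))
			break;
		created++;
	}

	/* Wait until every created thread has checked in. */
	for (i = 0; i < created; i++) {
		while (!READ_ONCE(worker_running[i]))
			spin_hint();
	}

	/* Release all workers at once. */
	WRITE_ONCE(all_workers_running, true);

	for (i = 0; i < created; i++) {
		pthread_join(threads[i], NULL);
		released += worker_released[i];
	}

	return released;
}
```

Calling run_handshake(4) should return 4 once every worker has passed
the barrier. Build with -pthread.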

>         vcpu_thread_fn(&perf_test_args.vcpu_args[vcpu->vcpu_id]);
>
>         return NULL;
> @@ -206,14 +223,23 @@ void perf_test_start_vcpu_threads(int vcpus, void (*vcpu_fn)(struct perf_test_vc
>         int vcpu_id;
>
>         vcpu_thread_fn = vcpu_fn;
> +       WRITE_ONCE(all_vcpu_threads_running, false);
>
>         for (vcpu_id = 0; vcpu_id < vcpus; vcpu_id++) {
>                 struct vcpu_thread *vcpu = &vcpu_threads[vcpu_id];
>
>                 vcpu->vcpu_id = vcpu_id;
> +               WRITE_ONCE(vcpu->running, false);

Do these need to be WRITE_ONCE? I don't think WRITE_ONCE provides any
extra memory ordering guarantees, and I don't see why the compiler
would optimize these stores out. If they do need to be WRITE_ONCE, they
probably merit comments.
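
As a point of comparison, here is a small sketch of how the ordering
could be spelled out explicitly. It is not what the patch does: it
swaps in C11 atomics with release/acquire, and struct worker,
worker_main() and start_and_wait() are hypothetical names. The field
written before pthread_create() is a plain store, since POSIX lists
pthread_create() among the functions that synchronize memory, so the
new thread sees it without any annotation. Only the flag polled across
threads is atomic.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

struct worker {
	int id;              /* plain field, written before pthread_create() */
	int seen_id;         /* written by the worker before it sets running */
	atomic_bool running; /* set by the worker, polled by the creator */
};

static void *worker_main(void *data)
{
	struct worker *w = data;

	/* The creator's plain store to w->id is visible here. */
	w->seen_id = w->id;

	/* Release: publishes seen_id along with the flag. */
	atomic_store_explicit(&w->running, true, memory_order_release);
	return NULL;
}

/* Returns the id observed by the worker, or -1 if creation failed. */
static int start_and_wait(int id)
{
	struct worker w = { .id = id, .seen_id = -1 };
	pthread_t thread;
	int seen;

	assert(id >= 0); /* -1 is reserved for the error return */

	atomic_init(&w.running, false);

	if (pthread_create(&thread, NULL, worker_main, &w))
		return -1;

	/* Acquire pairs with the worker's release store. */
	while (!atomic_load_explicit(&w.running, memory_order_acquire))
		;

	/* Safe to read before joining because of the acquire above. */
	seen = w.seen_id;

	pthread_join(thread, NULL);
	return seen;
}
```

With this shape the only annotated accesses are the ones that really
cross threads, which may be an easier place to hang a comment.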

>
>                 pthread_create(&vcpu->thread, NULL, vcpu_thread_main, vcpu);
>         }
> +
> +       for (vcpu_id = 0; vcpu_id < vcpus; vcpu_id++) {
> +               while (!READ_ONCE(vcpu_threads[vcpu_id].running))
> +                       ;
> +       }
> +
> +       WRITE_ONCE(all_vcpu_threads_running, true);
>  }
>
>  void perf_test_join_vcpu_threads(int vcpus)
> --
> 2.34.0.rc1.387.gb447b232ab-goog
>