Re: [PATCH] KVM: selftests: Run dirty_log_perf_test on specific cpus

Sean Christopherson <seanjc@xxxxxxxxxx> · Wed, 17 Aug 2022 21:29:23 +0000

On Wed, Aug 17, 2022, Vipin Sharma wrote:
> On Wed, Aug 17, 2022 at 10:25 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> >
> > > +static int parse_num(const char *num_str)
> > > +{
> > > +     int num;
> > > +     char *end_ptr;
> > > +
> > > +     errno = 0;
> > > +     num = (int)strtol(num_str, &end_ptr, 10);
> > > +     TEST_ASSERT(num_str != end_ptr && *end_ptr == '\0',
> > > +                 "Invalid number string.\n");
> > > +     TEST_ASSERT(errno == 0, "Conversion error: %d\n", errno);
> >
> > Is the paranoia truly necessary?  What happens if parse_cpu_list() simply uses
> > atoi() and is passed garbage?
> 
> On error atoi() returns 0, which is also a valid logical cpu number.

Lame.

> We need error checking here to make sure that the user really wants
> cpu 0 and it was not a mistake in typing.
> I was thinking of using parse_num API for other places as well instead
> of atoi() in dirty_log_perf_test.

Yes, definitely.  And maybe give it a name like atoi_paranoid()?

> Yeah, it was either my almost duplicate functions or have the one
> function do two things via if-else.  I am not happy with both
> approaches.
> 
> I think I will pass an integer array which this parsing function will
> fill up and return an int denoting how many elements were filled. The
> caller then can use the array as they wish, to copy it in
> vcpu_to_lcpu_map or cpuset.

Eh, I doubt that'll be a net improvement, e.g. the CPUSET case will then need to
re-loop, which seems silly.  If the exclusive cpuset vs. array is undesirable, we
could have the API require at least one instead of exactly one, i.e.

	TEST_ASSERT(cpuset || vcpu_map);

	...

                cpu = atoi(cpustr);
                TEST_ASSERT(cpu >= 0, "Invalid cpu number: %d\n", cpu);
                if (vcpu_map)
                        vcpu_map[i++] = cpu;
                if (cpuset)
                        CPU_SET(cpu, cpuset);

If we somehow end up with a third type of destination, then we can revisit this,
but that seems unlikely at this point.

> > > @@ -383,6 +450,26 @@ static void help(char *name)
> > >       backing_src_help("-s");
> > >       printf(" -x: Split the memory region into this number of memslots.\n"
> > >              "     (default: 1)\n");
> > > +     printf(" -c: Comma separated values of the logical CPUs which will run\n"
> > > +            "     the vCPUs. Number of values should be equal to the number\n"
> > > +            "     of vCPUs.\n\n"
> > > +            "     Example: ./dirty_log_perf_test -v 3 -c 22,43,1\n"
> > > +            "     This means that the vcpu 0 will run on the logical cpu 22,\n"
> > > +            "     vcpu 1 on the logical cpu 43 and vcpu 2 on the logical cpu 1.\n"
> > > +            "     (default: No cpu mapping)\n\n");
> > > +     printf(" -d: Comma separated values of the logical CPUs on which\n"
> > > +            "     dirty_log_perf_test will run. Without -c option, all of\n"
> > > +            "     the vcpus and main process will run on the cpus provided here.\n"
> > > +            "     This option also accepts a single cpu. (default: No cpu mapping)\n\n"
> > > +            "     Example 1: ./dirty_log_perf_test -v 3 -c 22,43,1 -d 101\n"
> > > +            "     Main application thread will run on logical cpu 101 and\n"
> > > +            "     vcpus will run on the logical cpus 22, 43 and 1\n\n"
> > > +            "     Example 2: ./dirty_log_perf_test -v 3 -d 101\n"
> > > +            "     Main application thread and vcpus will run on the logical\n"
> > > +            "     cpu 101\n\n"
> > > +            "     Example 3: ./dirty_log_perf_test -v 3 -d 101,23,53\n"
> > > +            "     Main application thread and vcpus will run on logical cpus\n"
> > > +            "     101, 23 and 53.\n");
> > >       puts("");
> > >       exit(0);
> > >  }
> >
> > > @@ -455,6 +550,13 @@ int main(int argc, char *argv[])
> > >               }
> > >       }
> > >
> >
> > I wonder if we should make -c and -d mutually exclusive.  Tweak -c to include the
> > application thread, i.e. TEST_ASSERT(nr_lcpus == nr_vcpus+1) and require 1:1 pinning
> > for all tasks.  E.g. allowing "-c ..., -d 0,1,22" seems unnecessary.
> >
> 
> One downside I can think of will be if we add some worker threads
> which are not vcpus then all of those threads will end up running on a
> single cpu unless we edit this parsing logic again.

But adding worker threads also requires a code change, i.e. it won't require a
separate commit/churn.  And if we get to the point where we want multiple workers,
it should be relatively straightforward to support pinning an arbitrary number of
workers, e.g.

	enum memtest_worker_type {
		MAIN_WORKER,
		MINION_1,
		MINION_2,
		NR_MEMTEST_WORKERS,
	}

	TEST_ASSERT(nr_lcpus == nr_vcpus + NR_MEMTEST_WORKERS);

void spawn_worker(enum memtest_worker_type type, <function pointer>)
{
	cpu_set_t cpuset;

	CPU_ZERO(&cpuset);
	CPU_SET(task_map[nr_vcpus + type], &cpuset);

	<set affinity and spawn>
}

> Current implementation gives vcpus special treatment via -c and for
> the whole application via -d. This gives good separation of concerns
> via flags.

But they aren't separated, e.g. using -d without -c means vCPUs are thrown into
the same pool as worker threads.  And if we ever do add more workers, -d doesn't
allow the user to pin workers 1:1 with logical CPUs.

Actually, if -c is extended to pin workers, then adding -d is unnecessary.  If the
user wants to affine tasks to CPUs but not pin 1:1, it can do that with e.g. taskset.
What the user can't do is pin 1:1.

If we don't want to _require_ the caller to pin the main worker, then we could do

	TEST_ASSERT(nr_lcpus >= nr_vcpus &&
		    nr_lcpus <= nr_vcpus + NR_MEMTEST_WORKERS);

to _require_ pinning all vCPUs, and allow but not require pinning non-vCPU tasks.