Re: [PATCH v3 2/2] arm64: add micro test

Andrew Jones <drjones@xxxxxxxxxx> · Wed, 24 Jan 2018 09:30:28 +0100

On Wed, Jan 24, 2018 at 09:17:55AM +0100, Christoffer Dall wrote:
> Hi Shih-Wei,
> 
> On Fri, Jan 19, 2018 at 04:57:55PM -0500, Shih-Wei Li wrote:
> > Thanks for the feedback about the mistakes in math and some issues in
> > naming, print msg, and coding style. I'll be careful and try to avoid
> > the same problems in the next patch set. Sorry for all of the confusion.
> > 
> > So we now skip the test when "sample == 0" happens over 1000 times.
> > This is only due to the case that "cost is < 1/cntfrq" since it's not
> > possible for the tick to overflow for that many times. Did I miss
> > something here? I do agree that we should output better msgs to tell
> > users that the cost of a certain test is constantly smaller than a
> > tick.
> > 
> 
> I think for things like vmexit counts, it's very likely that all the
> samples will result in 0 ticks on many systems (fast CPU and slow arch
> counter; the architecture doesn't give us any guarantees here).  For
> example, a system with a 2 GHz CPU and a 1 MHz counter will give you a
> granularity of 2000 cycles for each counter tick, which is not that
> useful for low-level tuning of KVM.
> 
> So what I thought we were going to do was:
> 
> main_test_function()
> {
> 	long ntimes = NTIMES;
> 	long cost, total_cost;
> 	unsigned long cnt1, cnt2;
> 
> 	cnt1 = read_cnt();
> 	do {
> 		run_test();
> 	} while (--ntimes);	/* runs exactly NTIMES times */
> 	cnt2 = read_cnt();
> 
> 	if (verify_sane_counter(cnt1, cnt2))
> 		return;
> 
> 	total_cost = to_nanoseconds(cnt2 - cnt1);
> 	cost = total_cost / NTIMES;
> 	printf("testname: %ld (%ld)\n", cost, total_cost);
> }
> 
> And in that way amortize the potential lack of precision over all the
> iterations.  Did I miss some prior discussion about why that was a bad
> idea?

Not that I know of, but I missed the above proposal, which I completely
agree with.
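
To put rough numbers on the amortization, here is a minimal sketch of the
arithmetic, using the 2 GHz CPU / 1 MHz counter example from above. The
helper names are made up for illustration and are not part of the patch.
It assumes the two counter reads together are off by about one tick at
most, so that error gets spread over all iterations.

```c
#include <stdint.h>

/* CPU cycles that elapse during one counter tick. */
uint64_t cycles_per_tick(uint64_t cpu_hz, uint64_t cnt_hz)
{
	return cpu_hz / cnt_hz;
}

/*
 * Approximate per-iteration measurement error, in CPU cycles, when one
 * tick of uncertainty is divided across ntimes iterations.
 */
uint64_t per_iter_error_cycles(uint64_t cpu_hz, uint64_t cnt_hz,
			       uint64_t ntimes)
{
	return cycles_per_tick(cpu_hz, cnt_hz) / ntimes;
}
```

With those inputs a single sample has a granularity of 2000 cycles, while
averaging over 1000 iterations brings the uncertainty down to about 2
cycles per iteration.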

> 
> It would also be possible to have two functions, one that does the above
> and one that does a per-run measurement, in case the user wants to know
> min/max/stddev and is running on a system with sufficient precision.
> The method could be chosen via an argument.

We should be able to just make NTIMES variable, setting it to one when a
per-run measurement is desired, right? Anyway, I'm fine with having it set
to a reasonable value, not variable / no argument, for the initial merge.
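
Something like the following is what I have in mind, as a rough sketch
only. read_cnt() and run_test() here are fake stand-ins (a simulated
counter that advances one tick per four test runs) so the snippet can be
compiled anywhere; the real test would read the arch counter instead, and
measure_ticks() is a hypothetical name.

```c
#include <stdint.h>

/* Fake stand-ins, for illustration only: one counter tick per 4 runs. */
static uint64_t fake_work;

uint64_t read_cnt(void)
{
	return fake_work / 4;
}

void run_test(void)
{
	fake_work++;
}

/*
 * Return the tick delta across ntimes runs of the test.
 * ntimes == 1 gives a per-run sample; a larger ntimes amortizes the
 * counter granularity over all the runs.
 */
uint64_t measure_ticks(long ntimes)
{
	uint64_t cnt1, cnt2;
	long i;

	cnt1 = read_cnt();
	for (i = 0; i < ntimes; i++)
		run_test();
	cnt2 = read_cnt();

	return cnt2 - cnt1;
}
```

Starting from a fresh fake counter, measure_ticks(1) reads as 0 ticks
(the "sample == 0" case), while a following measure_ticks(1000) reads as
250 ticks, which can then be divided by the iteration count.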

Thanks,
drew