Re: [PATCH i-g-t 5/7] tests/perf_pmu: Tests for i915 PMU API

Quoting Tvrtko Ursulin (2017-09-26 12:19:35)
> 
> On 25/09/2017 17:21, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2017-09-25 16:15:00)
> >> +#define assert_within_epsilon(x, ref, tolerance) \
> >> +       igt_assert_f((double)(x) <= (1.0 + tolerance) * (double)ref && \
> >> +                    (double)(x) >= (1.0 - tolerance) * (double)ref, \
> >> +                    "'%s' != '%s' (%f not within %f%% tolerance of %f)\n",\
> >> +                    #x, #ref, (double)x, tolerance * 100.0, (double)ref)
> >> +
> >> +/*
> >> + * Helper for cases where we assert on time spent sleeping (directly or
> >> + * indirectly), so make it more robust by ensuring the system sleep time
> >> + * is within test tolerance to start with.
> >> + */
> > 
> > Looking at the callers, I don't see where we need a guaranteed sleep vs
> > a known sleep. We are comparing elapsed/wall time with a perf counter, so
> > we only care that they match. By asserting, we just cause random fails if
> > the system decides not to wake up for 1s (outside the purview of the
> > test).
> 
> Okay, so you suggest measured_usleep instead of guaranteed_usleep, and 
> then the measured time is used in the asserts. Makes sense; just please 
> confirm that there hasn't been a misunderstanding.

Yes, measure the sleep; then we don't have any ambiguity over the length.
If it ends up being too long for the test to behave, we will know.
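
Something along these lines should do (a minimal sketch; measured_usleep
is just the name from the discussion above, and igt_nsec_elapsed() is the
existing IGT timer helper):

        static uint64_t measured_usleep(unsigned int usec)
        {
                struct timespec ts = { };
                uint64_t slept;

                igt_nsec_elapsed(&ts);          /* record the start time */
                usleep(usec);
                slept = igt_nsec_elapsed(&ts);  /* ns actually elapsed */

                return slept;
        }

The asserts would then compare against the returned value rather than the
requested duration.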

> >> +       assert_within_epsilon(val[busy_idx], batch_duration_ns, tolerance);
> >> +       for (i = 0; i < num_engines; i++) {
> > 
> > double expected;
> > 
> > if (i == busy_idx)
> >       expected = batch_duration_ns;
> > else
> >       expected = 0;
> > assert_within_epsilon(val[i], expected, tolerance);
> 
> I don't see any advantage, but sure, I can change it.

I just found it a little easier to parse.

> > How about batches executing from a different context, or submitted from
> > a different fd?
> 
> I can't see that adding much value. Am I thinking too much white box? 
> The perf API has nothing to do with the drm fd, so I just don't see it 
> as relevant.

Exactly; that is why I think we should establish it in the test. In part,
this is because we have the other i915 perf API that is context/fd/etc
bound.

The challenge is not to go overboard, but atm we always use the gem_fd
that was opened before the perf fd, so I worry that we may be allowing
some inadvertent coupling there. Whereas we want to assert that this is a
system profiler, capable of capturing everything (and unfortunately not
yet capable of much filtering :().
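
To sketch what I mean (an illustration rather than a patch; names like
I915_PMU_ENGINE_BUSY() and e2ring() are taken from the rest of the series
and may differ, the other helpers are the test's own):

        static void
        busy_other_client(const struct intel_execution_engine2 *e)
        {
                igt_spin_t *spin;
                uint32_t ctx;
                int other_fd, fd;

                /* Open the counter before the second client even exists... */
                fd = open_pmu(I915_PMU_ENGINE_BUSY(e->class, e->instance));

                /* ...then keep the engine busy from an independent fd/context. */
                other_fd = drm_open_driver(DRIVER_INTEL);
                ctx = gem_context_create(other_fd);
                spin = igt_spin_batch_new(other_fd, ctx, e2ring(other_fd, e), 0);
                usleep(batch_duration_ns / 1000);
                igt_spin_batch_end(spin);

                /* A system-wide profiler should account for that work too. */
                assert_within_epsilon(pmu_read_single(fd),
                                      batch_duration_ns, tolerance);

                igt_spin_batch_free(other_fd, spin);
                gem_context_destroy(other_fd, ctx);
                close(other_fd);
                close(fd);
        }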

> >> +       } else {
> >> +               usleep(batch_duration_ns / 1000);
> >> +       }
> >> +
> >> +       pmu_read_multi(fd, 2, val);
> >> +
> >> +       assert_within_epsilon(val[0], 0.0f, tolerance);
> >> +       assert_within_epsilon(val[1], 0.0f, tolerance);
> >> +
> >> +       if (busy)
> >> +               igt_spin_batch_free(gem_fd, spin);
> >> +       close(fd);
> > 
> > I'm still at a loss as to why this test is interesting. You want to say
> > that, given an idle engine with a single batch, there is no time spent
> > waiting for an MI event, nor waiting on a semaphore. Where is that
> > guaranteed? What happens if we do need to do such a pre-batch + sync in
> > the future? If you wanted to say that, whilst the batch is executing,
> > no time is spent processing requests the batch isn't making, then maybe.
> > But without a test to demonstrate the opposite, we still have no idea if
> > the counters work.
> 
> Yeah, I am still at the stage of punting the semaphore work until after 
> the rest is polished. I need to nose around some tests and see if I can 
> lift some code involving semaphores.

Bug 54226, it's a classic :) Not a test, but the overlay showed the
breakage quite clearly.

Hmm, we should be able to do WAIT on any platform, since it's just MI
instructions - but the setup may differ between platforms. I would use
wait-for-event, rather than scanline.

Hmm, I wonder if MI_SEMAPHORE_WAIT | POLL asserts the RING_CTL
semaphore bit. I hope it does; then not only do we have a simple test,
but the counter is also useful for VkEvent.

> >> +static void
> >> +test_interrupts(int gem_fd)
> >> +{
> >> +       uint64_t idle, busy, prev;
> >> +       int fd, fence = -1;
> >> +       const unsigned int count = 1000;
> >> +       unsigned int i;
> >> +
> >> +       fd = open_pmu(I915_PMU_INTERRUPTS);
> >> +
> >> +       gem_quiescent_gpu(gem_fd);
> >> +
> >> +       /* Wait for idle state. */
> >> +       prev = pmu_read_single(fd);
> >> +       idle = prev + 1;
> >> +       while (idle != prev) {
> >> +               usleep(batch_duration_ns / 1000);
> >> +               prev = idle;
> >> +               idle = pmu_read_single(fd);
> >> +       }
> >> +
> >> +       igt_assert_eq(idle - prev, 0);
> >> +
> >> +       /* Send some no-op batches with chained fences to ensure interrupts. */
> > 
> > Not all kernels would emit interrupts for this. Remember we used to spin
> > on the in-fence to avoid the interrupt setup overhead. And there's no
> > reason to assume we wouldn't again if there was a demonstrable
> > advantage.
> > 
> > You've assumed execlists elsewhere, so why not generate context-switch
> > interrupts instead? I presume you did try MI_USER_INTERRUPT from a user
> > batch? iirc it should still work.
> 
> Even if I emit it myself, who guarantees i915 is listening to them? So I 
> thought it easier to have i915 emit them for me, and by using fences to 
> ensure it is listening.

Hmm. True.
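
For reference, such a user batch is only a couple of dwords (a minimal
sketch using the standard MI encodings; whether the PMU interrupts
counter actually advances depends on i915 having the user-interrupt
source unmasked at the time, which is exactly your point):

        #define LOCAL_MI_USER_INTERRUPT (0x02 << 23) /* MI_INSTR(0x02, 0) */

        static uint32_t user_interrupt_batch(int gem_fd)
        {
                const uint32_t batch[] = {
                        LOCAL_MI_USER_INTERRUPT,
                        MI_BATCH_BUFFER_END,
                        MI_NOOP, /* pad to a qword boundary */
                        MI_NOOP,
                };
                uint32_t handle = gem_create(gem_fd, 4096);

                gem_write(gem_fd, handle, 0, batch, sizeof(batch));

                return handle; /* caller submits it with gem_execbuf() */
        }
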
> 
> Could we move towards longer batches to defeat any future busy-spinning 
> we might add?

They wanted 500us! Next, they'll probably ask for 500ms!

I wonder if we could inject an interrupt from debugfs. Ok, it doesn't
have the nicety of proving that we are reporting hw interrupts, but it
would at least confirm that we are reporting interrupts.

~o~

(Injecting interrupts is something I want to do for selftests, and I
think I read that it is possible...)

> >> +igt_main
> >> +{
> >> +       const unsigned int num_other_metrics =
> >> +                               I915_PMU_LAST - __I915_PMU_OTHER(0) + 1;
> >> +       unsigned int num_engines = 0;
> >> +       int fd = -1;
> >> +       const struct intel_execution_engine2 *e;
> >> +       unsigned int i;
> >> +
> >> +       igt_fixture {
> >> +               fd = drm_open_driver_master(DRIVER_INTEL);
> > 
> > master? For what? I didn't see any display interactions, or secure
> > dispatch. Does this mean that we only have perf counters for the master?
> 
> To ensure no one else is fiddling with the GPU while the test is running, 
> since we want complete control here.

But that's not master, that's exclusive. Huge difference in semantics
:-p

> > Shouldn't we also be checking that we can capture rendernodes as well as
> > authed fd?
> 
> Again, due to the drm fd and perf having no direct connection, I don't 
> see this as interesting.

Until we establish that there is no connection, I feel it is a vital
property of the profiler that it is able to work with any client.
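
Mirroring the earlier sketch, the render-node variant would only differ
in how the client fd is opened (again, helper names beyond the standard
IGT calls are assumptions from the rest of the series):

        static void
        render_node_busy(const struct intel_execution_engine2 *e)
        {
                int render_fd = drm_open_driver_render(DRIVER_INTEL);
                int fd = open_pmu(I915_PMU_ENGINE_BUSY(e->class, e->instance));
                igt_spin_t *spin;

                /* Unauthenticated, non-master client keeping the engine busy. */
                spin = igt_spin_batch_new(render_fd, 0, e2ring(render_fd, e), 0);
                usleep(batch_duration_ns / 1000);
                igt_spin_batch_end(spin);

                assert_within_epsilon(pmu_read_single(fd),
                                      batch_duration_ns, tolerance);

                igt_spin_batch_free(render_fd, spin);
                close(render_fd);
                close(fd);
        }
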
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx



