Re: [PATCH 5/9] drm/i915: Enable i915 perf stream for Haswell OA unit

Martin Peres <martin.peres@xxxxxxxxxxxxxxx> · Wed, 4 May 2016 12:09:50 +0300

On 03/05/16 23:03, Robert Bragg wrote:

On Tue, May 3, 2016 at 8:34 PM, Robert Bragg <robert@xxxxxxxxxxxxx
<mailto:robert@xxxxxxxxxxxxx>> wrote:

    Sorry for the delay replying to this, I missed it.

    On Sat, Apr 23, 2016 at 11:34 AM, Martin Peres <martin.peres@xxxxxxx
    <mailto:martin.peres@xxxxxxx>> wrote:

        On 20/04/16 17:23, Robert Bragg wrote:

            Gen graphics hardware can be set up to periodically write
            snapshots of
            performance counters into a circular buffer via its Observation
            Architecture and this patch exposes that capability to
            userspace via the
            i915 perf interface.

            Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx
            <mailto:chris@xxxxxxxxxxxxxxxxxx>>
            Signed-off-by: Robert Bragg <robert@xxxxxxxxxxxxx
            <mailto:robert@xxxxxxxxxxxxx>>
            Signed-off-by: Zhenyu Wang <zhenyuw@xxxxxxxxxxxxxxx
            <mailto:zhenyuw@xxxxxxxxxxxxxxx>>
            ---
              drivers/gpu/drm/i915/i915_drv.h         |  56 +-
              drivers/gpu/drm/i915/i915_gem_context.c |  24 +-
              drivers/gpu/drm/i915/i915_perf.c        | 940
            +++++++++++++++++++++++++++++++-
              drivers/gpu/drm/i915/i915_reg.h         | 338 ++++++++++++
              include/uapi/drm/i915_drm.h             |  70 ++-
              5 files changed, 1408 insertions(+), 20 deletions(-)

            +
            +
            +       /* It takes a fairly long time for a new MUX
            configuration to
            +        * be be applied after these register writes. This delay
            +        * duration was derived empirically based on the
            render_basic
            +        * config but hopefully it covers the maximum
            configuration
            +        * latency...
            +        */
            +       mdelay(100);

        With such a HW and SW design, how can we ever expose hope to get any
        kind of performance when we are trying to monitor different
        metrics on each
        draw call? This may be acceptable for system monitoring, but it
        is problematic
        for the GL extensions :s

        Since it seems like we are going for a perf API, it means that
        for every change
        of metrics, we need to flush the commands, wait for the GPU to
        be done, then
        program the new set of metrics via an IOCTL, wait 100 ms, and
        then we may
        resume rendering ... until the next change. We are talking about
        a latency of
        6-7 frames at 60 Hz here... this is non-negligeable...

        I understand that we have a ton of counters and we may hide
        latency by not
        allowing using more than half of the counters for every draw
        call or frame, but
        even then, this 100ms delay is killing this approach altogether.

So revisiting this to double check how things fail with my latest
driver/tests without the delay, I apparently can't reproduce test
failures without the delay any more...

I think the explanation is that since first adding the delay to the
driver I also made the the driver a bit more careful to not forward
spurious reports that look invalid due to a zeroed report id field, and
that mechanism keeps the unit tests happy, even though there are still
some number of invalid reports generated if we don't wait.

One problem with simply having no delay is that the driver prints an
error if it sees an invalid reports so I get a lot of 'Skipping
spurious, invalid OA report' dmesg spam. Also this was intended more as
a last resort mechanism, and I wouldn't feel too happy about squashing
the error message and potentially sweeping other error cases under the
carpet.

Experimenting to see if the delay can at least be reduced, I brought the
delay up in millisecond increments and found that although I still see a
lot of spurious reports only waiting 1 or 5 milliseconds, at 10
milliseconds its reduced quite a bit and at 15 milliseconds I don't seem
to have any errors.

15 milliseconds is still a long time, but at least not as long as 100.

OK, so the issue does not come from the HW after all, great!

Now, my main question is, why are spurious events generated when 
changing the MUX's value? I can understand that we would need to ignore 
the reading that came right after the change, but other than this,  I am 
a bit at a loss.

I am a bit swamped with other tasks right now, but I would love to spend 
more time reviewing your code as I really want to see this upstream!

Martin
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel