On Wed, May 04, 2016 at 10:49:53AM +0100, Robert Bragg wrote:
> On Wed, May 4, 2016 at 10:09 AM, Martin Peres <martin.peres@xxxxxxxxxxxxxxx> wrote:
>
> > On 03/05/16 23:03, Robert Bragg wrote:
> >
> >> On Tue, May 3, 2016 at 8:34 PM, Robert Bragg <robert@xxxxxxxxxxxxx> wrote:
> >>
> >>     Sorry for the delay replying to this, I missed it.
> >>
> >>     On Sat, Apr 23, 2016 at 11:34 AM, Martin Peres <martin.peres@xxxxxxx> wrote:
> >>
> >>         On 20/04/16 17:23, Robert Bragg wrote:
> >>
> >>             Gen graphics hardware can be set up to periodically write
> >>             snapshots of performance counters into a circular buffer via
> >>             its Observation Architecture and this patch exposes that
> >>             capability to userspace via the i915 perf interface.
> >>
> >>             Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> >>             Signed-off-by: Robert Bragg <robert@xxxxxxxxxxxxx>
> >>             Signed-off-by: Zhenyu Wang <zhenyuw@xxxxxxxxxxxxxxx>
> >>
> >>             ---
> >>              drivers/gpu/drm/i915/i915_drv.h         |  56 +-
> >>              drivers/gpu/drm/i915/i915_gem_context.c |  24 +-
> >>              drivers/gpu/drm/i915/i915_perf.c        | 940 +++++++++++++++++++++++++++++++-
> >>              drivers/gpu/drm/i915/i915_reg.h         | 338 ++++++++++++
> >>              include/uapi/drm/i915_drm.h             |  70 ++-
> >>              5 files changed, 1408 insertions(+), 20 deletions(-)
> >>
> >>             +
> >>             +
> >>             +       /* It takes a fairly long time for a new MUX
> >>             +        * configuration to be applied after these register
> >>             +        * writes. This delay duration was derived empirically
> >>             +        * based on the render_basic config but hopefully it
> >>             +        * covers the maximum configuration latency...
> >>             +        */
> >>             +       mdelay(100);
> >>
> >>         With such a HW and SW design, how can we ever hope to get any kind
> >>         of performance when we are trying to monitor different metrics on
> >>         each draw call? This may be acceptable for system monitoring, but
> >>         it is problematic for the GL extensions :s
> >>
> >>         Since it seems like we are going for a perf API, it means that for
> >>         every change of metrics, we need to flush the commands, wait for
> >>         the GPU to be done, then program the new set of metrics via an
> >>         IOCTL, wait 100 ms, and then we may resume rendering ... until the
> >>         next change. We are talking about a latency of 6-7 frames at 60 Hz
> >>         here... this is non-negligible...
> >>
> >>         I understand that we have a ton of counters and we may hide the
> >>         latency by not allowing more than half of the counters to be used
> >>         for every draw call or frame, but even then, this 100ms delay is
> >>         killing this approach altogether.
> >>
> >> So revisiting this to double check how things fail with my latest
> >> driver/tests without the delay, I apparently can't reproduce test
> >> failures without the delay any more...
> >>
> >> I think the explanation is that since first adding the delay to the
> >> driver I also made the driver a bit more careful to not forward
> >> spurious reports that look invalid due to a zeroed report id field, and
> >> that mechanism keeps the unit tests happy, even though there are still
> >> some number of invalid reports generated if we don't wait.
> >>
> >> One problem with simply having no delay is that the driver prints an
> >> error if it sees an invalid report, so I get a lot of 'Skipping
> >> spurious, invalid OA report' dmesg spam.
> >> Also this was intended more as a last resort mechanism, and I wouldn't
> >> feel too happy about squashing the error message and potentially
> >> sweeping other error cases under the carpet.
> >>
> >> Experimenting to see if the delay can at least be reduced, I brought the
> >> delay up in millisecond increments and found that although I still see a
> >> lot of spurious reports when only waiting 1 or 5 milliseconds, at 10
> >> milliseconds it's reduced quite a bit and at 15 milliseconds I don't
> >> seem to have any errors.
> >>
> >> 15 milliseconds is still a long time, but at least not as long as 100.
> >>
> > OK, so the issue does not come from the HW after all, great!
>
> Erm, I'm not sure that's a conclusion we can make here...
>
> The upshot here was really just reducing the delay from 100ms to 15ms.
> Previously I arrived at a workable delay by jumping the delay in orders of
> magnitude, with 10ms failing and 100ms passing, and I didn't try to refine
> it further. Here I've looked at delays between 10 and 100ms.
>
> The other thing is observing that, because the kernel is checking for
> invalid reports (generated by the hardware) before forwarding to userspace,
> the lack of a delay no longer triggers i-g-t failures because the invalid
> data won't reach i-g-t any more - though the invalid reports are still a
> thing to avoid.
>
> > Now, my main question is, why are spurious events generated when changing
> > the MUX's value? I can understand that we would need to ignore the reading
> > that came right after the change, but other than this, I am a bit at a
> > loss.
>
> The MUX selects 16 signals that the OA unit can turn into 16 separate
> counters by basically counting the signal changes. (There's some fancy
> fixed-function logic that can affect this, but that's the general idea.)
>
> If the MUX is in the middle of being re-programmed then some subset of
> those 16 signals are for who knows what.
>
> After programming the MUX we will go on to configure the OA unit, and the
> tests will enable periodic sampling which (if we have no delay) will sample
> the OA counters that are currently being fed by undefined signals.
>
> So as far as that goes it makes sense to me to expect bad data if we don't
> wait for the MUX config to land properly. Something I don't really know is
> how come we're seemingly lucky to have the reports be cleanly invalid with
> a zero report-id, instead of just having junk data that would be harder to
> recognise.

Yeah, this mdelay story sounds really scary. A few random comments:

- msleep instead of mdelay please
- no dmesg noise above debug level for stuff that we know can happen
- dmesg noise counts as igt failures
- reading 0 sounds more like bad synchronization. Have you tried quiescing
  the entire gpu (to make sure nothing is happening) and disabling OA, then
  updating, and then restarting? At least on a very quick look I didn't
  spot that. Random delays freak me out a bit, but wouldn't be surprised
  if really needed.

Cheers, Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
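[For readers following along, here is a minimal sketch of the two mechanisms
discussed in this thread: sleeping (rather than busy-waiting) after writing
the MUX configuration, and dropping reports whose report-id dword reads back
as zero instead of forwarding them to userspace. All names here
(config_oa_mux, oa_report_is_valid, struct oa_reg) are hypothetical
placeholders, not the real i915 symbols, and the 15ms value is only the
empirically derived lower bound mentioned above, not a documented hardware
latency.]

    /*
     * Illustrative sketch only (hypothetical names, not the real i915
     * symbols): write the MUX/boolean config, sleep while it settles,
     * and treat a zero report-id as an invalid report that should not
     * be forwarded to userspace.
     */
    #include <linux/delay.h>
    #include <linux/io.h>
    #include <linux/types.h>

    struct oa_reg {
            u32 addr;
            u32 value;
    };

    static void config_oa_mux(void __iomem *mmio,
                              const struct oa_reg *regs, int n_regs)
    {
            int i;

            for (i = 0; i < n_regs; i++)
                    writel(regs[i].value, mmio + regs[i].addr);

            /* Let the MUX configuration land before periodic sampling
             * is enabled; msleep() rather than mdelay() so we don't
             * busy-wait. 15ms is the empirically derived lower bound
             * from this thread, not a documented latency.
             */
            msleep(15);
    }

    /* Last-resort check: a report whose first dword (the report id)
     * reads back as zero is dropped instead of being copied to
     * userspace.
     */
    static bool oa_report_is_valid(const u32 *report)
    {
            return report[0] != 0;
    }

[Whether a fixed sleep is needed at all, or whether quiescing the GPU and
disabling OA around the reconfiguration, as suggested above, removes the
spurious reports entirely, is left open by the thread.]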