Quoting Tvrtko Ursulin (2018-06-05 09:51:01)
>
> On 04/06/2018 16:11, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2018-06-04 16:01:26)
> >>
> >> On 04/06/2018 14:03, Chris Wilson wrote:
> >>> Quoting Chris Wilson (2018-06-04 13:59:12)
> >>>> Quoting Tvrtko Ursulin (2018-06-04 13:52:39)
> >>>>> From: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
> >>>>>
> >>>>> As Chris has discovered on his Ivybridge, and later automated test runs
> >>>>> have confirmed, on most of our platforms hrtimer faced with heavy GPU load
> >>>>> ca occasionally become sufficiently imprecise to affect PMU sampling
> >>>>
> >>>> s/ca/can/
> >>>>
> >>>>> calculations.
> >>>>>
> >>>>> This means we cannot assume sampling frequency is what we asked for, but
> >>>>> we need to measure the interval ourselves.
> >>>>>
> >>>>> This patch is similar to Chris' original proposal for per-engine counters,
> >>>>> but instead of introducing a new set to work around the problem with
> >>>>> frequency sampling, it swaps around the way internal frequency accounting
> >>>>> is done. Instead of accumulating current frequency and dividing by
> >>>>> sampling frequency on readout, it accumulates frequency scaled by each
> >>>>> period.
> >>>>
> >>>> My ulterior motive failed to win favour ;)
> >>>>
> >>>>> Testcase: igt/perf_pmu/*busy* # snb, ivb, hsw
> >>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
> >>>>> Suggested-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> >>>>> Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> >>>> Reviewed-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> >>>
> >>> I should point out that even with this fix (or rather my patch), we can
> >>
> >> I did not mean to steal it,
> >
> > Don't mind, I view such patches as "this here is a problem, and what I
> > think can be done". I'm quite happy to be told "nope, the problem
> > is..."; the end result is we fix and move on.
> >
> >> just that you seemed very uncompromising on
> >> the new counters approach. If you prefer you can respin your patch with
> >> this approach for frequency counters, it is fine by me.
> >
> > I'm just not convinced yet by the frequency sampler, and I still feel
> > that we would be better off with cycles. I just haven't found a
> > convincing argument ;)
>
> Just imagine the .unit file has Mcycles instead of MHz in it? :)
>
> Since we went with this when adding it initially I think we should
> exercise restraint with deprecating and adding so quickly.
>
> >>> still see errors of 25-30us, enough to fail the test. However, without
> >>> the fix our errors can be orders of magnitude worse (e.g. reporting 80us
> >>> busy instead of 500us).
> >>
> >> Yeah I guess if the timer is delayed it is delayed and all samples just
> >> won't be there. I don't see a way to fix that elegantly. Even practically.
> >
> > Right, but it's not the case that we aren't sampling. If the timer was
> > delayed entirely, then we would see 0 (start_val == end_val). I'm not
> > sure exactly what goes wrong, but it probably is related to the timer
> > being unreliable. (It's just that in this case we are sampling a square
> > wave, so really hard to mess up!)
>
> Hm, maybe I am not completely understanding what you are seeing. I
> thought that multiple timer overruns (aka delay by multiple periods)
> explain everything.
>
> Then even with this patch, if they happen to happen (!) at the end of a
> sampling period (as observed from userspace), a large-ish chunk of
> samples (depending on the total sampling duration) may be missing
> (pending), and so still be observed as a fail.

Ah, but with the variable period timer, we can force the timer to run
before we report the sample (perf_event_read). That should still be
inside the busy batch, and it still has the same inaccuracy (~28us, for
a small sample population; measuring the tail is on my wishlist):

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index 34cc6f6..8578a43 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -534,6 +534,12 @@ static u64 __i915_pmu_event_read(struct perf_event *event)
 			val = engine->pmu.sample[sample].cur;
 		}
 	} else {
+		if (hrtimer_cancel(&i915->pmu.timer)) {
+			i915_sample(&i915->pmu.timer);
+			hrtimer_start_expires(&i915->pmu.timer,
+					      HRTIMER_MODE_ABS);
+		}
+
 		switch (event->attr.config) {
 		case I915_PMU_ACTUAL_FREQUENCY:
 			val =

I haven't thought enough about why it still has the tail, nor measured
the distribution of errors. I just wanted to warn that we aren't quite
out of the mess yet, but with the variable period we do at least stop
drowning!

> Difference after this patch is that when the timer eventually fires the
> counter will account for the delay, so subsequent queries will be fine.
>
> If readout waited for at least one real sampling tick, that would be
> solved, but with several other undesirable consequences. At least it
> would slow down readout to 200Hz, but I don't know off the top of my
> head whether it would cause unresolvable deadlock scenarios, or if it
> is even technically possible within the perf framework. (Since I don't
> think we should do it, I don't even want to start thinking about it.)

The above should do what you have in mind, I think.
-Chris
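
For readers following the thread, the accounting change under discussion
can be sketched in a few lines of C. This is a minimal illustration only,
with made-up names (freq_sample, sample_fixed and sample_scaled are
hypothetical, not the real i915_pmu.c symbols), assuming the nominal
200Hz tick mentioned above: instead of summing the instantaneous
frequency on each tick and dividing by the expected tick count on
readout, each sample is weighted by the period that actually elapsed,
so a late timer callback no longer distorts the average.

#include <stdint.h>

/* Hypothetical sampler state; names do not match the real i915_pmu.c. */
struct freq_sample {
	uint64_t acc;     /* accumulated frequency-weighted time, MHz*ns */
	uint64_t last_ns; /* timestamp of the previous timer tick */
};

/* Old scheme: every tick is assumed to be exactly 1/200s apart. */
static void sample_fixed(uint64_t *acc, uint32_t cur_mhz)
{
	*acc += cur_mhz; /* readout divides by the expected tick count */
}

/* New scheme: weight each sample by the period that really elapsed. */
static void sample_scaled(struct freq_sample *s, uint32_t cur_mhz,
			  uint64_t now_ns)
{
	s->acc += (uint64_t)cur_mhz * (now_ns - s->last_ns);
	s->last_ns = now_ns;
	/* readout divides s->acc by total elapsed ns for the average */
}

In the fixed scheme a tick that arrives late still contributes the same
weight as one that arrived on time, which is exactly the distortion the
patch removes; the scaled accumulator only needs dividing by the total
elapsed time on readout.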