On Wed, Jan 08, 2025 at 08:50:43PM -0800, Matthew Brost wrote:
On Sat, Jan 04, 2025 at 01:19:59AM -0600, Lucas De Marchi wrote:
On Fri, Dec 20, 2024 at 06:42:09PM -0600, Lucas De Marchi wrote:
> On Fri, Dec 20, 2024 at 04:32:09PM -0800, Umesh Nerlige Ramappa wrote:
> > On Fri, Dec 20, 2024 at 12:32:16PM -0600, Lucas De Marchi wrote:
> > > On Thu, Dec 19, 2024 at 12:49:16PM -0800, Umesh Nerlige Ramappa wrote:
> > > > On Thu, Dec 12, 2024 at 09:34:32AM -0800, Lucas De Marchi wrote:
> > > > > This partially reverts commit fe4f5d4b6616 ("drm/xe: Clean up VM / exec
> > > > > queue file lock usage."). While it's desired to have the mutex to
> > > > > protect only the reference to the exec queue, getting and dropping each
> > > > > mutex and then later getting the GPU timestamp, doesn't produce a
> > > > > correct result: it introduces multiple opportunities for the task to be
> > > > > scheduled out and thus wrecking havoc the deltas reported to userspace.
> > > > >
> > > > > Also, to better correlate the timestamp from the exec queues with the
> > > > > GPU, disable preemption so they can be updated without allowing the task
> > > > > to be scheduled out. We leave interrupts enabled as that shouldn't be
> > > > > enough disturbance for the deltas to matter to userspace.
> > > >
> > > > Like I said in the past, this is not trivial to solve and I
> > > > would hate to add anything in the KMD to do so.
> > >
> > > I think the best we can do in the kernel side is to try to guarantee the
> > > correlated counters are sampled together... And that is already very
> > > good per my tests. Also, it'd not only be good from a testing
> > > perspective, but for any userspace trying to make sense of the 2
> > > counters.
> > >
> > > Note that this is not much different from how e.g. perf samples group
> > > events:
> > >
> > > The unit of scheduling in perf is not an individual event, but rather an
> > > event group, which may contain one or more events (potentially on
> > > different PMUs). The notion of an event group is useful for ensuring
> > > that a set of mathematically related events are all simultaneously
> > > measured for the same period of time. For example, the number of L1
> > > cache misses should not be larger than the number of L2 cache accesses.
> > > Otherwise, it may happen that the events get multiplexed and their
> > > measurements would no longer be comparable, making the analysis more
> > > difficult.
> > >
> > > See __perf_event_read() that will call pmu->read() on all sibling events
> > > while disabling preemption:
> > >
> > > perf_event_read()
> > > {
> > > ...
> > > preempt_disable();
> > > event_cpu = __perf_event_read_cpu(event, event_cpu);
> > > ...
> > > (void)smp_call_function_single(event_cpu, __perf_event_read, &data, 1);
> > > preempt_enable();
> > > ...
> > > }
> > >
> > > so... at least there's prior art for that... for the same reason that
> > > userspace should see the values sampled together.
> >
> > Well, I have used the preempt_disable/enable when fixing some
> > selftest (i915), but was not happy that there were still some rare
> > failures. If reducing error rates is the intention, then it's fine.
> > In my mind, the issue still exists and once in a while we would end
> > up assessing such a failure. Maybe, in addition, fixing up the IGTs
> > like you suggest below is a worthwhile option.
IMO, we should strive to avoid using low-level calls like
preempt_disable and preempt_enable, as they lead to unmaintainable
nonsense, as seen in the i915.
Sure, in Umesh's example, this is pretty clear and not an unreasonable
usage. However, I’m more concerned that this sets a precedent in Xe that
doing this is acceptable.
each such usage needs to be carefully reviewed, but there are cases
in which it's useful and we shouldn't ban it. In my early reply on
wdcrw3du2ssykmsrda3mvwjhreengeovwasikmixdiowqsfnwj@lsputsgtmha4 I even
showed how the same construct is used by perf when reading counters
that should be sampled together.
I will post a new version, but I will delay in a week or so merging it.
That's because I want to check the result of the other patch series that
changes the test in igt and afaics should give all green results all the
time: https://patchwork.freedesktop.org/series/143204/
Lucas De Marchi
Not a blocker, just expressing concerns.
Matt
>
> for me this fix is not targeted at tests, even if it improves them a
> lot. It's more for consistent userspace behavior.
>
> >
> > >
> > > >
> > > > For IGT, why not just take 4 samples for the measurement
> > > > (separate out the 2 counters)
> > > >
> > > > 1. get gt timestamp in the first sample
> > > > 2. get run ticks in the second sample
> > > > 3. get run ticks in the third sample
> > > > 4. get gt timestamp in the fourth sample
> > > >
> > > > Rely on 1 and 4 for gt timestamp delta and on 2 and 3 for
> > > > run ticks delta.
> > >
> > > this won't fix it for the general case: you get rid of the > 100% case,
> > > you make the < 100% much worse.
> >
> > yeah, that's quite possible.
> >
> > >
> > > For a testing perspective I think the non-flaky solution is to stop
> > > calculating percentages and rather check that the execution timestamp
> > > recorded by the GPU very closely matches (minus gpu scheduling delays)
> > > the one we got via fdinfo once the fence signals and we wait for the job
> > > completion.
> >
> > Agree, we should change how we validate the counters in IGT.
>
> I have a wip patch to cleanup and submit to igt. I will submit it soon.
Just submitted that as the last patch in the series:
https://lore.kernel.org/igt-dev/20250104071548.737612-8-lucas.demarchi@xxxxxxxxx/T/#u
but I'd also like to apply this one in the kernel and still looking for
a review.
thanks
Lucas De Marchi