Quoting Tvrtko Ursulin (2020-12-04 10:52:23)
>
> On 03/12/2020 09:59, Chris Wilson wrote:
> > Race the execution and interrupt handlers along a context, while
> > closing it at a random time.
> >
> > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
> > ---
> >  tests/i915/gem_ctx_exec.c | 60 +++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 60 insertions(+)
> >
> > diff --git a/tests/i915/gem_ctx_exec.c b/tests/i915/gem_ctx_exec.c
> > index 194191def..18d5d1217 100644
> > --- a/tests/i915/gem_ctx_exec.c
> > +++ b/tests/i915/gem_ctx_exec.c
> > @@ -336,6 +336,63 @@ static void nohangcheck_hostile(int i915)
> >  	close(i915);
> >  }
> >
> > +static void close_race(int i915)
> > +{
> > +	const int ncpus = sysconf(_SC_NPROCESSORS_ONLN);
> > +	uint32_t *contexts;
> > +
> > +	contexts = mmap(NULL, 4096, PROT_WRITE, MAP_SHARED | MAP_ANON, -1, 0);
> > +	igt_assert(contexts != MAP_FAILED);
> > +
> > +	for (int child = 0; child < ncpus; child++)
> > +		contexts[child] = gem_context_clone_with_engines(i915, 0);
> > +
> > +	igt_fork(child, ncpus) {
> > +		igt_spin_t *spin;
> > +
> > +		spin = igt_spin_new(i915, .flags = IGT_SPIN_POLL_RUN);
> > +		igt_spin_end(spin);
> > +		gem_sync(i915, spin->handle);
> > +
> > +		while (!READ_ONCE(contexts[ncpus])) {
> > +			int64_t timeout = 1;
> > +
> > +			igt_spin_reset(spin);
> > +			igt_assert(!igt_spin_has_started(spin));
> > +
> > +			spin->execbuf.rsvd1 = READ_ONCE(contexts[child]);
> > +			if (__gem_execbuf(i915, &spin->execbuf))
> > +				continue;
> > +
> > +			igt_assert(gem_bo_busy(i915, spin->handle));
>
> I've seen this line fail in CI results - any idea how that can happen?

Yes. The presumption we have in this test is that if gem_execbuf succeeds,
the request will execute. However, see

  drm/i915/gem: Propagate error from cancelled submit due to context closure
  https://patchwork.freedesktop.org/patch/405392/?series=84531&rev=1

where we cancel the request if we detect the context is closed before the
request is submitted, but still return success (with an async error).

I think the test makes a fair assumption, and it's easier than I was
thinking to return the error for the closed context, which makes the
gem_execbuf solid for this race.
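Until that lands, the child loop could at best narrow the window, e.g.
(illustrative only, not part of the patch):

	spin->execbuf.rsvd1 = READ_ONCE(contexts[child]);
	if (__gem_execbuf(i915, &spin->execbuf))
		continue;

	/*
	 * If the parent recycled contexts[child] in the meantime, the
	 * kernel may have cancelled the request even though execbuf
	 * returned success, so only assert busyness while the id is
	 * unchanged. This narrows the race, it does not close it, and
	 * retrying like this has races of its own.
	 */
	if (spin->execbuf.rsvd1 != READ_ONCE(contexts[child]))
		continue;

	igt_assert(gem_bo_busy(i915, spin->handle));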
> > +			gem_wait(i915, spin->handle, &timeout); /* prime irq */
>
> Is this depending on implementation-specific behaviour, that we will
> leave the irq on after the waiter has exited?

It's the best I can do. Nothing in the uAPI should ever govern the exact
HW details. And since this can be used to trick the machine into locking
up under irq pressure... maybe not the best.

> > +			igt_spin_busywait_until_started(spin);
> > +
> > +			igt_spin_end(spin);
> > +			gem_sync(i915, spin->handle);
> > +		}
> > +
> > +		igt_spin_free(i915, spin);
> > +	}
> > +
> > +	igt_until_timeout(5) {
> > +		for (int child = 0; child < ncpus; child++) {
> > +			gem_context_destroy(i915, contexts[child]);
> > +			contexts[child] =
> > +				gem_context_clone_with_engines(i915, 0);
>
> Right, so a deliberate attempt to occasionally make the child use a
> closed context. Presumably, according to the CI results, it does manage
> to hit it consistently, which surprises me a bit. A comment here would
> be good.
>
> > +		}
> > +		usleep(1000);
>
> Maybe add some randomness here? Or even a random busy loop within the
> child loop? I haven't looked at the i915 patch yet to know where the
> race actually is..

The CPU scheduler is great at providing randomness :)
-Chris
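P.S. Should the scheduler's jitter ever prove too regular, randomising
the delay between recycle passes is a one-liner (hypothetical, not in
the patch; plain rand() from stdlib.h):

	igt_until_timeout(5) {
		for (int child = 0; child < ncpus; child++) {
			gem_context_destroy(i915, contexts[child]);
			contexts[child] =
				gem_context_clone_with_engines(i915, 0);
		}
		/* 0.5ms - 1.5ms: perturb the phase against the children */
		usleep(500 + rand() % 1000);
	}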