Quoting Tvrtko Ursulin (2017-10-11 13:33:28) > > On 10/10/2017 22:38, Chris Wilson wrote: > > A bug recently encountered involved the issue where are we were > > submitting requests to different ppGTT, each would pin a segment of the > > GGTT for its logical context and ring. However, this is invisible to > > eviction as we do not tie the context/ring VMA to a request and so do > > not automatically wait upon it them (instead they are marked as pinned, > > prevent eviction entirely). Instead the eviction code must flush those > > preventing? > > > contexts by switching to the kernel context. This selftest tries to > > fill the GGTT with contexts to exercise a path where the > > switch-to-kernel-context failed to make forward progress and we fail > > with ENOSPC. > > > > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > > Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx> > > --- > > drivers/gpu/drm/i915/selftests/i915_gem_evict.c | 121 +++++++++++++++++++++ > > .../gpu/drm/i915/selftests/i915_live_selftests.h | 1 + > > drivers/gpu/drm/i915/selftests/mock_context.c | 6 +- > > 3 files changed, 123 insertions(+), 5 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c > > index 5ea373221f49..53df8926be7f 100644 > > --- a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c > > +++ b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c > > @@ -24,6 +24,8 @@ > > > > #include "../i915_selftest.h" > > > > +#include "mock_context.h" > > +#include "mock_drm.h" > > #include "mock_gem_device.h" > > > > static int populate_ggtt(struct drm_i915_private *i915) > > @@ -325,6 +327,116 @@ static int igt_evict_vm(void *arg) > > return err; > > } > > > > +static int igt_evict_contexts(void *arg) > > +{ > > + struct drm_i915_private *i915 = arg; > > + struct intel_engine_cs *engine; > > + enum intel_engine_id id; > > + struct reserved { > > + struct drm_mm_node node; > > + struct reserved *next; > > + } *reserved = NULL; > > + unsigned long count; > > + int err = 0; > > + > > + /* Make the GGTT appear small (but leave just enough to function) */ > > How does it do that? > > > + count = 0; > > + mutex_lock(&i915->drm.struct_mutex); > > + do { > > + struct reserved *r; > > + > > + r = kcalloc(1, sizeof(*r), GFP_KERNEL); > > + if (!r) { > > + err = -ENOMEM; > > + goto out_locked; > > + } > > + > > + if (i915_gem_gtt_insert(&i915->ggtt.base, &r->node, > > + 1ul << 20, 0, I915_COLOR_UNEVICTABLE, > > + 16ul << 20, i915->ggtt.base.total, > > + PIN_NOEVICT)) { > > + kfree(r); > > + break; > > + } > > + > > + r->next = reserved; > > + reserved = r; > > + > > + count++; > > + } while (1); > > + mutex_unlock(&i915->drm.struct_mutex); > > + pr_info("Filled GGTT with %lu 1MiB nodes\n", count); > > Oh right, this helps. :) > > > + > > + /* Overfill the GGTT with context objects and so try to evict one. */ > > Can GGTT be divisible by 1MiB in which case the above filling would fill > everything and make any eviction impossible? Do you need an assert that > there is a little bit of space left for below to work? There's a subtle restriction that we only try to fill above 16MiB in the GGTT. So the assumption is that leaves us enough space to fit at least one request/context (and its preliminary setup, such as golden render state). Failure to have enough GGTT space to issue that first request leads to fun. Might have to make that a little more tolerant in future. > > + for_each_engine(engine, i915, id) { > > + struct i915_sw_fence *fence; > > + struct drm_file *file; > > + unsigned long count = 0; > > No shadows variable warnings here? W=1, brave man! My bad forgot to remove as I reused it for the counter above. > > + mutex_lock(&i915->drm.struct_mutex); > > + do { > > + struct drm_i915_gem_request *rq; > > + struct i915_gem_context *ctx; > > + > > + ctx = live_context(i915, file); > > + if (!ctx) > > + break; > > + > > + rq = i915_gem_request_alloc(engine, ctx); > > + if (IS_ERR(rq)) { > > + if (PTR_ERR(rq) != -ENOMEM) { > > + pr_err("Unexpected error from request alloc (ctx hw id %u, on %s): %d\n", > > + ctx->hw_id, engine->name, > > + (int)PTR_ERR(rq)); > > + err = PTR_ERR(rq); > > + } > > + break; > > Comment on why it is OK to stop the test on ENOMEM would be good. Because the first versions ran out of memory before filling the GGTT :) Now that there is a reasonable cap on the GGTT size, we can just let it ENOMEM and fixup later if required. -Chris _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx