Quoting Ruhl, Michael J (2020-06-30 20:11:16) > >-----Original Message----- > >From: Intel-gfx <intel-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of Chris > >Wilson > >Sent: Monday, June 29, 2020 7:36 AM > >To: intel-gfx@xxxxxxxxxxxxxxxxxxxxx > >Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > >Subject: [PATCH v2] drm/i915/gem: Move obj->lut_list under its > >own lock > > > >The obj->lut_list is traversed when the object is closed as the file > >table is destroyed during process termination. As this occurs before we > >kill any outstanding context if, due to some bug or another, the closure > >is blocked, then we fail to shootdown any inflight operations > >potentially leaving the GPU spinning forever. As we only need to guard > >the list against concurrent closures and insertions, the hold is short > >and merits being treated as a simple spinlock. > > The comment: > > /* Break long locks, and carefully continue on from this spot */ > > seems to be contrary to your "the hold is short" comment. Is calling out > this apparent worst case scenario (i.e. how it could happen), worth > documenting? It's paranoia, because list iterating can be slow and historically slow object closure has been reported as an issue. Only a few objects will be shared between multiple contexts, and even then you would only expect a couple of contexts to share an object. One of the OglDrvCtx would show up here, which convinced us to move to the per-object lists to shrink the number of elements walked. Even the close race igts do not cause the list to become long enough to schedule, but it would be possible to create an object that was shared by 64k contexts. Just not wise in practice. [However, I should make sure an igt does hit the bookmark.] > >@@ -104,21 +105,29 @@ void i915_gem_close_object(struct > >drm_gem_object *gem, struct drm_file *file) > > { > > struct drm_i915_gem_object *obj = to_intel_bo(gem); > > struct drm_i915_file_private *fpriv = file->driver_priv; > >+ struct i915_lut_handle bookmark = {}; > > struct i915_mmap_offset *mmo, *mn; > > struct i915_lut_handle *lut, *ln; > > LIST_HEAD(close); > > > >- i915_gem_object_lock(obj); > >+ spin_lock(&obj->lut_lock); > > list_for_each_entry_safe(lut, ln, &obj->lut_list, obj_link) { > > struct i915_gem_context *ctx = lut->ctx; > > > >- if (ctx->file_priv != fpriv) > >- continue; > >+ if (ctx && ctx->file_priv == fpriv) { > > Null checking ctx was not done before. I think this changed with the possible cond_resched? The bookmark introduces the lut->ctx == NULL to identify it as being special, hence the need to check. -Chris _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx