>-----Original Message-----
>From: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
>Sent: Tuesday, June 30, 2020 3:29 PM
>To: Ruhl, Michael J <michael.j.ruhl@xxxxxxxxx>; intel-gfx@xxxxxxxxxxxxxxxxxxxxx
>Subject: Re: [PATCH v2] drm/i915/gem: Move obj->lut_list under its own lock
>
>Quoting Ruhl, Michael J (2020-06-30 20:11:16)
>> >-----Original Message-----
>> >From: Intel-gfx <intel-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of Chris Wilson
>> >Sent: Monday, June 29, 2020 7:36 AM
>> >To: intel-gfx@xxxxxxxxxxxxxxxxxxxxx
>> >Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
>> >Subject: [PATCH v2] drm/i915/gem: Move obj->lut_list under its own lock
>> >
>> >The obj->lut_list is traversed when the object is closed as the file
>> >table is destroyed during process termination. As this occurs before we
>> >kill any outstanding context, if, due to some bug or another, the closure
>> >is blocked, then we fail to shoot down any in-flight operations,
>> >potentially leaving the GPU spinning forever. As we only need to guard
>> >the list against concurrent closures and insertions, the hold is short
>> >and merits being treated as a simple spinlock.
>>
>> The comment:
>>
>> /* Break long locks, and carefully continue on from this spot */
>>
>> seems contrary to your "the hold is short" comment. Is calling out
>> this apparent worst-case scenario (i.e. how it could happen) worth
>> documenting?
>
>It's paranoia, because list iteration can be slow, and historically slow
>object closure has been reported as an issue. Only a few objects will be
>shared between multiple contexts, and even then you would only expect a
>couple of contexts to share an object. One of the OglDrvCtx workloads would
>show up here, which convinced us to move to the per-object lists to shrink
>the number of elements walked.
>
>Even the close-race igts do not cause the list to become long enough to
>schedule, but it would be possible to create an object that was shared
>by 64k contexts. Just not wise in practice. [However, I should make sure
>an igt does hit the bookmark.]

I look forward to that igt. 😊

>> >@@ -104,21 +105,29 @@ void i915_gem_close_object(struct drm_gem_object *gem, struct drm_file *file)
>> > {
>> > 	struct drm_i915_gem_object *obj = to_intel_bo(gem);
>> > 	struct drm_i915_file_private *fpriv = file->driver_priv;
>> >+	struct i915_lut_handle bookmark = {};
>> > 	struct i915_mmap_offset *mmo, *mn;
>> > 	struct i915_lut_handle *lut, *ln;
>> > 	LIST_HEAD(close);
>> >
>> >-	i915_gem_object_lock(obj);
>> >+	spin_lock(&obj->lut_lock);
>> > 	list_for_each_entry_safe(lut, ln, &obj->lut_list, obj_link) {
>> > 		struct i915_gem_context *ctx = lut->ctx;
>> >
>> >-		if (ctx->file_priv != fpriv)
>> >-			continue;
>> >+		if (ctx && ctx->file_priv == fpriv) {
>>
>> Null checking ctx was not done before. I think this changed with the
>> possible cond_resched?
>
>The bookmark introduces lut->ctx == NULL to identify it as being
>special, hence the need to check.

Ok, a clean lock replacement with a paranoid worst-case fallback, just in case.

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@xxxxxxxxx>

M

>-Chris

_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx