Quoting Tvrtko Ursulin (2020-02-07 16:46:55)
>
> If you want quick&dirty feedback read below, if you want something
> smarter wait some more. :)
>
> On 07/02/2020 11:11, Chris Wilson wrote:
> > +static void kill_stale_engines(struct i915_gem_context *ctx)
> > +{
> > +	struct i915_gem_engines *pos, *next;
> > +	unsigned long flags;
> > +
> > +	spin_lock_irqsave(&ctx->stale.lock, flags);
> > +	list_for_each_entry_safe(pos, next, &ctx->stale.engines, link) {
> > +		if (!i915_sw_fence_await(&pos->fence))
> > +			continue;
>
> When is this path hit?

Race with the interrupt callback.

> > +
> > +		spin_unlock_irqrestore(&ctx->stale.lock, flags);
> > +
> > +		kill_engines(pos);
> > +
> > +		spin_lock_irqsave(&ctx->stale.lock, flags);
> > +		list_safe_reset_next(pos, next, link);
> > +		list_del_init(&pos->link);
> > +
> > +		i915_sw_fence_complete(&pos->fence);
>
> This will trigger FENCE_FREE below?

Yes, the final completion sends both notifications.

> > +static int engines_notify(struct i915_sw_fence *fence,
> > +			  enum i915_sw_fence_notify state)
> > +{
> > +	struct i915_gem_engines *engines =
> > +		container_of(fence, typeof(*engines), fence);
> > +
> > +	switch (state) {
> > +	case FENCE_COMPLETE:
> > +		if (!list_empty(&engines->link)) {
>
> Why is it safe to look at the state of engines->link outside the lock?
> We can have a race between context close and completion event on a
> stale engine, right?

There is no race :)
It's just coordination with kill_stale_engines().

> > +static void engines_idle_release(struct i915_gem_engines *engines)
> > +{
> > +	struct i915_gem_engines_iter it;
> > +	struct intel_context *ce;
> > +	unsigned long flags;
> > +
> > +	GEM_BUG_ON(!engines);
> > +	i915_sw_fence_init(&engines->fence, engines_notify);
> > +
> > +	spin_lock_irqsave(&engines->ctx->stale.lock, flags);
> > +	list_add(&engines->link, &engines->ctx->stale.engines);
> > +	spin_unlock_irqrestore(&engines->ctx->stale.lock, flags);
> > +
> > +	for_each_gem_engine(ce, engines, it) {
> > +		struct dma_fence *fence;
> > +		int err;
> > +
> > +		if (!ce->timeline)
> > +			continue;
>
> When does this happen?

Replacing the default engines before use. Or any engine set prior to
use.

> > +
> > +		fence = i915_active_fence_get(&ce->timeline->last_request);
> > +		if (!fence)
> > +			continue;
> > +
> > +		err = i915_sw_fence_await_dma_fence(&engines->fence,
> > +						    fence, 0,
> > +						    GFP_KERNEL);
> > +
> > +		dma_fence_put(fence);
> > +		if (err < 0) {
> > +			kill_engines(engines);
> > +			break;
>
> Okay to leave already setup awaits active in this case?

Yes. They will be signaled. It may seem a bit harsh, but we fell into
an unlikely error path and have to do something.

> > -void i915_sw_fence_await(struct i915_sw_fence *fence)
> > +bool i915_sw_fence_await(struct i915_sw_fence *fence)
> >  {
> > -	debug_fence_assert(fence);
> > -	WARN_ON(atomic_inc_return(&fence->pending) <= 1);
> > +	int old, new;
> > +
> > +	new = atomic_read(&fence->pending);
> > +	do {
> > +		if (new < 1)
> > +			return false;
> > +
> > +		old = new++;
> > +	} while ((new = atomic_cmpxchg(&fence->pending, old, new)) != old);
> > +
> > +	return true;
>
> No idea what's happening here. Why was the existing code inadequate and
> what are you changing?

I needed an await_if_busy to handle the race with the interrupts.
-Chris
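
For illustration, a minimal userspace sketch of the await_if_busy
semantics above: the pending count may only be raised while it is still
at least 1, so a late waiter can never resurrect a fence whose final
completion has already run. It uses C11 atomics in place of the
kernel's atomic_cmpxchg(); try_get() and the standalone main() are
hypothetical names for this sketch, not part of the patch.

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdio.h>

    static atomic_int pending = 1;  /* 1 == fence still busy */

    /* Raise the pending count, but only if it is still nonzero. */
    static bool try_get(atomic_int *v)
    {
        int old = atomic_load(v);

        do {
            if (old < 1)  /* already signaled: do not revive */
                return false;
            /* on failure, old is reloaded with the current value */
        } while (!atomic_compare_exchange_weak(v, &old, old + 1));

        return true;
    }

    int main(void)
    {
        printf("await while busy: %d\n", try_get(&pending)); /* 1 */
        atomic_store(&pending, 0);  /* final completion has run */
        printf("await after idle: %d\n", try_get(&pending)); /* 0 */
        return 0;
    }

A try_get() failure corresponds to kill_stale_engines() skipping an
entry when i915_sw_fence_await() returns false, i.e. the race with the
interrupt callback mentioned above.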