Quoting Tvrtko Ursulin (2019-01-25 13:54:05)
>
> On 25/01/2019 02:29, Chris Wilson wrote:
> > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> > index 2171df2d3019..c09a6644a2ab 100644
> > --- a/drivers/gpu/drm/i915/i915_request.c
> > +++ b/drivers/gpu/drm/i915/i915_request.c
> > @@ -60,7 +60,7 @@ static bool i915_fence_signaled(struct dma_fence *fence)
> >
> >  static bool i915_fence_enable_signaling(struct dma_fence *fence)
> >  {
> > -        return intel_engine_enable_signaling(to_request(fence), true);
> > +        return i915_request_enable_breadcrumb(to_request(fence));
>
> enable_signaling would be better IMO; enable_breadcrumb sounds like it
> could have something to do with breadcrumb emission. And it would align
> with i915_fence_enable_signaling.

But isn't it entirely to do with our breadcrumbs... I'm going the other
way: while the request/interrupt logic is called breadcrumbs, we want to
link the dma-fence signaling back to breadcrumbs. I liked the new name :)

> >  /**
> >   * i915_request_started - check if the request has begun being executed
> >   * @rq: the request
> > @@ -345,7 +369,23 @@ static inline bool i915_request_started(const struct i915_request *rq)
> >                  return true;
> >
> >          /* Remember: started but may have since been preempted! */
> > -        return i915_seqno_passed(hwsp_seqno(rq), rq->fence.seqno - 1);
> > +        return __i915_request_has_started(rq);
> > +}
>
> Is it worth having both i915_request_started and
> __i915_request_has_started? AFAIU the fence cannot be signaled unless the
> seqno has advanced, so if __i915_request_has_started were dropped and the
> callers just used the former, I don't think there would be any issues.
> Callers always check for completed beforehand. I guess it saves a
> redundant fence-signaled check.

I was trying to consolidate down on the fence->flags logic, and decided
half the checks were redundant.

I was also dipping my toe into the water to see how everyone reacted to
a possible
        s/i915_request_completed/i915_request_has_completed/
        s/i915_request_started/i915_request_has_started/
        i915_request_is_running

> > +static inline bool i915_request_is_running(const struct i915_request *rq)
> > +{
> > +        if (!i915_request_is_active(rq))
> > +                return false;
> > +
> > +        return __i915_request_has_started(rq);
>
> return i915_request_is_active(rq) && __i915_request_has_started(rq);
> probably doesn't fit in 80 chars to be that attractive?

It keeps the pattern with the other status checks (started, completed,
is_running) though!

> > -static unsigned int __intel_breadcrumbs_wakeup(struct intel_breadcrumbs *b)
> > +static void irq_enable(struct intel_engine_cs *engine)
> > +{
> > +        if (!engine->irq_enable)
> > +                return;
> > +
> > +        /* Caller disables interrupts */
>
> GEM_BUG_ON(!irqs_disabled()); ?

Nah, I thought I had removed it before, but apparently there was more than
one check here. It doesn't matter if we set the IMR while interrupts are
disabled; we get a hang report. We can't turn off interrupts without
parking, and we can't park without interrupts. That's probably the best
point to say "oi!"

> > +                /*
> > +                 * We may race with direct invocation of
> > +                 * dma_fence_signal(), e.g. i915_request_retire(),
> > +                 * in which case we can skip processing it ourselves.
> > +                 */
> > +                if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
> > +                             &rq->fence.flags))
> > +                        continue;
>
> Doesn't look like this is worth it. We could then race a line below and
> still end up with the already signaled rq on the local list. So I would
> be tempted to drop this for simplicity.
It's virtually free since we already have the exclusive cacheline for
rq->fence.flags. ~o~

(Except when it races, and then we skip the atomic and list manipulation.)

There's some benefit in having little reminders that many paths lead to
dma_fence_signal.

> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > index 3f9f0a5a0f44..d7c31ad1d0ea 100644
> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > @@ -486,8 +486,8 @@ static u32 *gen7_xcs_emit_breadcrumb(struct i915_request *rq, u32 *cs)
> >
> >          for (i = 0; i < GEN7_XCS_WA; i++) {
> >                  *cs++ = MI_STORE_DWORD_INDEX;
> > -                *cs++ = I915_GEM_HWS_INDEX_ADDR;
> > -                *cs++ = rq->global_seqno;
> > +                *cs++ = I915_GEM_HWS_SEQNO_ADDR;
> > +                *cs++ = rq->fence.seqno;
>
> Why this, and in this patch?

A slip of the fingers. Spotted that last night as well and forgot to move
the chunk.

> > +        for (n = 0; n < count; n++) {
> > +                struct i915_gem_context *ctx =
> > +                        t->contexts[order[n] % t->ncontexts];
> > +                struct i915_request *rq;
> > +
> > +                mutex_lock(BKL);
> > +
> > +                rq = t->request_alloc(ctx, t->engine);
> > +                if (IS_ERR(rq)) {
> > +                        mutex_unlock(BKL);
> > +                        err = PTR_ERR(rq);
> > +                        count = n;
> > +                        break;
> > +                }
> > +
> > +                err = i915_sw_fence_await_sw_fence_gfp(&rq->submit,
> > +                                                       submit,
> > +                                                       GFP_KERNEL);
>
> Submit fence is just to ensure the request doesn't finish before you can
> enable signaling on it, via the wait fence?

Yes. We put everything into a background queue, complete our setup, then
let the scheduler do its thing.

> > +
> > +                requests[n] = i915_request_get(rq);
> > +                i915_request_add(rq);
> > +
> > +                mutex_unlock(BKL);
> > +
> > +                if (err >= 0)
> > +                        err = i915_sw_fence_await_dma_fence(wait,
> > +                                                            &rq->fence,
> > +                                                            0,
> > +                                                            GFP_KERNEL);
>
> If the above is true, why can't this simply be
> i915_request_enable_breadcrumbs?

I was trying to test breadcrumbs at a high level without 'cheating'.

For i915_request_enable_breadcrumbs, I keep thinking the test should be
asking if the engine then has it on its signaling list, but that feels
like a circular argument, directly following the implementation itself.

> > +                if (err < 0) {
>
> else if?

Now where would I be able to copy'n'paste that from? ;)

> > +                        i915_request_put(rq);
> > +                        count = n;
> > +                        break;
> > +                }
> > +        }
> > +
> > +        i915_sw_fence_commit(submit);
> > +        i915_sw_fence_commit(wait);
> > +
> > +        if (!wait_event_timeout(wait->wait,
> > +                                i915_sw_fence_done(wait),
> > +                                HZ / 2)) {
> > +                struct i915_request *rq = requests[count - 1];
> > +
> > +                pr_err("waiting for %d fences (last %llx:%lld) on %s timed out!\n",
> > +                       count,
> > +                       rq->fence.context, rq->fence.seqno,
> > +                       t->engine->name);
> > +                i915_gem_set_wedged(t->engine->i915);
> > +                GEM_BUG_ON(!i915_request_completed(rq));
> > +                i915_sw_fence_wait(wait);
> > +                err = -EIO;
> > +        }
> > +
> > +        for (n = 0; n < count; n++) {
> > +                struct i915_request *rq = requests[n];
> > +
> > +                if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
> > +                              &rq->fence.flags)) {
> > +                        pr_err("%llu:%llu was not signaled!\n",
> > +                               rq->fence.context, rq->fence.seqno);
> > +                        err = -EINVAL;
> > +                }
> > +
> > +                i915_request_put(rq);
> > +        }
> > +
> > +        heap_fence_put(wait);
> > +        heap_fence_put(submit);
> > +
> > +        if (err < 0)
> > +                break;
> > +
> > +        num_fences += count;
>
> Isn't num_fences actually also growing by one, while num_requests would
> be growing by count?

Each request is the rq->fence. We're inspecting requests/fences and
trusting i915_sw_fence.
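For reference, the hold-back-then-release shape being described looks
roughly like this for a single request. This is a condensed sketch only,
not part of the patch: hold_and_wait_one() is a made-up name,
heap_fence_create()/heap_fence_put() are assumed to come from the selftest
lib_sw_fence helpers, and the caller is assumed to hold struct_mutex
around request construction and i915_request_add().

static int hold_and_wait_one(struct i915_request *rq)
{
        struct i915_sw_fence *submit, *wait;
        int err;

        submit = heap_fence_create(GFP_KERNEL); /* gates execution of rq */
        if (!submit)
                return -ENOMEM;

        wait = heap_fence_create(GFP_KERNEL); /* completes once rq signals */
        if (!wait) {
                i915_sw_fence_commit(submit);
                heap_fence_put(submit);
                return -ENOMEM;
        }

        /* Hold rq back from the scheduler until our bookkeeping is done. */
        err = i915_sw_fence_await_sw_fence_gfp(&rq->submit, submit,
                                               GFP_KERNEL);

        i915_request_get(rq);
        i915_request_add(rq); /* caller is assumed to hold struct_mutex */

        /* Couple the request's dma-fence back into our wait fence... */
        if (err >= 0)
                err = i915_sw_fence_await_dma_fence(wait, &rq->fence,
                                                    0, GFP_KERNEL);

        /* ...then release the floodgates and let the scheduler run. */
        i915_sw_fence_commit(submit);
        i915_sw_fence_commit(wait);

        if (err >= 0 &&
            !wait_event_timeout(wait->wait, i915_sw_fence_done(wait), HZ / 2))
                err = -ETIME;

        i915_request_put(rq);
        heap_fence_put(wait);
        heap_fence_put(submit);

        return err < 0 ? err : 0;
}

The ordering is the point: the request is published with i915_request_add()
while still gated by the submit fence, so the wait fence can be attached to
rq->fence before anything can execute and signal.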
> > +                num_waits++;
> > +
> > +                cond_resched();
>
> There is wait_event_timeout in the loop so this shouldn't be needed.
> AFAICS if something fails, the loop bails out.

Force of habit. Rather be overkill than have somebody moan about
RCU/khungtaskd stalls.

> > +        for (n = 0; n < ncpus; n++) {
> > +                threads[n] = kthread_run(__igt_breadcrumbs_smoketest,
> > +                                         &t, "igt/%d", n);
> > +                if (IS_ERR(threads[n])) {
> > +                        ret = PTR_ERR(threads[n]);
> > +                        ncpus = n;
>
> I would be tempted to just fail the test rather than carrying on,
> especially since it seems the selftest will report failure anyway since
> ret is preserved.

We have to tidy up; that's why we carry on, keeping the error state.

> > +                        break;
> > +                }
> > +
> > +                get_task_struct(threads[n]);
> > +        }
>
> Thread creation looks like it can be outside struct_mutex.

Could be.

> > +static int
> > +max_batches(struct i915_gem_context *ctx, struct intel_engine_cs *engine)
> > +{
> > +        struct i915_request *rq;
> > +        int ret;
> > +
> > +        /*
> > +         * Before execlists, all contexts share the same ringbuffer. With
> > +         * execlists, each context/engine has a separate ringbuffer and
> > +         * for the purposes of this test, inexhaustible.
> > +         *
> > +         * For the global ringbuffer though, we have to be very careful
> > +         * that we do not wrap while preventing the execution of requests
> > +         * with a unsignaled fence.
> > +         */
> > +        if (HAS_EXECLISTS(ctx->i915))
> > +                return INT_MAX;
> > +
> > +        rq = i915_request_alloc(engine, ctx);
> > +        if (IS_ERR(rq)) {
> > +                ret = PTR_ERR(rq);
> > +        } else {
> > +                int sz;
> > +
> > +                ret = rq->ring->size - rq->reserved_space;
> > +                i915_request_add(rq);
> > +
> > +                sz = rq->ring->emit - rq->head;
> > +                if (sz < 0)
> > +                        sz += rq->ring->size;
> > +                ret /= sz;
> > +                ret /= 2; /* leave half spare, in case of emergency! */
> > +
> > +                /* One ring interleaved between requests from all cpus */
> > +                ret /= num_online_cpus() + 1;
>
> It would probably be more robust for any potential future users to leave
> the num_cpus division to the caller.

But then I would only have INT_MAX/ncpus for execlists!!! :)

> > +static int live_breadcrumbs_smoketest(void *arg)
> > +{
> > +        struct drm_i915_private *i915 = arg;
> > +        struct smoketest t[I915_NUM_ENGINES];
> > +        unsigned int ncpus = num_online_cpus();
> > +        unsigned long num_waits, num_fences;
> > +        struct intel_engine_cs *engine;
> > +        struct task_struct **threads;
> > +        struct igt_live_test live;
> > +        enum intel_engine_id id;
> > +        intel_wakeref_t wakeref;
> > +        struct drm_file *file;
> > +        unsigned int n;
> > +        int ret = 0;
> > +
> > +        /*
> > +         * Smoketest our breadcrumb/signal handling for requests across multiple
> > +         * threads. A very simple test to only catch the most egregious of bugs.
> > +         * See __igt_breadcrumbs_smoketest();
> > +         *
> > +         * On real hardware this time.
> > +         */
> > +
> > +        wakeref = intel_runtime_pm_get(i915);
> > +
> > +        file = mock_file(i915);
> > +        if (IS_ERR(file)) {
> > +                ret = PTR_ERR(file);
> > +                goto out_rpm;
> > +        }
> > +
> > +        threads = kcalloc(ncpus * I915_NUM_ENGINES,
> > +                          sizeof(*threads),
> > +                          GFP_KERNEL);
> > +        if (!threads)
> > +                return -ENOMEM;
>
> goto out_mock_file; to avoid two leaks.
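The unwind being asked for would look roughly like so. This is a sketch of
just the early-error paths, not the actual fix in any revision of the
patch: it borrows the out_mock_file label name from the comment above, and
it assumes mock_file_free() is the matching teardown for mock_file() and
that intel_runtime_pm_put() takes the wakeref.

static int live_breadcrumbs_smoketest(void *arg) /* early-error paths only */
{
        struct drm_i915_private *i915 = arg;
        unsigned int ncpus = num_online_cpus();
        struct task_struct **threads;
        intel_wakeref_t wakeref;
        struct drm_file *file;
        int ret = 0;

        wakeref = intel_runtime_pm_get(i915);

        file = mock_file(i915);
        if (IS_ERR(file)) {
                ret = PTR_ERR(file);
                goto out_rpm;
        }

        threads = kcalloc(ncpus * I915_NUM_ENGINES, sizeof(*threads),
                          GFP_KERNEL);
        if (!threads) {
                ret = -ENOMEM;
                goto out_mock_file; /* not a bare return: file and wakeref are still held */
        }

        /* ... the rest of the test runs here ... */

        kfree(threads);
out_mock_file:
        mock_file_free(i915, file); /* assumed teardown for mock_file() */
out_rpm:
        intel_runtime_pm_put(i915, wakeref); /* assumed wakeref-taking put */
        return ret;
}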
> > +
> > +        memset(&t[0], 0, sizeof(t[0]));
> > +        t[0].request_alloc = __live_request_alloc;
> > +        t[0].ncontexts = 64;
> > +        t[0].contexts = kmalloc_array(t[0].ncontexts,
> > +                                      sizeof(*t[0].contexts),
> > +                                      GFP_KERNEL);
> > +        if (!t[0].contexts) {
> > +                ret = -ENOMEM;
> > +                goto out_threads;
> > +        }
> > +
> > +        mutex_lock(&i915->drm.struct_mutex);
> > +        for (n = 0; n < t[0].ncontexts; n++) {
> > +                t[0].contexts[n] = live_context(i915, file);
> > +                if (!t[0].contexts[n]) {
> > +                        ret = -ENOMEM;
> > +                        goto out_contexts;
>
> Context list unwind is missing.

The magic of live_context() is that the contexts are added to the file and
released when the file is freed, if not earlier.

> > +        ret = igt_live_test_begin(&live, i915, __func__, "");
> > +        if (ret)
> > +                goto out_contexts;
> > +
> > +        for_each_engine(engine, i915, id) {
> > +                t[id] = t[0];
> > +                t[id].engine = engine;
> > +                t[id].max_batch = max_batches(t[0].contexts[0], engine);
> > +                if (t[id].max_batch < 0) {
> > +                        ret = t[id].max_batch;
> > +                        goto out_flush;
> > +                }
> > +                pr_debug("Limiting batches to %d requests on %s\n",
> > +                         t[id].max_batch, engine->name);
> > +
> > +                for (n = 0; n < ncpus; n++) {
> > +                        struct task_struct *tsk;
> > +
> > +                        tsk = kthread_run(__igt_breadcrumbs_smoketest,
> > +                                          &t[id], "igt/%d.%d", id, n);
> > +                        if (IS_ERR(tsk)) {
> > +                                ret = PTR_ERR(tsk);
> > +                                goto out_flush;
> > +                        }
> > +
> > +                        get_task_struct(tsk);
> > +                        threads[id * ncpus + n] = tsk;
> > +                }
> > +        }
> > +        mutex_unlock(&i915->drm.struct_mutex);
>
> I think thread setup could be done outside struct_mutex in two passes,
> if only the max_batches calculation is done first under it.

Then do it as one pass, since having a thread start a few microseconds
earlier is immaterial in the grand scheme of things.

> > +        msleep(jiffies_to_msecs(i915_selftest.timeout_jiffies));
> > +
> > +out_flush:
> > +        num_waits = 0;
> > +        num_fences = 0;
> > +        for_each_engine(engine, i915, id) {
> > +                for (n = 0; n < ncpus; n++) {
> > +                        struct task_struct *tsk = threads[id * ncpus + n];
> > +                        int err;
> > +
> > +                        if (!tsk)
> > +                                continue;
> > +
> > +                        err = kthread_stop(tsk);
> > +                        if (err < 0 && !ret)
> > +                                ret = err;
> > +
> > +                        put_task_struct(tsk);
> > +                }
> > +
> > +                num_waits += atomic_long_read(&t[id].num_waits);
> > +                num_fences += atomic_long_read(&t[id].num_fences);
> > +        }
> > +        pr_info("Completed %lu waits for %lu fence across %d engines and %d cpus\n",
>
> fences plural

But it might only be 1! ;)
-Chris