Quoting Tvrtko Ursulin (2021-01-11 16:19:40)
>
> On 11/01/2021 10:57, Chris Wilson wrote:
> > During igt_reset_nop_engine, it was observed that an unexpected failed
> > engine reset lead to us busywaiting on the stop-ring semaphore (set
> > during the reset preparations) on the first request afterwards. There was
> > no explicit MI_ARB_CHECK in this sequence as the presumption was that
> > the failed MI_SEMAPHORE_WAIT would itself act as an arbitration point.
> > It did not in this circumstance, so force it.
>
> In other words MI_SEMAPHORE_POLL is not a preemption point? Can't
> remember if I knew that or not..

MI_SEMAPHORE_WAIT | POLL is most definitely a preemption point on a
miss.

> 1)
> Why not the same handling in !gen12 version?

Because I think it's a bug in tgl [a0 at least]. I think I've seen the
same symptoms on tgl before, but not earlier. This is the first time the
sequence clicked as to why it was busy spinning. Random engine reset
failures are rare enough -- I was meant to also write a test case to
inject failure.

> 2)
> Failed reset leads to busy-hang in following request _tail_? But there
> is an arb check at the start of following request as well. Or in cases
> where we context switch into the middle of a previously executing request?

It was the first request submitted after the failed reset. We expect to
clear the ring-stop flag on the CS IDLE->ACTIVE event.

> But why would that busy hang? Hasn't the failed request unpaused the ring?

The engine was idle at the time of the failed reset. We left the
ring-stop set, and submitted the next batch of requests. We hit the
MI_SEMAPHORE_WAIT(ring-stop) at the end of the first request, but
without hitting an arbitration point (first request, no init-breadcrumb
in this case), the semaphore was stuck.
-Chris
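
For readers following the thread, the ordering being discussed (an explicit
arbitration check emitted ahead of the busywait on the ring-stop semaphore)
can be sketched roughly as below. This is a standalone illustration and not
the patch itself: the function name, the SKETCH_* names and their numeric
values are all hypothetical placeholders, not the i915 definitions, and the
dword layout is simplified.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Placeholder command encodings for illustration only; these are NOT
 * the real i915 or hardware values.
 */
#define SKETCH_MI_ARB_CHECK      0x00000001u
#define SKETCH_MI_SEMAPHORE_WAIT 0x00000002u
#define SKETCH_SEMAPHORE_POLL    0x00000100u

/*
 * Emit the end-of-request busywait on the ring-stop semaphore.
 * The explicit arbitration check is written first, so the semaphore
 * wait is not relied upon to act as the arbitration point.
 * Returns the advanced write pointer.
 */
static uint32_t *sketch_emit_busywait(uint32_t *cs, uint32_t stop_addr)
{
	assert(cs != NULL);

	*cs++ = SKETCH_MI_ARB_CHECK; /* explicit arbitration point first */
	*cs++ = SKETCH_MI_SEMAPHORE_WAIT | SKETCH_SEMAPHORE_POLL;
	*cs++ = 0;                   /* wait until the stop flag reads 0 */
	*cs++ = stop_addr;           /* location of the ring-stop flag */
	return cs;
}
```

The only point of the sketch is the ordering of the first two dwords; the
real command stream carries more fields than are shown here.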