Quoting Mika Kuoppala (2018-05-30 16:02:06)
> There is a problem with kbl up to rev E0 where heavy
> memory traffic from adjacent engine(s) can cause an engine
> reset to fail. This traffic can be from normal memory accesses
> or it can be from heavy polling on a semaphore wait.
>
> To combat the normal traffic, we do our best to idle the adjacent
> engines before we ask the engine to prepare for reset. For per
> engine reset, this will add unwanted extra latency, as we
> take a blanket approach before every reset. In the past we have
> already noticed that idling an engine before reset improves our
> chances of resetting it, but this only idles the engines we are
> about to reset, not the adjacent ones.

Unfortunately we don't have a lock on the other engines, so we can't
prevent two resets running in parallel from clobbering each other's
state.

So what's stopping the failure mode of falling back to resetting all
engines at once if resetting one fails? Is it a catastrophic failure?

> We could take the approach of idling adjacent engines only
> if the first reset fails. But in this area, it is usually best
> to get it right off the bat.
>
> For the second issue, where an unlimited semaphore wait poll loop
> is generating the heavy memory traffic and preventing a reset,
> we add a one microsecond poll interval to the semaphore wait to
> guarantee bandwidth for the reset preparation. The side effect
> is that semaphore completion latencies also become 1us longer.

You know the rule: second issue, second patch.

That's odd. I would have expected an MI_SEMA op to be an arbitration
point (even inside the busy wait loop), so I would have expected it to
behave nicely with STOP_RING.
-Chris

_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx