Quoting Tvrtko Ursulin (2019-09-19 10:34:15) > > On 19/09/2019 02:53, Daniele Ceraolo Spurio wrote: > > Our assumption that the we can ask the HW to lock the SFC even if not > > currently in use does not match the HW commitment. The expectation from > > the HW is that SW will not try to lock the SFC if the engine is not > > using it and if we do that the behavior is undefined; on ICL the HW > > ends up to returning the ack and ignoring our lock request, but this is > > not guaranteed and we shouldn't expect it going forward. > > > > Also, failing to get the ack while the SFC is in use means that we can't > > cleanly reset it, so fail the engine reset in that scenario. > > > > v2: drop rmw change, keep the log as debug and handle failure (Chris), > > improve comments (Tvrtko). > > > > Reported-by: Owen Zhang <owen.zhang@xxxxxxxxx> > > Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@xxxxxxxxx> > > Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxxxxxxx> > > Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > > --- > > static void gen11_unlock_sfc(struct intel_engine_cs *engine) > > @@ -430,12 +436,21 @@ static int gen11_reset_engines(struct intel_gt *gt, > > for_each_engine_masked(engine, gt->i915, engine_mask, tmp) { > > GEM_BUG_ON(engine->id >= ARRAY_SIZE(hw_engine_mask)); > > hw_mask |= hw_engine_mask[engine->id]; > > - hw_mask |= gen11_lock_sfc(engine); > > + ret = gen11_lock_sfc(engine, &hw_mask); > > + if (ret) > > + goto sfc_unlock; > > Break on first failure looks unsafe to me. I think it would be more > robust to continue, no? Like if we have been asked to reset multiple > engines and only one failed, why not do the ones we can? Any failure means a fallback to a full device reset. It doesn't matter if we could reset one of two+ engines at that point, it's past the point of no return. > > ret = gen6_hw_domain_reset(gt, hw_mask); > > > > +sfc_unlock: > > + /* > > + * we unlock the SFC based on the lock status and not the result of > > + * gen11_lock_sfc to make sure that we clean properly if something > > + * wrong happened during the lock (e.g. lock acquired after timeout > > + * expiration). > > + */ > > if (engine_mask != ALL_ENGINES) > > for_each_engine_masked(engine, gt->i915, engine_mask, tmp) > > gen11_unlock_sfc(engine); > > > > So you decided not to read the register and cross check? Very meh, that check didn't seem like it would improve our ability to handle resets. If you wanted to actually check, we should check the lock_ack_bit is cleared as well. (Or at least check that is clear before we start the next lock_sfc().) I think skipping an unnecessary gen11_lock_sfc() is a solid improvement by itself (I'm hopeful that's enough to make icl more robust...). Tightening up the unlock can be done separately. -Chris _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx