Hi Nitin, On Thu, Dec 05, 2024 at 04:03:38PM +0000, Gote, Nitin R wrote: > > On Thu, Dec 05, 2024 at 05:27:36PM +0530, Nitin Gote wrote: > > > Issue is seen again where engine resets fails because the engine > > > resumes from an incorrect RING_HEAD. So, increase a time if at first > > > the write doesn't succeed and retry again. > > > > > > Fixes: 6ef0e3ef2662 ("drm/i915/gt: Retry RING_HEAD reset until it get > > > sticks") > > > > Is this a real fix or is it more of a fine tuning? > > Here we can say this for more fine tuning as issue seen again and > that's why I have added fixes : 6ef0e3ef2662. I thinkt he Fixes: tag is not necessary, here. You are not really fixing a bug. > > > Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12806 > > > Signed-off-by: Nitin Gote <nitin.r.gote@xxxxxxxxx> > > > > ... > > > > > @@ -231,7 +231,7 @@ static int xcs_resume(struct intel_engine_cs *engine) > > > set_pp_dir(engine); > > > > > > /* First wake the ring up to an empty/idle ring */ > > > - for ((kt) = ktime_get() + (2 * NSEC_PER_MSEC); > > > + for ((kt) = ktime_get() + (50 * NSEC_PER_MSEC); > > > > Where is this 50 coming from? > > Well, here HEAD is still not 0 even after writing in it. > So, it could be the timing issue. I discussed this with Chris and we thought > It is better to add 50ms instead of 2ms delay here to let HEAD write complete. > I tested this on trybot for Haswell/Ivybridge platform https://patchwork.freedesktop.org/series/141779/ and > I see BAT is successful and shards issues are not related. Can you please add a comment explaining why you are using 50ms? You can also mention that you experimented with different values and determined that 50ms works best based on testing. Andi