Hi Andi, > -----Original Message----- > From: Andi Shyti <andi.shyti@xxxxxxxxxxxxxxx> > Sent: Thursday, December 5, 2024 6:35 PM > To: Gote, Nitin R <nitin.r.gote@xxxxxxxxx> > Cc: intel-gfx@xxxxxxxxxxxxxxxxxxxxx; Wilson, Chris P <chris.p.wilson@xxxxxxxxx> > Subject: Re: [PATCH v1 1/1] drm/i915/gt: Increase a time to retry RING_HEAD > reset > > Hi Nitin, > > On Thu, Dec 05, 2024 at 05:27:36PM +0530, Nitin Gote wrote: > > Issue is seen again where engine resets fails because the engine > > resumes from an incorrect RING_HEAD. So, increase a time if at first > > the write doesn't succeed and retry again. > > > > Fixes: 6ef0e3ef2662 ("drm/i915/gt: Retry RING_HEAD reset until it get > > sticks") > > Is this a real fix or is it more of a fine tuning? Here we can say this for more fine tuning as issue seen again and that's why I have added fixes : 6ef0e3ef2662. > > > Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12806 > > Signed-off-by: Nitin Gote <nitin.r.gote@xxxxxxxxx> > > ... > > > @@ -231,7 +231,7 @@ static int xcs_resume(struct intel_engine_cs *engine) > > set_pp_dir(engine); > > > > /* First wake the ring up to an empty/idle ring */ > > - for ((kt) = ktime_get() + (2 * NSEC_PER_MSEC); > > + for ((kt) = ktime_get() + (50 * NSEC_PER_MSEC); > > Where is this 50 coming from? Well, here HEAD is still not 0 even after writing in it. So, it could be the timing issue. I discussed this with Chris and we thought It is better to add 50ms instead of 2ms delay here to let HEAD write complete. I tested this on trybot for Haswell/Ivybridge platform https://patchwork.freedesktop.org/series/141779/ and I see BAT is successful and shards issues are not related. > > Thanks, > Andi > > > ktime_before(ktime_get(), (kt)); cpu_relax()) { > > /* > > * In case of resets fails because engine resumes from > > -- > > 2.25.1