RE: [PATCH v1 1/1] drm/i915/gt: Increase a time to retry RING_HEAD reset

"Gote, Nitin R" <nitin.r.gote@xxxxxxxxx> · Thu, 5 Dec 2024 16:03:38 +0000

Hi Andi,

> -----Original Message-----
> From: Andi Shyti <andi.shyti@xxxxxxxxxxxxxxx>
> Sent: Thursday, December 5, 2024 6:35 PM
> To: Gote, Nitin R <nitin.r.gote@xxxxxxxxx>
> Cc: intel-gfx@xxxxxxxxxxxxxxxxxxxxx; Wilson, Chris P <chris.p.wilson@xxxxxxxxx>
> Subject: Re: [PATCH v1 1/1] drm/i915/gt: Increase a time to retry RING_HEAD
> reset
> 
> Hi Nitin,
> 
> On Thu, Dec 05, 2024 at 05:27:36PM +0530, Nitin Gote wrote:
> > Issue is seen again where engine resets fails because the engine
> > resumes from an incorrect RING_HEAD. So, increase a time if at first
> > the write doesn't succeed and retry again.
> >
> > Fixes: 6ef0e3ef2662 ("drm/i915/gt: Retry RING_HEAD reset until it get
> > sticks")
> 
> Is this a real fix or is it more of a fine tuning?

This is more of a fine-tuning. The issue has been seen again, which is
why I added the Fixes: 6ef0e3ef2662 tag.

> 
> > Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12806
> > Signed-off-by: Nitin Gote <nitin.r.gote@xxxxxxxxx>
> 
> ...
> 
> > @@ -231,7 +231,7 @@ static int xcs_resume(struct intel_engine_cs *engine)
> >  	set_pp_dir(engine);
> >
> >  	/* First wake the ring up to an empty/idle ring */
> > -	for ((kt) = ktime_get() + (2 * NSEC_PER_MSEC);
> > +	for ((kt) = ktime_get() + (50 * NSEC_PER_MSEC);
> 
> Where is this 50 coming from?

Here HEAD is still not 0 even after we write to it, so it could be a
timing issue. I discussed this with Chris, and we thought it better to
use a 50ms retry window instead of 2ms here to let the HEAD write complete.
I tested this on trybot for the Haswell/Ivybridge platforms:
https://patchwork.freedesktop.org/series/141779/
BAT is successful and the shards issues are not related.
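
To make the effect of the longer window concrete, here is a minimal
user-space sketch of a deadline-bounded write-and-read-back retry. It is
not the i915 code: fake_ring, write_head() and reset_head() are
hypothetical names, the "register" is a plain struct that drops a fixed
number of writes, and clock_gettime() stands in for ktime_get().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_MSEC 1000000LL

/* Hypothetical stand-in for the ring; not a real i915 structure. */
struct fake_ring {
	uint32_t head;
	int writes_to_ignore;	/* writes dropped before one sticks */
};

/* Monotonic time in nanoseconds (user-space analogue of ktime_get()). */
static int64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Simulated register write that is silently dropped a few times. */
static void write_head(struct fake_ring *r, uint32_t val)
{
	if (r->writes_to_ignore > 0) {
		r->writes_to_ignore--;
		return;
	}
	r->head = val;
}

/*
 * Keep writing 0 to HEAD until it reads back as 0 or the window expires.
 * Returns true if the write stuck within timeout_ms.
 */
static bool reset_head(struct fake_ring *r, int64_t timeout_ms)
{
	int64_t deadline;

	assert(timeout_ms >= 0);
	deadline = now_ns() + timeout_ms * NSEC_PER_MSEC;

	do {
		write_head(r, 0);
		if (r->head == 0)
			return true;
	} while (now_ns() < deadline);

	return false;
}
```

In this sketch, reset_head(&ring, 50) returns as soon as the write
sticks, so a larger window only costs extra time in the failing case.
Unlike the loop in xcs_resume(), it always attempts the write at least
once.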

> 
> Thanks,
> Andi
> 
> >  			ktime_before(ktime_get(), (kt)); cpu_relax()) {
> >  		/*
> >  		 * In case of resets fails because engine resumes from
> > --
> > 2.25.1