Re: [PATCH] drm/i915: program wm blocks to at least blocks required per line

Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx> · Wed, 6 Apr 2022 21:09:06 +0300

On Wed, Apr 06, 2022 at 08:14:58PM +0300, Lisovskiy, Stanislav wrote:
> On Wed, Apr 06, 2022 at 05:01:39PM +0300, Ville Syrjälä wrote:
> > On Wed, Apr 06, 2022 at 04:45:26PM +0300, Lisovskiy, Stanislav wrote:
> > > On Wed, Apr 06, 2022 at 03:48:02PM +0300, Ville Syrjälä wrote:
> > > > On Mon, Apr 04, 2022 at 04:49:18PM +0300, Vinod Govindapillai wrote:
> > > > > In configurations with single DRAM channel, for usecases like
> > > > > 4K 60 Hz, FIFO underruns are observed quite frequently. Looks
> > > > > like the wm0 watermark values need to bumped up because the wm0
> > > > > memory latency calculations are probably not taking the DRAM
> > > > > channel's impact into account.
> > > > > 
> > > > > As per the Bspec 49325, if the ddb allocation can hold at least
> > > > > one plane_blocks_per_line we should have selected method2.
> > > > > Assuming that modern HW versions have enough dbuf to hold
> > > > > at least one line, set the wm blocks to equivalent to blocks
> > > > > per line.
> > > > > 
> > > > > cc: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx>
> > > > > cc: Stanislav Lisovskiy <stanislav.lisovskiy@xxxxxxxxx>
> > > > > 
> > > > > Signed-off-by: Vinod Govindapillai <vinod.govindapillai@xxxxxxxxx>
> > > > > ---
> > > > >  drivers/gpu/drm/i915/intel_pm.c | 19 ++++++++++++++++++-
> > > > >  1 file changed, 18 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> > > > > index 8824f269e5f5..ae28a8c63ca4 100644
> > > > > --- a/drivers/gpu/drm/i915/intel_pm.c
> > > > > +++ b/drivers/gpu/drm/i915/intel_pm.c
> > > > > @@ -5474,7 +5474,24 @@ static void skl_compute_plane_wm(const struct intel_crtc_state *crtc_state,
> > > > >  		}
> > > > >  	}
> > > > >  
> > > > > -	blocks = fixed16_to_u32_round_up(selected_result) + 1;
> > > > > +	/*
> > > > > +	 * Lets have blocks at minimum equivalent to plane_blocks_per_line
> > > > > +	 * as there will be at minimum one line for lines configuration.
> > > > > +	 *
> > > > > +	 * As per the Bspec 49325, if the ddb allocation can hold at least
> > > > > +	 * one plane_blocks_per_line, we should have selected method2 in
> > > > > +	 * the above logic. Assuming that modern versions have enough dbuf
> > > > > +	 * and method2 guarantees blocks equivalent to at least 1 line,
> > > > > +	 * select the blocks as plane_blocks_per_line.
> > > > > +	 *
> > > > > +	 * TODO: Revisit the logic when we have better understanding on DRAM
> > > > > +	 * channels' impact on the level 0 memory latency and the relevant
> > > > > +	 * wm calculations.
> > > > > +	 */
> > > > > +	blocks = skl_wm_has_lines(dev_priv, level) ?
> > > > > +			max_t(u32, fixed16_to_u32_round_up(selected_result) + 1,
> > > > > +				  fixed16_to_u32_round_up(wp->plane_blocks_per_line)) :
> > > > > +			fixed16_to_u32_round_up(selected_result) + 1;
> > > > 
> > > > That's looks rather convoluted.
> > > > 
> > > >   blocks = fixed16_to_u32_round_up(selected_result) + 1;
> > > > + /* blah */
> > > > + if (has_lines)
> > > > +	blocks = max(blocks, fixed16_to_u32_round_up(wp->plane_blocks_per_line));
> > > 
> > > We probably need to do similar refactoring in the whole function ;-)
> > > 
> > > > 
> > > > Also since Art said nothing like this should actually be needed
> > > > I think the comment should make it a bit more clear that this
> > > > is just a hack to work around the underruns with some single
> > > > memory channel configurations.
> > > 
> > > It is actually not quite a hack, because we are missing that condition
> > > implementation from BSpec 49325, which instructs us to select method2
> > > when ddb blocks allocation is known and that ratio is >= 1.
> > 
> > The ddb allocation is not yet known, so we're implementing the
> > algorithm 100% correctly.
> > 
> > And this patch does not implement that misisng part anyway.
> 
> Yes, as I understood method2 would just give amount of blocks to be
> at least as dbuf blocks per line.
> 
> Wonder whether should we actually fully implement this BSpec clause 
> and add it to the point where ddb allocation is known or are there 
> any obstacles to do that, besides having to reshuffle this function a bit?

We need to calculate the wm to figure out how much ddb to allocate,
and then we'd need the ddb allocation to figure out how to calculate
the wm. Very much chicken vs. egg right there. We'd have to do some
kind of hideous loop where we'd calculate everything twice. I don't
really want to do that since I'd actually like to move the wm
calculation to happen already much earlier during .check_plane()
as that could reduce the amount of redundant wm calculations we
are currently doing.

-- 
Ville Syrjälä
Intel