RE: [PATCH 2/4] drm/i915/color: Stop using non-posted DSB writes for legacy LUT

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> -----Original Message-----
> From: Intel-gfx <intel-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of Ville
> Syrjala
> Sent: Wednesday, November 20, 2024 10:11 PM
> To: intel-gfx@xxxxxxxxxxxxxxxxxxxxx
> Cc: stable@xxxxxxxxxxxxxxx
> Subject: [PATCH 2/4] drm/i915/color: Stop using non-posted DSB writes for legacy
> LUT
> 
> From: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx>
> 
> DSB LUT register writes vs. palette anti-collision logic appear to interact in
> interesting ways:
> - posted DSB writes simply vanish into thin air while
>   anti-collision is active
> - non-posted DSB writes actually get blocked by the anti-collision
>   logic, but unfortunately this ends up hogging the bus for
>   long enough that unrelated parallel CPU MMIO accesses start
>   to disappear instead
> 
> Even though we are updating the LUT during vblank we aren't immune to the
> anti-collision logic because it kicks in brifly for pipe prefill (initiated at frame start).

Nit: Typo in "briefly"

> The safe time window for performing the LUT update is thus between the
> undelayed vblank and frame start. Turns out that with low enough CDCLK
> frequency (DSB execution speed depends on CDCLK) we can exceed that.
> 
> As we are currently using non-posted writes for the legacy LUT updates, in which
> case we can hit the far more severe failure mode. The problem is exacerbated by
> the fact that non-posted writes are much slower than posted writes (~4x it
> seems).
> 
> To mititage the problem let's switch to using posted DSB writes for legacy LUT
> updates (which will involve using the double write approach to avoid other
> problems with DSB vs. legacy LUT writes). Despite writing each register twice this
> will in fact make the legacy LUT update faster when compared to the non-posted
> write approach, making the problem less likely to appear. The failure mode is also
> less severe.
> 
> This isn't the 100% solution we need though. That will involve estimating how
> long the LUT update will take, and pushing frame start and/or delayed vblank
> forward to guarantee that the update will have finished by the time the pipe
> prefill starts...

Yeah to avoid issues with anti-collision, this would be the ideal case. Will try to get
hardware teams view and take on the findings as well.

However, for now the approach and changes look good to me.
Reviewed-by: Uma Shankar <uma.shankar@xxxxxxxxx>

> Cc: stable@xxxxxxxxxxxxxxx
> Fixes: 34d8311f4a1c ("drm/i915/dsb: Re-instate DSB for LUT updates")
> Fixes: 25ea3411bd23 ("drm/i915/dsb: Use non-posted register writes for legacy
> LUT")
> Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12494
> Signed-off-by: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx>
> ---
>  drivers/gpu/drm/i915/display/intel_color.c | 30 ++++++++++++++--------
>  1 file changed, 20 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/display/intel_color.c
> b/drivers/gpu/drm/i915/display/intel_color.c
> index 6ea3d5c58cb1..7cd902bbd244 100644
> --- a/drivers/gpu/drm/i915/display/intel_color.c
> +++ b/drivers/gpu/drm/i915/display/intel_color.c
> @@ -1368,19 +1368,29 @@ static void ilk_load_lut_8(const struct
> intel_crtc_state *crtc_state,
>  	lut = blob->data;
> 
>  	/*
> -	 * DSB fails to correctly load the legacy LUT
> -	 * unless we either write each entry twice,
> -	 * or use non-posted writes
> +	 * DSB fails to correctly load the legacy LUT unless
> +	 * we either write each entry twice when using posted
> +	 * writes, or we use non-posted writes.
> +	 *
> +	 * If palette anti-collision is active during LUT
> +	 * register writes:
> +	 * - posted writes simply get dropped and thus the LUT
> +	 *   contents may not be correctly updated
> +	 * - non-posted writes are blocked and thus the LUT
> +	 *   contents are always correct, but simultaneous CPU
> +	 *   MMIO access will start to fail
> +	 *
> +	 * Choose the lesser of two evils and use posted writes.
> +	 * Using posted writes is also faster, even when having
> +	 * to write each register twice.
>  	 */
> -	if (crtc_state->dsb_color_vblank)
> -		intel_dsb_nonpost_start(crtc_state->dsb_color_vblank);
> -
> -	for (i = 0; i < 256; i++)
> +	for (i = 0; i < 256; i++) {
>  		ilk_lut_write(crtc_state, LGC_PALETTE(pipe, i),
>  			      i9xx_lut_8(&lut[i]));
> -
> -	if (crtc_state->dsb_color_vblank)
> -		intel_dsb_nonpost_end(crtc_state->dsb_color_vblank);
> +		if (crtc_state->dsb_color_vblank)
> +			ilk_lut_write(crtc_state, LGC_PALETTE(pipe, i),
> +				      i9xx_lut_8(&lut[i]));
> +	}
>  }
> 
>  static void ilk_load_lut_10(const struct intel_crtc_state *crtc_state,
> --
> 2.45.2





[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux