Re: [PATCH v5 10/16] drm/vkms: Re-introduce line-per-line composition algorithm

Pekka Paalanen <pekka.paalanen@xxxxxxxxxxxxx> · Mon, 25 Mar 2024 16:56:31 +0200

On Mon, 25 Mar 2024 11:15:13 -0300
Maíra Canal <mcanal@xxxxxxxxxx> wrote:

> On 3/13/24 14:45, Louis Chauvet wrote:
> > Re-introduce a line-by-line composition algorithm for each pixel format.
> > This allows more performance by not requiring an indirection per pixel
> > read. This patch is focused on readability of the code.
> > 
> > Line-by-line composition was introduced by [1] but rewritten back to
> > pixel-by-pixel algorithm in [2]. At this time, nobody noticed the impact
> > on performance, and it was merged.
> > 
> > This patch is almost a revert of [2], but in addition efforts have been
> > made to increase readability and maintainability of the rotation handling.
> > The blend function is now divided in two parts:
> > - Transformation of coordinates from the output referential to the source
> > referential
> > - Line conversion and blending
> > 
> > Most of the complexity of the rotation management is avoided by using
> > drm_rect_* helpers. The remaining complexity is around the clipping, to
> > avoid reading/writing outside source/destination buffers.
> > 
> > The pixel conversion is now done line-by-line, so the read_pixel_t was
> > replaced with read_pixel_line_t callback. This way the indirection is only
> > required once per line and per plane, instead of once per pixel and per
> > plane.
> > 
> > The read_line_t callbacks are very similar for most pixel format, but it
> > is required to avoid performance impact. Some helpers for color
> > conversion were introduced to avoid code repetition:
> > - *_to_argb_u16: perform colors conversion. They should be inlined by the
> >    compiler, and they are used to avoid repetition between multiple variants
> >    of the same format (argb/xrgb and maybe in the future for formats like
> >    bgr formats).
> > 
> > This new algorithm was tested with:
> > - kms_plane (for color conversions)
> > - kms_rotation_crc (for rotations of planes)
> > - kms_cursor_crc (for translations of planes)
> > - kms_rotation (for all rotations and formats combinations) [3]
> > The performance gain was mesured with:
> > - kms_fb_stress  
> 
> Could you tell us what was the performance gain?
> 
> > 
> > [1]: commit 8ba1648567e2 ("drm: vkms: Refactor the plane composer to accept
> >       new formats")
> >       https://lore.kernel.org/all/20220905190811.25024-7-igormtorrente@xxxxxxxxx/
> > [2]: commit 322d716a3e8a ("drm/vkms: isolate pixel conversion
> >       functionality")
> >       https://lore.kernel.org/all/20230418130525.128733-2-mcanal@xxxxxxxxxx/
> > [3]:
> > 
> > Signed-off-by: Louis Chauvet <louis.chauvet@xxxxxxxxxxx>
> > ---
> >   drivers/gpu/drm/vkms/vkms_composer.c | 167 +++++++++++++++++++------
> >   drivers/gpu/drm/vkms/vkms_drv.h      |  27 ++--
> >   drivers/gpu/drm/vkms/vkms_formats.c  | 236 ++++++++++++++++++++++-------------
> >   drivers/gpu/drm/vkms/vkms_formats.h  |   2 +-
> >   drivers/gpu/drm/vkms/vkms_plane.c    |   5 +-
> >   5 files changed, 292 insertions(+), 145 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
> > index 989bcf59f375..5d78c33dbf41 100644
> > --- a/drivers/gpu/drm/vkms/vkms_composer.c
> > +++ b/drivers/gpu/drm/vkms/vkms_composer.c

...

> > @@ -215,34 +188,146 @@ static void blend(struct vkms_writeback_job *wb,
> >   {
> >   	struct vkms_plane_state **plane = crtc_state->active_planes;
> >   	u32 n_active_planes = crtc_state->num_active_planes;
> > -	int y_pos, x_dst, x_limit;
> >   
> >   	const struct pixel_argb_u16 background_color = { .a = 0xffff };
> >   
> > -	size_t crtc_y_limit = crtc_state->base.crtc->mode.vdisplay;
> > +	int crtc_y_limit = crtc_state->base.crtc->mode.vdisplay;
> > +	int crtc_x_limit = crtc_state->base.crtc->mode.hdisplay;  
> 
> Shouldn't it be `unsigned int`?

No. It's not good to mix signed and unsigned variables in computations.
I for sure would not remember all the implicit promotion rules that
apply, and you'd probably be forced to add explicit signedness casts to
get the correct behaviour. It causes much less surprises to "normalize"
all variables to the same signedness before computing with them. Some
values in this function can be negative.

Thanks,
pq
Attachment:
pgpBV2RrGUL2c.pgp

Description: OpenPGP digital signature