Re: [PATCH v6 07/17] drm/vkms: Update pixels accessor to support packed and multi-plane formats.

Pekka Paalanen <pekka.paalanen@xxxxxxxxxxxxx> · Mon, 13 May 2024 14:14:49 +0300

On Mon, 13 May 2024 09:15:13 +0200
Louis Chauvet <louis.chauvet@xxxxxxxxxxx> wrote:

> Le 22/04/24 - 14:07, Pekka Paalanen a écrit :
> > On Tue, 09 Apr 2024 15:25:25 +0200
> > Louis Chauvet <louis.chauvet@xxxxxxxxxxx> wrote:
> >   
> > > Introduce the usage of block_h/block_w to compute the offset and the
> > > pointer of a pixel. The previous implementation was specialized for
> > > planes with block_h == block_w == 1. To avoid confusion and allow easier
> > > implementation of tiled formats. It also remove the usage of the
> > > deprecated format field `cpp`.
> > > 
> > > Introduce the plane_index parameter to get an offset/pointer on a
> > > different plane.
> > > 
> > > Signed-off-by: Louis Chauvet <louis.chauvet@xxxxxxxxxxx>
> > > ---
> > >  drivers/gpu/drm/vkms/vkms_formats.c | 110 ++++++++++++++++++++++++++++--------
> > >  1 file changed, 87 insertions(+), 23 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
> > > index 69cf9733fec5..9a1400ad4db6 100644
> > > --- a/drivers/gpu/drm/vkms/vkms_formats.c
> > > +++ b/drivers/gpu/drm/vkms/vkms_formats.c
> > > @@ -10,22 +10,43 @@
> > >  #include "vkms_formats.h"
> > >  
> > >  /**
> > > - * pixel_offset() - Get the offset of the pixel at coordinates x/y in the first plane
> > > + * packed_pixels_offset() - Get the offset of the block containing the pixel at coordinates x/y
> > >   *
> > >   * @frame_info: Buffer metadata
> > >   * @x: The x coordinate of the wanted pixel in the buffer
> > >   * @y: The y coordinate of the wanted pixel in the buffer
> > > + * @plane_index: The index of the plane to use
> > > + * @offset: The returned offset inside the buffer of the block
> > > + * @rem_x,@rem_y: The returned coordinate of the requested pixel in the block
> > >   *
> > > - * The caller must ensure that the framebuffer associated with this request uses a pixel format
> > > - * where block_h == block_w == 1.
> > > - * If this requirement is not fulfilled, the resulting offset can point to an other pixel or
> > > - * outside of the buffer.
> > > + * As some pixel formats store multiple pixels in a block (DRM_FORMAT_R* for example), some
> > > + * pixels are not individually addressable. This function return 3 values: the offset of the
> > > + * whole block, and the coordinate of the requested pixel inside this block.
> > > + * For example, if the format is DRM_FORMAT_R1 and the requested coordinate is 13,5, the offset
> > > + * will point to the byte 5*pitches + 13/8 (second byte of the 5th line), and the rem_x/rem_y
> > > + * coordinates will be (13 % 8, 5 % 1) = (5, 0)
> > > + *
> > > + * With this function, the caller just have to extract the correct pixel from the block.
> > >   */
> > > -static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, int y)
> > > +static void packed_pixels_offset(const struct vkms_frame_info *frame_info, int x, int y,
> > > +				 int plane_index, int *offset, int *rem_x, int *rem_y)
> > >  {
> > >  	struct drm_framebuffer *fb = frame_info->fb;
> > > +	const struct drm_format_info *format = frame_info->fb->format;
> > > +	/* Directly using x and y to multiply pitches and format->ccp is not sufficient because
> > > +	 * in some formats a block can represent multiple pixels.
> > > +	 *
> > > +	 * Dividing x and y by the block size allows to extract the correct offset of the block
> > > +	 * containing the pixel.
> > > +	 */
> > >  
> > > -	return fb->offsets[0] + (y * fb->pitches[0]) + (x * fb->format->cpp[0]);
> > > +	int block_x = x / drm_format_info_block_width(format, plane_index);
> > > +	int block_y = y / drm_format_info_block_height(format, plane_index);
> > > +	*rem_x = x % drm_format_info_block_width(format, plane_index);
> > > +	*rem_y = y % drm_format_info_block_height(format, plane_index);
> > > +	*offset = fb->offsets[plane_index] +
> > > +		  block_y * fb->pitches[plane_index] +
> > > +		  block_x * format->char_per_block[plane_index];  
> > 
> > I started thinking... is
> > 
> > +		  block_y * fb->pitches[plane_index] +
> > 
> > correct, or should it be
> > 
> > +		  y * fb->pitches[plane_index] +
> > 
> > ?  
> 
> The documentation is not very clear about that:
> 
>        	 * @pitches: Line stride per buffer. For userspace created object this
>        	 * is copied from drm_mode_fb_cmd2.
> 
> If I look at the drm_mode_fb_cmd2, there is this documentation:
> 
>        	/** @pitches: Pitch (aka. stride) in bytes, one per plane. */
> 
> For me, I interpret "stride" as it is used in matrix calculation, where it
> means "the number of bytes between two number adjacent verticaly"
> (&matrix[x,y] + stride == &matrix[x,y+1]).
> 
> So in a graphic context, I interpret a stride as the number of byte to
> reach the next line of blocks (as pixels can not always be accessed
> individually).
> 
> So, for me, buffer_size_in_byte >= stride * number_of_lines.

This is the definition, yes. Even for blocky formats, it is still
number of 1-pixel-high lines, even though one cannot address a line as
such. For blocky formats it is a theoretical value, and computing with
it only makes sense when your number of lines is a multiple of block
height.

That's my recollection. This has been hashed in issues like
https://gitlab.freedesktop.org/wayland/weston/-/issues/896

> > I'm looking at drm_format_info_min_pitch() which sounds like it should
> > be the latter? Because of
> >
> >         return DIV_ROUND_UP_ULL((u64)buffer_width * info->char_per_block[plane],
> >                             drm_format_info_block_width(info, plane) *
> >                             drm_format_info_block_height(info, plane));
> >
> > in drm_format_info_min_pitch().  
> 
> This function doesn't make sense to me. I can't understand how it could
> work.
> 
> If I consider the X0L2 format, with block_h == block_w == 2,
> char_per_block = 8, and a framebuffer of 1 * 10 pixels, the result of
> drm_format_info_min_pitch is 2.

If buffer_width is 1, then buffer_width / block_w is 1 because of
rounding up. Cannot have images with partial blocks. That is a
block-row of one block. That block takes char_per_block bytes, that is,
8 bytes here. But block height is 2, and stride is computed for
1-pixel-high line, so we divide by block_h, and the result is stride =
4 bytes.

However, 1 * 8 / (2 * 2) = 2. I think the code is bugged, and the
round-up happens at a wrong point. The correct form would be

div_round_up(div_round_up(buffer_width, block_w) * char_per_block, block_h)

in my opinion. There must always be an integer number of blocks on a
block-row. If a block-row has multiple blocks, doing a
div_round_up(char_per_block, block_h) would over-estimate the per-block
bytes and add the extra for each block rather than averaging it out.
So, multiplying by char_per_block before dividing by block_h gives a
stricter minimum stride.

The condition buffer_size_bytes >= buffer_height * stride is necessary
but not always sufficient, because the buffer must hold an integer
number of block-rows.

> However, for this format and this framebuffer, I think the stride should
> be at least 8 bytes (the buffer is "1 block width").

I believe the stride is 4 bytes, because stride for a 1-pixel-high line
on average, rounded up.

Stride allows for unused blocks per block-row, but its use for buffer
size checking with buffer_height is incomplete, I believe. For the
question "is buffer_size enough?" it may produce false-positives, but
it cannot produce false-negatives.

Thanks,
pq

> If pitch equals 2 (as suggested by the test), it implies that
> height * pitch is not valid for calculating the minimum size of the buffer
> (in our case, 10 * 2 = 20 bytes, but the minimum framebuffer size should
> be 5 blocks * 8 bytes_per_block = 40 bytes). And I don't understand what
> the 2 represents in this context.
> Is it the number of pixels on a line (which is strange, because pitch 
> should be in byte)? The width in byte of the first line, but by using the 
> "byte per pixel" value (which make no sense when using block formats)?
> 
> If pitch equals 8 (as I would expect), height * pitch is not optimal (but
> at least sufficient to contain the entire framebuffer), and height * pitch
> / block_h is optimal (which is logical, because height/block_h is the 
> number of block per columns).
> 
> > Btw. maybe this should check that the result is not negative (e.g. due
> > to overflow)? Or does that even work since signed overflow is undefined
> > behavior (UB) and compilers may assume UB does not happen, causing the
> > check to be eliminated as dead code?
> >
> > Otherwise this patch looks ok to me.
> > 
> > 
> > Thanks,
> > pq
> >   
> 
> [...]
> 
> --
> Louis Chauvet, Bootlin
> Embedded Linux and Kernel engineering
> https://bootlin.com

Attachment:
pgprI9dvytCWZ.pgp

Description: OpenPGP digital signature