On 27/02/24 12:02, Louis Chauvet wrote: > Hi Pekka, > > For all the comment related to the conversion part, maybe Arthur have an > opinion on it, I took his patch as a "black box" (I did not want to > break (and debug) it). > > Le 26/02/24 - 14:19, Pekka Paalanen a écrit : >> On Fri, 23 Feb 2024 12:37:26 +0100 >> Louis Chauvet <louis.chauvet@xxxxxxxxxxx> wrote: >> >>> From: Arthur Grillo <arthurgrillo@xxxxxxxxxx> >>> >>> Add support to the YUV formats bellow: >>> >>> - NV12 >>> - NV16 >>> - NV24 >>> - NV21 >>> - NV61 >>> - NV42 >>> - YUV420 >>> - YUV422 >>> - YUV444 >>> - YVU420 >>> - YVU422 >>> - YVU444 >>> >>> The conversion matrices of each encoding and range were obtained by >>> rounding the values of the original conversion matrices multiplied by >>> 2^8. This is done to avoid the use of fixed point operations. >>> >>> Signed-off-by: Arthur Grillo <arthurgrillo@xxxxxxxxxx> >>> [Louis Chauvet: Adapted Arthur's work and implemented the read_line_t >>> callbacks for yuv formats] >>> Signed-off-by: Louis Chauvet <louis.chauvet@xxxxxxxxxxx> >>> --- >>> drivers/gpu/drm/vkms/vkms_composer.c | 2 +- >>> drivers/gpu/drm/vkms/vkms_drv.h | 6 +- >>> drivers/gpu/drm/vkms/vkms_formats.c | 289 +++++++++++++++++++++++++++++++++-- >>> drivers/gpu/drm/vkms/vkms_formats.h | 4 + >>> drivers/gpu/drm/vkms/vkms_plane.c | 14 +- >>> 5 files changed, 295 insertions(+), 20 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c >>> index e555bf9c1aee..54fc5161d565 100644 >>> --- a/drivers/gpu/drm/vkms/vkms_composer.c >>> +++ b/drivers/gpu/drm/vkms/vkms_composer.c >>> @@ -312,7 +312,7 @@ static void blend(struct vkms_writeback_job *wb, >>> * buffer [1] >>> */ >>> current_plane->pixel_read_line( >>> - current_plane->frame_info, >>> + current_plane, >>> x_start, >>> y_start, >>> direction, >>> diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h >>> index ccc5be009f15..a4f6456cb971 100644 >>> --- a/drivers/gpu/drm/vkms/vkms_drv.h >>> +++ b/drivers/gpu/drm/vkms/vkms_drv.h >>> @@ -75,6 +75,8 @@ enum pixel_read_direction { >>> READ_RIGHT >>> }; >>> >>> +struct vkms_plane_state; >>> + >>> /** >>> <<<<<<< HEAD >>> * typedef pixel_read_line_t - These functions are used to read a pixel line in the source frame, >>> @@ -87,8 +89,8 @@ enum pixel_read_direction { >>> * @out_pixel: Pointer where to write the pixel value. Pixels will be written between x_start and >>> * x_end. >>> */ >>> -typedef void (*pixel_read_line_t)(struct vkms_frame_info *frame_info, int x_start, int y_start, enum >>> - pixel_read_direction direction, int count, struct pixel_argb_u16 out_pixel[]); >>> +typedef void (*pixel_read_line_t)(struct vkms_plane_state *frame_info, int x_start, int y_start, >>> + enum pixel_read_direction direction, int count, struct pixel_argb_u16 out_pixel[]); >> >> This is the second or third time in this one series changing this type. >> Could you not do the change once, in its own patch if possible? > > Sorry, this is not a change here, but a wrong formatting (missed when > rebasing). > > Do you think that it make sense to re-order my patches and put this > typedef at the end? This way it is never updated. > >>> >>> /** >>> * vkms_plane_state - Driver specific plane state >>> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c >>> index 46daea6d3ee9..515c80866a58 100644 >>> --- a/drivers/gpu/drm/vkms/vkms_formats.c >>> +++ b/drivers/gpu/drm/vkms/vkms_formats.c >>> @@ -33,7 +33,8 @@ static size_t packed_pixels_offset(const struct vkms_frame_info *frame_info, int >>> */ >>> return fb->offsets[plane_index] + >>> (y / drm_format_info_block_width(format, plane_index)) * fb->pitches[plane_index] + >>> - (x / drm_format_info_block_height(format, plane_index)) * format->char_per_block[plane_index]; >>> + (x / drm_format_info_block_height(format, plane_index)) * >>> + format->char_per_block[plane_index]; >> >> Shouldn't this be in the patch that added this code in the first place? > > Same as above, a wrong formatting, I will remove this change and keep > everything on one line (even if it's more than 100 chars, it is easier to > read). > >>> } >>> >>> /** >>> @@ -84,6 +85,32 @@ static int get_step_1x1(struct drm_framebuffer *fb, enum pixel_read_direction di >>> } >>> } >>> >>> +/** >>> + * get_subsampling() - Get the subsampling value on a specific direction >> >> subsampling divisor > > Thanks for this precision. > >>> + */ >>> +static int get_subsampling(const struct drm_format_info *format, >>> + enum pixel_read_direction direction) >>> +{ >>> + if (direction == READ_LEFT || direction == READ_RIGHT) >>> + return format->hsub; >>> + else if (direction == READ_DOWN || direction == READ_UP) >>> + return format->vsub; >>> + return 1; >> >> In this and the below function, personally I'd prefer switch-case, with >> a cannot-happen-scream after the switch, so the compiler can warn about >> unhandled enum values. > > As for the previous patch, I did not know about this compiler feature, > thanks! > >>> +} >>> + >>> +/** >>> + * get_subsampling_offset() - Get the subsampling offset to use when incrementing the pixel counter >>> + */ >>> +static int get_subsampling_offset(const struct drm_format_info *format, >>> + enum pixel_read_direction direction, int x_start, int y_start) >> >> 'start' values as "increments" for a pixel counter? Is something >> misnamed here? >> >> Is it an increment or an offset? > > I don't really know how to name the function. I'm open to suggestions > x_start and y_start are really the coordinate of the starting reading point. > > To explain what it does: > > When using subsampling, you have to read the next pixel of planes[1..4] > not at the same "speed" as plane[0]. But I can't only rely on > "read_pixel_count % subsampling == 0", because it means that the pixel > incrementation on planes[1..4] may not be aligned with the buffer (if > hsub=2 and the start pixel is 1, I need to increment planes[1..4] only > for x=2,4,6... not 1,3,5...). > > A way to ensure this is to add an "offset" to count, which ensure that the > count % subsampling == 0 on the correct pixel. > > I made an error, the switch case must be (as count is always counting up, > for "inverted" reading direction a negative number ensure that > %subsampling == 0 on the correct pixel): > > switch (direction) { > case READ_UP: > return -y_start; > case READ_DOWN: > return y_start; > case READ_LEFT: > return -x_start; > case READ_RIGHT: > return x_start; > } > >>> +{ >>> + if (direction == READ_RIGHT || direction == READ_LEFT) >>> + return x_start; >>> + else if (direction == READ_DOWN || direction == READ_UP) >>> + return y_start; >>> + return 0; >>> +} >>> + > > [...] > >>> +static void yuv_u8_to_argb_u16(struct pixel_argb_u16 *argb_u16, const struct pixel_yuv_u8 *yuv_u8, >>> + enum drm_color_encoding encoding, enum drm_color_range range) >>> +{ >>> + static const s16 bt601_full[3][3] = { >>> + { 256, 0, 359 }, >>> + { 256, -88, -183 }, >>> + { 256, 454, 0 }, >>> + }; > > [...] > >>> + >>> + u8 r = 0; >>> + u8 g = 0; >>> + u8 b = 0; >>> + bool full = range == DRM_COLOR_YCBCR_FULL_RANGE; >>> + unsigned int y_offset = full ? 0 : 16; >>> + >>> + switch (encoding) { >>> + case DRM_COLOR_YCBCR_BT601: >>> + ycbcr2rgb(full ? bt601_full : bt601, >> >> Doing all these conditional again pixel by pixel is probably >> inefficient. Just like with the line reading functions, you could pick >> the matrix in advance. > > I don't think the performance impact is huge (it's only a pair of if), but > yes, it's an easy optimization. > > I will create a conversion_matrix structure: > > struct conversion_matrix { > s16 matrix[3][3]; > u16 y_offset; > } > > I will create a `get_conversion_matrix_to_argb_u16` function to get this > structure from a format+encoding+range. > > I will also add a field `conversion_matrix` in struct vkms_plane_state to > get this matrix only once per plane setup. > > >>> + yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b); >>> + break; >>> + case DRM_COLOR_YCBCR_BT709: >>> + ycbcr2rgb(full ? rec709_full : rec709, >>> + yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b); >>> + break; >>> + case DRM_COLOR_YCBCR_BT2020: >>> + ycbcr2rgb(full ? bt2020_full : bt2020, >>> + yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b); >>> + break; >>> + default: >>> + pr_warn_once("Not supported color encoding\n"); >>> + break; >>> + } >>> + >>> + argb_u16->r = r * 257; >>> + argb_u16->g = g * 257; >>> + argb_u16->b = b * 257; >> >> I wonder. Using 8-bit fixed point precision seems quite coarse for >> 8-bit pixel formats, and it's going to be insufficient for higher bit >> depths. Was supporting e.g. 10-bit YUV considered? There is even >> deeper, too, like DRM_FORMAT_P016. > > It's a good point, as I explained above, I took the conversion part as a > "black box" to avoid breaking (and debugging) stuff. I think it's easy to > switch to s32 bits matrix with 16.16 bits (or anything with more than 16 bits in > the float part). > > Maybe Arthur have an opinion on this? Yeah, I too don't see why not we could do that. The 8-bit precision was sufficient for those formats, but as well noted by Pekka this could be a problem for higher bit depths. I just need to make my terrible python script spit those values XD. > Just to be sure, the DRM subsystem don't have such matrix somewhere? It > can be nice to avoid duplicating them. As to my knowledge it does not exist on DRM, I think those are normally on the hardware itself (*please* correct me if I'm wrong). But, v4l2 has a similar table on drivers/media/common/v4l2-tpg/v4l2-tpg-core.c (Actually, I started my code based on this), unfortunately it's only 8-bit too. Best Regards, ~Arthur Grillo > >>> +} + /* * The following functions are read_line function for each >>> pixel format supported by VKMS. * @@ -142,13 +250,13 @@ static void >>> RGB565_to_argb_u16(struct pixel_argb_u16 *out_pixel, const u16 *pixe >>> * [1]: >>> https://lore.kernel.org/dri-devel/d258c8dc-78e9-4509-9037-a98f7f33b3a3@xxxxxxxxxx/ >>> */ >>> >>> -static void ARGB8888_read_line(struct vkms_frame_info *frame_info, >>> int x_start, int y_start, +static void ARGB8888_read_line(struct >>> vkms_plane_state *plane, int x_start, int y_start, enum >>> pixel_read_direction direction, int count, struct pixel_argb_u16 >>> out_pixel[]) { - u8 *src_pixels = packed_pixels_addr(frame_info, >>> x_start, y_start, 0); + u8 *src_pixels = >>> packed_pixels_addr(plane->frame_info, x_start, y_start, 0); >>> >>> - int step = get_step_1x1(frame_info->fb, direction, 0); + >>> int step = get_step_1x1(plane->frame_info->fb, direction, 0); >> >> These are the kind of changes I would not expect to see in a patch >> adding YUV support. There are a lot of them, too. > > I will put it directly this change in PATCHv2 5/9. > > [...] > >>> +static void semi_planar_yuv_read_line(struct vkms_plane_state >>> *plane, int x_start, int y_start, + >>> enum pixel_read_direction direction, int count, + >>> struct pixel_argb_u16 out_pixel[]) +{ + u8 *y_plane = >>> packed_pixels_addr(plane->frame_info, x_start, y_start, 0); + >>> u8 *uv_plane = packed_pixels_addr(plane->frame_info, + >>> x_start / plane->frame_info->fb->format->hsub, + >>> y_start / plane->frame_info->fb->format->vsub, + >>> 1); + struct pixel_yuv_u8 yuv_u8; + int step_y = >>> get_step_1x1(plane->frame_info->fb, direction, 0); + int >>> step_uv = get_step_1x1(plane->frame_info->fb, direction, 1); + >>> int subsampling = get_subsampling(plane->frame_info->fb->format, >>> direction); + int subsampling_offset = >>> get_subsampling_offset(plane->frame_info->fb->format, direction, + >>> x_start, y_start); // 0 + + for (int i = 0; i < count; i++) { + >>> yuv_u8.y = y_plane[0]; + yuv_u8.u = uv_plane[0]; + >>> yuv_u8.v = uv_plane[1]; + + yuv_u8_to_argb_u16(out_pixel, >>> &yuv_u8, plane->base.base.color_encoding, + >>> plane->base.base.color_range); >> >> Oh, so this was the reason to change the read-line function >> signature. Maybe just stash a pointer to the right matrix and the >> right y_offset in frame_info instead? > > Yes, that why I changed the signature. I think I will keep this > signature and put the conversion_matrix inside the vkms_plane_state, > for me it make more sense to have pixel_read_line and > conversion_matrix in the same structure. > >>> + out_pixel += 1; + y_plane += step_y; + >>> if ((i + subsampling_offset + 1) % subsampling == 0) + >>> uv_plane += step_uv; + } +} + +static void >>> semi_planar_yvu_read_line(struct vkms_plane_state *plane, int >>> x_start, int y_start, + enum >>> pixel_read_direction direction, int count, + >>> struct pixel_argb_u16 out_pixel[]) +{ + u8 *y_plane = >>> packed_pixels_addr(plane->frame_info, x_start, y_start, 0); + >>> u8 *vu_plane = packed_pixels_addr(plane->frame_info, + >>> x_start / plane->frame_info->fb->format->hsub, + >>> y_start / plane->frame_info->fb->format->vsub, + >>> 1); + struct pixel_yuv_u8 yuv_u8; + int step_y = >>> get_step_1x1(plane->frame_info->fb, direction, 0); + int >>> step_vu = get_step_1x1(plane->frame_info->fb, direction, 1); + >>> int subsampling = get_subsampling(plane->frame_info->fb->format, >>> direction); + int subsampling_offset = >>> get_subsampling_offset(plane->frame_info->fb->format, direction, + >>> x_start, y_start); + for (int i = 0; i < count; i++) { + >>> yuv_u8.y = y_plane[0]; + yuv_u8.u = vu_plane[1]; + >>> yuv_u8.v = vu_plane[0]; >> >> You could swap matrix columns instead of writing this whole new >> function for UV vs. VU. Just an idea. > > I was not happy with this duplication too, but I did not think about > switching columns. That's a good idea, thanks! > > Kind regards, Louis Chauvet > > [...] >