Re: [PATCH v2 6/9] drm/vkms: Add YUV support

Arthur Grillo <arthurgrillo@xxxxxxxxxx> · Mon, 4 Mar 2024 12:39:12 -0300

On 04/03/24 12:28, Louis Chauvet wrote:
> Le 29/02/24 - 14:12, Pekka Paalanen a écrit :
>> On Wed, 28 Feb 2024 22:52:09 -0300
>> Arthur Grillo <arthurgrillo@xxxxxxxxxx> wrote:
>>
>>> On 27/02/24 17:01, Arthur Grillo wrote:
>>>>
>>>>
>>>> On 27/02/24 12:02, Louis Chauvet wrote:  
>>>>> Hi Pekka,
>>>>>
>>>>> For all the comment related to the conversion part, maybe Arthur have an 
>>>>> opinion on it, I took his patch as a "black box" (I did not want to 
>>>>> break (and debug) it).
>>>>>
>>>>> Le 26/02/24 - 14:19, Pekka Paalanen a écrit :  
>>>>>> On Fri, 23 Feb 2024 12:37:26 +0100
>>>>>> Louis Chauvet <louis.chauvet@xxxxxxxxxxx> wrote:
>>>>>>  
>>>>>>> From: Arthur Grillo <arthurgrillo@xxxxxxxxxx>
>>>>>>>
>>>>>>> Add support to the YUV formats bellow:
>>>>>>>
>>>>>>> - NV12
>>>>>>> - NV16
>>>>>>> - NV24
>>>>>>> - NV21
>>>>>>> - NV61
>>>>>>> - NV42
>>>>>>> - YUV420
>>>>>>> - YUV422
>>>>>>> - YUV444
>>>>>>> - YVU420
>>>>>>> - YVU422
>>>>>>> - YVU444
>>>>>>>
>>>>>>> The conversion matrices of each encoding and range were obtained by
>>>>>>> rounding the values of the original conversion matrices multiplied by
>>>>>>> 2^8. This is done to avoid the use of fixed point operations.
>>>>>>>
>>>>>>> Signed-off-by: Arthur Grillo <arthurgrillo@xxxxxxxxxx>
>>>>>>> [Louis Chauvet: Adapted Arthur's work and implemented the read_line_t
>>>>>>> callbacks for yuv formats]
>>>>>>> Signed-off-by: Louis Chauvet <louis.chauvet@xxxxxxxxxxx>
>>>>>>> ---
>>>>>>>  drivers/gpu/drm/vkms/vkms_composer.c |   2 +-
>>>>>>>  drivers/gpu/drm/vkms/vkms_drv.h      |   6 +-
>>>>>>>  drivers/gpu/drm/vkms/vkms_formats.c  | 289 +++++++++++++++++++++++++++++++++--
>>>>>>>  drivers/gpu/drm/vkms/vkms_formats.h  |   4 +
>>>>>>>  drivers/gpu/drm/vkms/vkms_plane.c    |  14 +-
>>>>>>>  5 files changed, 295 insertions(+), 20 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
>>>>>>> index e555bf9c1aee..54fc5161d565 100644
>>>>>>> --- a/drivers/gpu/drm/vkms/vkms_composer.c
>>>>>>> +++ b/drivers/gpu/drm/vkms/vkms_composer.c
>>>>>>> @@ -312,7 +312,7 @@ static void blend(struct vkms_writeback_job *wb,
>>>>>>>  			 * buffer [1]
>>>>>>>  			 */
>>>>>>>  			current_plane->pixel_read_line(
>>>>>>> -				current_plane->frame_info,
>>>>>>> +				current_plane,
>>>>>>>  				x_start,
>>>>>>>  				y_start,
>>>>>>>  				direction,
>>>>>>> diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
>>>>>>> index ccc5be009f15..a4f6456cb971 100644
>>>>>>> --- a/drivers/gpu/drm/vkms/vkms_drv.h
>>>>>>> +++ b/drivers/gpu/drm/vkms/vkms_drv.h
>>>>>>> @@ -75,6 +75,8 @@ enum pixel_read_direction {
>>>>>>>  	READ_RIGHT
>>>>>>>  };
>>>>>>>  
>>>>>>> +struct vkms_plane_state;
>>>>>>> +
>>>>>>>  /**
>>>>>>>  <<<<<<< HEAD
>>>>>>>   * typedef pixel_read_line_t - These functions are used to read a pixel line in the source frame,
>>>>>>> @@ -87,8 +89,8 @@ enum pixel_read_direction {
>>>>>>>   * @out_pixel: Pointer where to write the pixel value. Pixels will be written between x_start and
>>>>>>>   *  x_end.
>>>>>>>   */
>>>>>>> -typedef void (*pixel_read_line_t)(struct vkms_frame_info *frame_info, int x_start, int y_start, enum
>>>>>>> -	pixel_read_direction direction, int count, struct pixel_argb_u16 out_pixel[]);
>>>>>>> +typedef void (*pixel_read_line_t)(struct vkms_plane_state *frame_info, int x_start, int y_start,
>>>>>>> +	enum pixel_read_direction direction, int count, struct pixel_argb_u16 out_pixel[]);  
>>>>>>
>>>>>> This is the second or third time in this one series changing this type.
>>>>>> Could you not do the change once, in its own patch if possible?  
>>>>>
>>>>> Sorry, this is not a change here, but a wrong formatting (missed when 
>>>>> rebasing).
>>>>>
>>>>> Do you think that it make sense to re-order my patches and put this 
>>>>> typedef at the end? This way it is never updated.
>>
>> I'm not sure, I haven't checked how it would change your patches. The
>> intermediate changes might get a lot uglier?
>>
>> Just try to fold changes so that you don't need to change something
>> twice over the series unless there is a good reason to. "How hard would
>> it be to review this?" is my measure stick.
> 
> It will not be uglier, it was just the order I did things. I first cleaned 
> the code and created this typedef (PATCHv2 4/9), and then rewrote the 
> composition, for which I had to change the typedef.
> 
> I also wanted to make my series easy to understand and make clear what is 
> my "main contribution" and what are "quality stuff, not related to my 
> contribution":
> - Prepare things (document existing state, format, typedef)
> - Big change (and update related doc, typedef)
> - Rebase some other stuff on my big change (YUV)
> 
> So yes, some parts are changed twice in preparation step and the "big 
> change".
> 
>>
>>>>>  
>>>>>>>  
>>>>>>>  /**
>>>>>>>   * vkms_plane_state - Driver specific plane state
>>>>>>> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
>>>>>>> index 46daea6d3ee9..515c80866a58 100644
>>>>>>> --- a/drivers/gpu/drm/vkms/vkms_formats.c
>>>>>>> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
>>>>>>> @@ -33,7 +33,8 @@ static size_t packed_pixels_offset(const struct vkms_frame_info *frame_info, int
>>>>>>>  	 */
>>>>>>>  	return fb->offsets[plane_index] +
>>>>>>>  	       (y / drm_format_info_block_width(format, plane_index)) * fb->pitches[plane_index] +
>>>>>>> -	       (x / drm_format_info_block_height(format, plane_index)) * format->char_per_block[plane_index];
>>>>>>> +	       (x / drm_format_info_block_height(format, plane_index)) *
>>>>>>> +	       format->char_per_block[plane_index];  
>>>>>>
>>>>>> Shouldn't this be in the patch that added this code in the first place?  
>>>>>
>>>>> Same as above, a wrong formatting, I will remove this change and keep 
>>>>> everything on one line (even if it's more than 100 chars, it is easier to 
>>>>> read).
>>
>> Personally I agree that readability is more important than strict line
>> length limits. I'm not sure how the kernel rolls there.
>>
>>>>>  
>>>>>>>  }
>>>>>>>  
>>>>>>>  /**
>>>>>>> @@ -84,6 +85,32 @@ static int get_step_1x1(struct drm_framebuffer *fb, enum pixel_read_direction di
>>>>>>>  	}
>>>>>>>  }
>>>>>>>  
>>>>>>> +/**
>>>>>>> + * get_subsampling() - Get the subsampling value on a specific direction  
>>>>>>
>>>>>> subsampling divisor  
>>>>>
>>>>> Thanks for this precision.
>>>>>  
>>>>>>> + */
>>>>>>> +static int get_subsampling(const struct drm_format_info *format,
>>>>>>> +			   enum pixel_read_direction direction)
>>>>>>> +{
>>>>>>> +	if (direction == READ_LEFT || direction == READ_RIGHT)
>>>>>>> +		return format->hsub;
>>>>>>> +	else if (direction == READ_DOWN || direction == READ_UP)
>>>>>>> +		return format->vsub;
>>>>>>> +	return 1;  
>>>>>>
>>>>>> In this and the below function, personally I'd prefer switch-case, with
>>>>>> a cannot-happen-scream after the switch, so the compiler can warn about
>>>>>> unhandled enum values.  
>>>>>
>>>>> As for the previous patch, I did not know about this compiler feature, 
>>>>> thanks!
>>>>>  
>>>>>>> +}
>>>>>>> +
>>>>>>> +/**
>>>>>>> + * get_subsampling_offset() - Get the subsampling offset to use when incrementing the pixel counter
>>>>>>> + */
>>>>>>> +static int get_subsampling_offset(const struct drm_format_info *format,
>>>>>>> +				  enum pixel_read_direction direction, int x_start, int y_start)  
>>>>>>
>>>>>> 'start' values as "increments" for a pixel counter? Is something
>>>>>> misnamed here?
>>>>>>
>>>>>> Is it an increment or an offset?  
>>>>>
>>>>> I don't really know how to name the function. I'm open to suggestions
>>>>> x_start and y_start are really the coordinate of the starting reading point.
>>
>> I looks like it's an offset, so "offset" and "start" are good words.
>> Then the only misleading piece is the doc:
>>
>> 	"Get the subsampling offset to use when incrementing the pixel counter"
>>
>> This sounds like the offset is used when incrementing a counter, that
>> is, counter is increment by offset each time. That's my problem with
>> this.
>>
>> Fix just the doc, and it's good, I think.
>>
>>>>>
>>>>> To explain what it does:
>>>>>
>>>>> When using subsampling, you have to read the next pixel of planes[1..4] 
>>>>> not at the same "speed" as plane[0]. But I can't only rely on 
>>>>> "read_pixel_count % subsampling == 0", because it means that the pixel 
>>>>> incrementation on planes[1..4] may not be aligned with the buffer (if 
>>>>> hsub=2 and the start pixel is 1, I need to increment planes[1..4] only 
>>>>> for x=2,4,6... not 1,3,5...).
>>>>>
>>>>> A way to ensure this is to add an "offset" to count, which ensure that the 
>>>>> count % subsampling == 0 on the correct pixel.
>>
>> Yes, I think I did get that feeling from the code eventually somehow,
>> but it wouldn't hurt to explain it in the comment.
>>
>> "An offset for keeping the chroma siting consistent regardless of
>> x_start and y_start" maybe?
> 
> It is better yes, thanks!
> 
>>>>>
>>>>> I made an error, the switch case must be (as count is always counting up, 
>>>>> for "inverted" reading direction a negative number ensure that 
>>>>> %subsampling == 0 on the correct pixel):
>>>>>
>>>>> 	switch (direction) {
>>>>> 	case READ_UP:
>>>>> 		return -y_start;
>>>>> 	case READ_DOWN:
>>>>> 		return y_start;
>>>>> 	case READ_LEFT:
>>>>> 		return -x_start;
>>>>> 	case READ_RIGHT:
>>>>> 		return x_start;
>>>>> 	}
>>
>> Yes, the inverted reading directions are different indeed. I did not
>> think through if this works also for sub-sampling divisors > 2 which I
>> don't think are ever used.
> 
> I choosen those values because they should work with any sub-sampling 
> divisor.
> 
> hsub/vsub = 4 is used with DRM_FORMAT_YUV410/YVU410/YUV411/YVU411.
> 
>>
>> Does IGT find this mistake? If not, maybe IGT should be extended.
> 
> No, for two reasons:
> - The original version works fine for NV12/16/24 and YUV with *sub <= 2
>   (x+n%2 == x-n%2). It only breaks for *sub > 2.
> - YUV410/... are not supported by VKMS
> - IGT does not test different colors for rotations/translations (at least
>   for the tests I tried). I will see if it's possible to add things in 
>   kms_rotation_crc/kms_cursor_crc to test more colors format (at least 
>   one RGB and one YUV).
>  
>>>>>  
>>>>>>> +{
>>>>>>> +	if (direction == READ_RIGHT || direction == READ_LEFT)
>>>>>>> +		return x_start;
>>>>>>> +	else if (direction == READ_DOWN || direction == READ_UP)
>>>>>>> +		return y_start;
>>>>>>> +	return 0;
>>>>>>> +}
>>>>>>> +  
>>>>>
>>>>> [...]
>>>>>  
>>>>>>> +static void yuv_u8_to_argb_u16(struct pixel_argb_u16 *argb_u16, const struct pixel_yuv_u8 *yuv_u8,
>>>>>>> +			       enum drm_color_encoding encoding, enum drm_color_range range)
>>>>>>> +{
>>>>>>> +	static const s16 bt601_full[3][3] = {
>>>>>>> +		{ 256, 0,   359 },
>>>>>>> +		{ 256, -88, -183 },
>>>>>>> +		{ 256, 454, 0 },
>>>>>>> +	};  
>>>>>
>>>>> [...]
>>>>>  
>>>>>>> +
>>>>>>> +	u8 r = 0;
>>>>>>> +	u8 g = 0;
>>>>>>> +	u8 b = 0;
>>>>>>> +	bool full = range == DRM_COLOR_YCBCR_FULL_RANGE;
>>>>>>> +	unsigned int y_offset = full ? 0 : 16;
>>>>>>> +
>>>>>>> +	switch (encoding) {
>>>>>>> +	case DRM_COLOR_YCBCR_BT601:
>>>>>>> +		ycbcr2rgb(full ? bt601_full : bt601,  
>>>>>>
>>>>>> Doing all these conditional again pixel by pixel is probably
>>>>>> inefficient. Just like with the line reading functions, you could pick
>>>>>> the matrix in advance.  
>>>>>
>>>>> I don't think the performance impact is huge (it's only a pair of if), but 
>>>>> yes, it's an easy optimization. 
>>>>>
>>>>> I will create a conversion_matrix structure:
>>>>>
>>>>> 	struct conversion_matrix {
>>>>> 		s16 matrix[3][3];
>>>>> 		u16 y_offset;
>>>>> 	}
>>
>> When defining such a struct type, it would be good to document the
>> matrix layout (which one is row, which one is column), and what the s16
>> mean (fixed point?).
> 
> Ack
> 
>> Try to not mix signed and unsigned types, too. The C implicit type
>> promotion rules can be surprising. Just make everything signed while
>> computing, and convert to/from unsigned only for storage.
> 
> Ack, I will change to signed type.
> 
>>>>>
>>>>> I will create a `get_conversion_matrix_to_argb_u16` function to get this 
>>>>> structure from a format+encoding+range.
>>>>>
>>>>> I will also add a field `conversion_matrix` in struct vkms_plane_state to 
>>>>> get this matrix only once per plane setup.
>>
>> Alright. Let's see how that works.
>>
>>>>>
>>>>>  
>>>>>>> +			  yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
>>>>>>> +		break;
>>>>>>> +	case DRM_COLOR_YCBCR_BT709:
>>>>>>> +		ycbcr2rgb(full ? rec709_full : rec709,
>>>>>>> +			  yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
>>>>>>> +		break;
>>>>>>> +	case DRM_COLOR_YCBCR_BT2020:
>>>>>>> +		ycbcr2rgb(full ? bt2020_full : bt2020,
>>>>>>> +			  yuv_u8->y, yuv_u8->u, yuv_u8->v, y_offset, &r, &g, &b);
>>>>>>> +		break;
>>>>>>> +	default:
>>>>>>> +		pr_warn_once("Not supported color encoding\n");
>>>>>>> +		break;
>>>>>>> +	}
>>>>>>> +
>>>>>>> +	argb_u16->r = r * 257;
>>>>>>> +	argb_u16->g = g * 257;
>>>>>>> +	argb_u16->b = b * 257;  
>>>>>>
>>>>>> I wonder. Using 8-bit fixed point precision seems quite coarse for
>>>>>> 8-bit pixel formats, and it's going to be insufficient for higher bit
>>>>>> depths. Was supporting e.g. 10-bit YUV considered? There is even
>>>>>> deeper, too, like DRM_FORMAT_P016.  
>>>>>
>>>>> It's a good point, as I explained above, I took the conversion part as a 
>>>>> "black box" to avoid breaking (and debugging) stuff. I think it's easy to 
>>>>> switch to s32 bits matrix with 16.16 bits (or anything with more than 16 bits in 
>>>>> the float part).
>>>>>
>>>>> Maybe Arthur have an opinion on this?  
>>>>
>>>> Yeah, I too don't see why not we could do that. The 8-bit precision was
>>>> sufficient for those formats, but as well noted by Pekka this could be a
>>>> problem for higher bit depths. I just need to make my terrible python
>>>> script spit those values XD.  
>>>
>>> Finally, I got it working with 32-bit precision.
>>>
>>> I basically threw all my untrusted python code away, and started using
>>> the colour python framework suggested by Sebastian[1]. After knowing the
>>> right values (and staring at numbers for hours), I found that with a
>>> little bit of rounding, the conversion works.
>>>
>>> Also, while at it, I changed the name rec709 to bt709 to follow the
>>> pattern and added "_full" to the full ranges matrices.
>>>
>>> While using the library, I noticed that the red component is wrong on
>>> the color red in one test case.
>>>
>>> [1]: https://lore.kernel.org/all/20240115150600.GC160656@toolbox/
>>
>> That all sounds good. I wish the kernel code contained comments
>> explaining how exactly you computed those matrices with python/colour.
>> If the python snippets are not too long, including them verbatim as
>> code comments would be really nice for both reviewers and posterity.
>>
>> The same for the VKMS unit tests, too, how you got the expected result
>> values.
> 
> I edited the YUV support to have those s64 values.
> 
> @arthur, I will submit a v4 with this:
> - matrix selection in plane_atomic_update (so it's selected only once)
> - s64 numbers for matrix
> - avoiding multiple loop implementation by switching matrix columns

This looks good to me.

> 
> Regarding the YUV part, I don't feel confortable adressing Pekka's 
> comments, would you mind doing it?

I'm already doing that, how do you want me to send those changes? I reply to
your series, like a did before?

Best Regards,
~Arthur Grillo

> 
> Kind regards,
> Louis Chauvet
> 
> [...]
>