Re: [RFC PATCH v3 21/23] drm/vkms: add 3x4 matrix in color pipeline

Harry Wentland <harry.wentland@xxxxxxx> · Thu, 4 Jan 2024 11:12:30 -0500

On 2023-12-08 08:11, Pekka Paalanen wrote:
> On Wed, 8 Nov 2023 11:36:40 -0500
> Harry Wentland <harry.wentland@xxxxxxx> wrote:
> 
>> We add two 3x4 matrices into the VKMS color pipeline. The reason
>> we're adding matrices is so that we can test that application
>> of a matrix and its inverse yields an output equal to the input
>> image.
> 
> Would it not be better to mimic what a hardware driver might likely
> have? Or maybe that will be just a few more pipelines?
> 
> People testing their compositors would likely expect a more usual
> pipeline arrangement.
> 

True, but my focus with VKMS was to get the interface and core
code off the ground and make sure it's all (mostly) sane. For
more realistic testing we're currently focussing on creating the
AMD color pipeline in amdgpu.

There's definitely value in mimicing real HW with VKMS but it's
also a non-trivial task and somewhat lower in priority for me.
I think at this point there is higher value in getting a real HW
implementation off the ground.

Harry

>> One complication with the matrix implementation has to do with
>> the fact that the matrix entries are in signed-magnitude fixed
>> point, whereas the drm_fixed.h implementation uses 2s-complement.
>> The latter one is the one that we want for easy addition and
>> subtraction, so we convert all entries to 2s-complement.
>>
>> Signed-off-by: Harry Wentland <harry.wentland@xxxxxxx>
>> ---
>>  drivers/gpu/drm/vkms/vkms_colorop.c  | 32 +++++++++++++++++++++++++++-
>>  drivers/gpu/drm/vkms/vkms_composer.c | 27 +++++++++++++++++++++++
>>  2 files changed, 58 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/vkms/vkms_colorop.c b/drivers/gpu/drm/vkms/vkms_colorop.c
>> index 9a26b9fdc4a2..4e37e805c443 100644
>> --- a/drivers/gpu/drm/vkms/vkms_colorop.c
>> +++ b/drivers/gpu/drm/vkms/vkms_colorop.c
>> @@ -31,7 +31,37 @@ const int vkms_initialize_tf_pipeline(struct drm_plane *plane, struct drm_prop_e
>>  
>>  	prev_op = op;
>>  
>> -	/* 2nd op: 1d curve */
>> +	/* 2nd op: 3x4 matrix */
>> +	op = kzalloc(sizeof(struct drm_colorop), GFP_KERNEL);
>> +	if (!op) {
>> +		DRM_ERROR("KMS: Failed to allocate colorop\n");
>> +		return -ENOMEM;
>> +	}
>> +
>> +	ret = drm_colorop_init(dev, op, plane, DRM_COLOROP_CTM_3X4);
>> +	if (ret)
>> +		return ret;
>> +
>> +	drm_colorop_set_next_property(prev_op, op);
>> +
>> +	prev_op = op;
>> +
>> +	/* 3rd op: 3x4 matrix */
>> +	op = kzalloc(sizeof(struct drm_colorop), GFP_KERNEL);
>> +	if (!op) {
>> +		DRM_ERROR("KMS: Failed to allocate colorop\n");
>> +		return -ENOMEM;
>> +	}
>> +
>> +	ret = drm_colorop_init(dev, op, plane, DRM_COLOROP_CTM_3X4);
>> +	if (ret)
>> +		return ret;
>> +
>> +	drm_colorop_set_next_property(prev_op, op);
>> +
>> +	prev_op = op;
>> +
>> +	/* 4th op: 1d curve */
>>  	op = kzalloc(sizeof(struct drm_colorop), GFP_KERNEL);
>>  	if (!op) {
>>  		DRM_ERROR("KMS: Failed to allocate colorop\n");
>> diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
>> index d04a235b9fcd..c278fb223188 100644
>> --- a/drivers/gpu/drm/vkms/vkms_composer.c
>> +++ b/drivers/gpu/drm/vkms/vkms_composer.c
>> @@ -164,6 +164,30 @@ static void apply_lut(const struct vkms_crtc_state *crtc_state, struct line_buff
>>  	}
>>  }
>>  
>> +static void apply_3x4_matrix(struct pixel_argb_s32 *pixel, const struct drm_color_ctm_3x4 *matrix)
>> +{
>> +	s64 rf, gf, bf;
>> +
>> +	rf = drm_fixp_mul(drm_sm2fixp(matrix->matrix[0]), drm_int2fixp(pixel->r)) +
>> +	     drm_fixp_mul(drm_sm2fixp(matrix->matrix[1]), drm_int2fixp(pixel->g)) +
>> +	     drm_fixp_mul(drm_sm2fixp(matrix->matrix[2]), drm_int2fixp(pixel->b)) +
>> +	     drm_sm2fixp(matrix->matrix[3]);
> 
> Again, if you went for performance, you'd make a copy of the matrix in
> fixp format in advance, to avoid having to convert the same thing for
> every processed pixel.
> 
>> +
>> +	gf = drm_fixp_mul(drm_sm2fixp(matrix->matrix[4]), drm_int2fixp(pixel->r)) +
>> +	     drm_fixp_mul(drm_sm2fixp(matrix->matrix[5]), drm_int2fixp(pixel->g)) +
>> +	     drm_fixp_mul(drm_sm2fixp(matrix->matrix[6]), drm_int2fixp(pixel->b)) +
>> +	     drm_sm2fixp(matrix->matrix[7]);
>> +
>> +	bf = drm_fixp_mul(drm_sm2fixp(matrix->matrix[8]), drm_int2fixp(pixel->r)) +
>> +	     drm_fixp_mul(drm_sm2fixp(matrix->matrix[9]), drm_int2fixp(pixel->g)) +
>> +	     drm_fixp_mul(drm_sm2fixp(matrix->matrix[10]), drm_int2fixp(pixel->b)) +
>> +	     drm_sm2fixp(matrix->matrix[11]);
> 
> Likewise the repetition of int2fixp three times for the same value is
> probably hurting unless the compiler knows to eliminate the redundant
> calls.
> 
>> +
>> +	pixel->r = drm_fixp2int(rf);
>> +	pixel->g = drm_fixp2int(gf);
>> +	pixel->b = drm_fixp2int(bf);
> 
> Btw. why pick s32 and not fixp for your intermediate type?
> 
> Using both you get the limitations of both in range and precision.
> 
> 
> Thanks,
> pq
> 
>> +}
>> +
>>  static void apply_colorop(struct pixel_argb_s32 *pixel, struct drm_colorop *colorop)
>>  {
>>  	/* TODO is this right? */
>> @@ -185,6 +209,9 @@ static void apply_colorop(struct pixel_argb_s32 *pixel, struct drm_colorop *colo
>>  				DRM_DEBUG_DRIVER("unkown colorop 1D curve type %d\n", colorop_state->curve_1d_type);
>>  				break;
>>  		}
>> +	} else if (colorop->type == DRM_COLOROP_CTM_3X4) {
>> +		if (colorop_state->data)
>> +			apply_3x4_matrix(pixel, (struct drm_color_ctm_3x4 *) colorop_state->data->data);
>>  	}
>>  
>>  }
>