On Wed, 8 Nov 2023 11:36:40 -0500
Harry Wentland <harry.wentland@xxxxxxx> wrote:

> We add two 3x4 matrices into the VKMS color pipeline. The reason
> we're adding matrices is so that we can test that application
> of a matrix and its inverse yields an output equal to the input
> image.

Would it not be better to mimic what a hardware driver might likely
have? Or maybe that will be just a few more pipelines?

People testing their compositors would likely expect a more usual
pipeline arrangement.

> One complication with the matrix implementation has to do with
> the fact that the matrix entries are in signed-magnitude fixed
> point, whereas the drm_fixed.h implementation uses 2s-complement.
> The latter one is the one that we want for easy addition and
> subtraction, so we convert all entries to 2s-complement.
>
> Signed-off-by: Harry Wentland <harry.wentland@xxxxxxx>
> ---
>  drivers/gpu/drm/vkms/vkms_colorop.c  | 32 +++++++++++++++++++++++++++-
>  drivers/gpu/drm/vkms/vkms_composer.c | 27 +++++++++++++++++++++++
>  2 files changed, 58 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/vkms/vkms_colorop.c b/drivers/gpu/drm/vkms/vkms_colorop.c
> index 9a26b9fdc4a2..4e37e805c443 100644
> --- a/drivers/gpu/drm/vkms/vkms_colorop.c
> +++ b/drivers/gpu/drm/vkms/vkms_colorop.c
> @@ -31,7 +31,37 @@ const int vkms_initialize_tf_pipeline(struct drm_plane *plane, struct drm_prop_e
>
>  	prev_op = op;
>
> -	/* 2nd op: 1d curve */
> +	/* 2nd op: 3x4 matrix */
> +	op = kzalloc(sizeof(struct drm_colorop), GFP_KERNEL);
> +	if (!op) {
> +		DRM_ERROR("KMS: Failed to allocate colorop\n");
> +		return -ENOMEM;
> +	}
> +
> +	ret = drm_colorop_init(dev, op, plane, DRM_COLOROP_CTM_3X4);
> +	if (ret)
> +		return ret;
> +
> +	drm_colorop_set_next_property(prev_op, op);
> +
> +	prev_op = op;
> +
> +	/* 3rd op: 3x4 matrix */
> +	op = kzalloc(sizeof(struct drm_colorop), GFP_KERNEL);
> +	if (!op) {
> +		DRM_ERROR("KMS: Failed to allocate colorop\n");
> +		return -ENOMEM;
> +	}
> +
> +	ret = drm_colorop_init(dev, op, plane, DRM_COLOROP_CTM_3X4);
> +	if (ret)
> +		return ret;
> +
> +	drm_colorop_set_next_property(prev_op, op);
> +
> +	prev_op = op;
> +
> +	/* 4th op: 1d curve */
>  	op = kzalloc(sizeof(struct drm_colorop), GFP_KERNEL);
>  	if (!op) {
>  		DRM_ERROR("KMS: Failed to allocate colorop\n");
> diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
> index d04a235b9fcd..c278fb223188 100644
> --- a/drivers/gpu/drm/vkms/vkms_composer.c
> +++ b/drivers/gpu/drm/vkms/vkms_composer.c
> @@ -164,6 +164,30 @@ static void apply_lut(const struct vkms_crtc_state *crtc_state, struct line_buff
>  	}
>  }
>
> +static void apply_3x4_matrix(struct pixel_argb_s32 *pixel, const struct drm_color_ctm_3x4 *matrix)
> +{
> +	s64 rf, gf, bf;
> +
> +	rf = drm_fixp_mul(drm_sm2fixp(matrix->matrix[0]), drm_int2fixp(pixel->r)) +
> +	     drm_fixp_mul(drm_sm2fixp(matrix->matrix[1]), drm_int2fixp(pixel->g)) +
> +	     drm_fixp_mul(drm_sm2fixp(matrix->matrix[2]), drm_int2fixp(pixel->b)) +
> +	     drm_sm2fixp(matrix->matrix[3]);

Again, if you went for performance, you'd make a copy of the matrix in
fixp format in advance, to avoid having to convert the same thing for
every processed pixel.

> +
> +	gf = drm_fixp_mul(drm_sm2fixp(matrix->matrix[4]), drm_int2fixp(pixel->r)) +
> +	     drm_fixp_mul(drm_sm2fixp(matrix->matrix[5]), drm_int2fixp(pixel->g)) +
> +	     drm_fixp_mul(drm_sm2fixp(matrix->matrix[6]), drm_int2fixp(pixel->b)) +
> +	     drm_sm2fixp(matrix->matrix[7]);
> +
> +	bf = drm_fixp_mul(drm_sm2fixp(matrix->matrix[8]), drm_int2fixp(pixel->r)) +
> +	     drm_fixp_mul(drm_sm2fixp(matrix->matrix[9]), drm_int2fixp(pixel->g)) +
> +	     drm_fixp_mul(drm_sm2fixp(matrix->matrix[10]), drm_int2fixp(pixel->b)) +
> +	     drm_sm2fixp(matrix->matrix[11]);

Likewise the repetition of int2fixp three times for the same value is
probably hurting unless the compiler knows to eliminate the redundant
calls.

> +
> +	pixel->r = drm_fixp2int(rf);
> +	pixel->g = drm_fixp2int(gf);
> +	pixel->b = drm_fixp2int(bf);

Btw. why pick s32 and not fixp for your intermediate type? Using both
you get the limitations of both in range and precision.


Thanks,
pq

> +}
> +
>  static void apply_colorop(struct pixel_argb_s32 *pixel, struct drm_colorop *colorop)
>  {
>  	/* TODO is this right? */
> @@ -185,6 +209,9 @@ static void apply_colorop(struct pixel_argb_s32 *pixel, struct drm_colorop *colo
>  			DRM_DEBUG_DRIVER("unkown colorop 1D curve type %d\n", colorop_state->curve_1d_type);
>  			break;
>  		}
> +	} else if (colorop->type == DRM_COLOROP_CTM_3X4) {
> +		if (colorop_state->data)
> +			apply_3x4_matrix(pixel, (struct drm_color_ctm_3x4 *) colorop_state->data->data);
>  	}
>
>  }
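
To make the two performance remarks above concrete, here is a rough
userspace sketch, not a proposed patch. It converts the
signed-magnitude matrix to 2s-complement fixed point once, and converts
each pixel channel to fixed point once per pixel instead of three
times. Every name in it (fixp_t, sm2fixp, ctm_3x4_to_fixp,
apply_3x4_matrix_fixp and the structs) is a hypothetical stand-in for
the drm_fixed.h helpers and the VKMS types. I am also assuming 32.32
fixed point, a matrix laid out as 12 u64 entries with the sign in bit
63, and an arithmetic right shift on negative values.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins, NOT the kernel API. Assumes 32.32
 * fixed point and arithmetic right shift of negative values. */
#define FIXP_SHIFT 32
typedef int64_t fixp_t;

struct ctm_3x4_sm   { uint64_t matrix[12]; }; /* signed-magnitude entries */
struct ctm_3x4_fixp { fixp_t m[12]; };        /* 2s-complement entries */
struct pixel_s32    { int32_t a, r, g, b; };

static inline fixp_t int2fixp(int32_t v)
{
	/* Multiply rather than shift: left-shifting a negative value is UB. */
	return (fixp_t)v * ((fixp_t)1 << FIXP_SHIFT);
}

static inline int32_t fixp2int(fixp_t v)
{
	return (int32_t)(v >> FIXP_SHIFT);
}

/* Signed-magnitude (sign in bit 63) to 2s-complement. */
static inline fixp_t sm2fixp(uint64_t sm)
{
	uint64_t mag = sm & ~(UINT64_C(1) << 63);

	return (sm >> 63) ? -(fixp_t)mag : (fixp_t)mag;
}

/* (a * b) >> 32 without a 128-bit type: split both operands into a
 * signed high half and an unsigned low half, then sum partial products. */
static inline fixp_t fixp_mul(fixp_t a, fixp_t b)
{
	int64_t ah = a >> FIXP_SHIFT, bh = b >> FIXP_SHIFT;
	uint64_t al = (uint64_t)a & 0xffffffffu;
	uint64_t bl = (uint64_t)b & 0xffffffffu;

	return ah * bh * ((int64_t)1 << FIXP_SHIFT) +
	       ah * (int64_t)bl + (int64_t)al * bh +
	       (int64_t)((al * bl) >> FIXP_SHIFT);
}

/* Done once when the matrix changes, not once per pixel. */
static void ctm_3x4_to_fixp(const struct ctm_3x4_sm *src,
			    struct ctm_3x4_fixp *dst)
{
	assert(src && dst);
	for (int i = 0; i < 12; i++)
		dst->m[i] = sm2fixp(src->matrix[i]);
}

/* Per-pixel path: each channel is converted to fixed point only once. */
static void apply_3x4_matrix_fixp(struct pixel_s32 *pixel,
				  const struct ctm_3x4_fixp *ctm)
{
	const fixp_t *m = ctm->m;
	const fixp_t r = int2fixp(pixel->r);
	const fixp_t g = int2fixp(pixel->g);
	const fixp_t b = int2fixp(pixel->b);

	fixp_t rf = fixp_mul(m[0], r) + fixp_mul(m[1], g) + fixp_mul(m[2], b) + m[3];
	fixp_t gf = fixp_mul(m[4], r) + fixp_mul(m[5], g) + fixp_mul(m[6], b) + m[7];
	fixp_t bf = fixp_mul(m[8], r) + fixp_mul(m[9], g) + fixp_mul(m[10], b) + m[11];

	pixel->r = fixp2int(rf);
	pixel->g = fixp2int(gf);
	pixel->b = fixp2int(bf);
}
```

In a driver the converted copy would presumably be cached wherever the
colorop state is prepared, so that the per-pixel function only ever
sees 2s-complement values. Where that cache would live in VKMS is left
open here.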