Re: [PATCH] media: vim2m: Fix RGB 565 BE/LE support

Mauro Carvalho Chehab <mchehab+samsung@xxxxxxxxxx> · Fri, 29 Mar 2019 11:24:40 -0300

Em Fri, 29 Mar 2019 15:01:23 +0100
Hans Verkuil <hverkuil-cisco@xxxxxxxxx> escreveu:

> On 3/29/19 2:44 PM, Mauro Carvalho Chehab wrote:
> > The support for those two formats are archtecture-dependent.
> > Use the endianness to CPU macros to do it right.
> > 
> > Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@xxxxxxxxxx>
> > ---
> >  drivers/media/platform/vim2m.c | 30 +++++++++++++++++-------------
> >  1 file changed, 17 insertions(+), 13 deletions(-)
> > 
> > diff --git a/drivers/media/platform/vim2m.c b/drivers/media/platform/vim2m.c
> > index 3c3a6a03b948..243c82b5d537 100644
> > --- a/drivers/media/platform/vim2m.c
> > +++ b/drivers/media/platform/vim2m.c
> > @@ -302,7 +302,7 @@ static void copy_two_pixels(struct vim2m_q_data *q_data_in,
> >  	switch (in->fourcc) {
> >  	case V4L2_PIX_FMT_RGB565: /* rrrrrggg gggbbbbb */
> >  		for (i = 0; i < 2; i++) {
> > -			u16 pix = *(u16 *)(src[i]);
> > +			u16 pix = le16_to_cpu(*(__le16 *)(src[i]));
> >  
> >  			*r++ = (u8)(((pix & 0xf800) >> 11) << 3) | 0x07;
> >  			*g++ = (u8)((((pix & 0x07e0) >> 5)) << 2) | 0x03;
> > @@ -311,12 +311,11 @@ static void copy_two_pixels(struct vim2m_q_data *q_data_in,
> >  		break;
> >  	case V4L2_PIX_FMT_RGB565X: /* gggbbbbb rrrrrggg */
> >  		for (i = 0; i < 2; i++) {
> > -			u16 pix = *(u16 *)(src[i]);
> > +			u16 pix = be16_to_cpu(*(__be16 *)(src[i]));
> >  
> > -			*r++ = (u8)(((0x00f8 & pix) >> 3) << 3) | 0x07;
> > -			*g++ = (u8)(((pix & 0x7) << 2) |
> > -				    ((pix & 0xe000) >> 5)) | 0x03;
> > -			*b++ = (u8)(((pix & 0x1f00) >> 8) << 3) | 0x07;
> > +			*r++ = (u8)(((pix & 0xf800) >> 11) << 3) | 0x07;
> > +			*g++ = (u8)((((pix & 0x07e0) >> 5)) << 2) | 0x03;
> > +			*b++ = (u8)((pix & 0x1f) << 3) | 0x07;
> >  		}
> >  		break;
> >  	default:
> > @@ -345,21 +344,26 @@ static void copy_two_pixels(struct vim2m_q_data *q_data_in,
> >  	switch (out->fourcc) {
> >  	case V4L2_PIX_FMT_RGB565: /* rrrrrggg gggbbbbb */
> >  		for (i = 0; i < 2; i++) {
> > -			u16 *pix = (u16 *)*dst;
> > +			u16 pix;
> > +			__le16 *dst_pix = (__le16 *)*dst;
> >  
> > -			*pix = ((*r << 8) & 0xf800) | ((*g << 3) & 0x07e0) |
> > -			       (*b >> 3);
> > +			pix = ((*r << 8) & 0xf800) | ((*g << 3) & 0x07e0) |
> > +			      (*b >> 3);
> > +
> > +			*dst_pix = cpu_to_le16(pix);
> >  
> >  			*dst += 2;
> >  		}
> >  		return;
> >  	case V4L2_PIX_FMT_RGB565X: /* gggbbbbb rrrrrggg */
> >  		for (i = 0; i < 2; i++) {
> > -			u16 *pix = (u16 *)*dst;
> > -			u8 green = *g++ >> 2;
> > +			u16 pix;
> > +			__be16 *dst_pix = (__be16 *)*dst;
> >  
> > -			*pix = ((green << 8) & 0xe000) | (green & 0x07) |
> > -			       ((*b++ << 5) & 0x1f00) | ((*r++ & 0xf8));
> > +			pix = ((*r << 8) & 0xf800) | ((*g << 3) & 0x07e0) |
> > +			      (*b >> 3);
> > +
> > +			*dst_pix = cpu_to_be16(pix);
> >  
> >  			*dst += 2;
> >  		}
> >   
> 
> Why not just deal with the bytes as u8 values? All the casts and endian
> conversions just make it unnecessarily complicated IMHO.
> 
> E.g. the last case can be replaced by this (if I didn't make any mistakes):
> 
>  	case V4L2_PIX_FMT_RGB565X: /* gggbbbbb rrrrrggg */
>  		for (i = 0; i < 2; i++) {
> 			u8 green = *g++ >> 2;
> 
> 			*(*dst)++ = ((green & 0x07) << 5) | (*b++ >> 3);
> 			*(*dst)++ = ((green & 0x38) >> 3) | (*r++ & 0xf8);
>  		}
> 
> I think that's much better.

I considered that. I opted to use the endiannes way because of two
reasons:

1) when the endiannes matches the CPU endiannes, the function does
nothing and GCC will optimize it. As this function is slow, as it should
be called for every single byte of the image, all optimizations are
welcomed.

2) Both LE and BE conversions are now identical:

   RGB16 -> RGB24:

	*r++ = (u8)(((pix & 0xf800) >> 11) << 3) | 0x07;
	*g++ = (u8)((((pix & 0x07e0) >> 5)) << 2) | 0x03;
	*b++ = (u8)((pix & 0x1f) << 3) | 0x07;

   RGB24 -> RGB16:

	pix = ((*r << 8) & 0xf800) | ((*g << 3) & 0x07e0) |(*b >> 3);

The only thing that differs now is the order where the bytes are read
or written.

IMO, having the same code for the conversion actually makes it a lot
less complex to understand and check its implementation.

Thanks,
Mauro