Re: [PATCH v4 7/9] drm: vkms: Refactor the plane composer to accept new formats

Igor Torrente <igormtorrente@xxxxxxxxx> · Sun, 27 Feb 2022 10:19:49 -0400

Hi Pekka,

On 2/25/22 05:38, Pekka Paalanen wrote:
> On Thu, 24 Feb 2022 21:43:01 -0300
> Igor Torrente <igormtorrente@xxxxxxxxx> wrote:
>
>> Hi Pekka,
>>
>> On 2/10/22 06:37, Pekka Paalanen wrote:
>>> On Fri, 21 Jan 2022 18:38:29 -0300
>>> Igor Torrente <igormtorrente@xxxxxxxxx> wrote:
>>>
>>>> Currently the blend function only accepts XRGB_8888 and ARGB_8888
>>>> as a color input.
>>>>
>>>> This patch refactors all the functions related to the plane composition
>>>> to overcome this limitation.
>>>>
>>>> A new internal format(`struct pixel`) is introduced to deal with all
>>>> possible inputs. It consists of 16 bits fields that represent each of
>>>> the channels.
>>>>
>>>> The pixels blend is done using this internal format. And new handlers
>>>> are being added to convert a specific format to/from this internal format.
>>>>
>>>> So the blend operation depends on these handlers to convert to this common
>>>> format. The blended result, if necessary, is converted to the writeback
>>>> buffer format.
>>>>
>>>> This patch introduces three major differences to the blend function.
>>>> 1 - All the planes are blended at once.
>>>> 2 - The blend calculus is done as per line instead of per pixel.
>>>> 3 - It is responsible to calculates the CRC and writing the writeback
>>>>       buffer(if necessary).
>>>>
>>>> These changes allow us to allocate way less memory in the intermediate
>>>> buffer to compute these operations. Because now we don't need to
>>>> have the entire intermediate image lines at once, just one line is
>>>> enough.
>>>>
>>>> | Memory consumption (output dimensions) |
>>>> |:--------------------------------------:|
>>>> |       Current      |     This patch    |
>>>> |:------------------:|:-----------------:|
>>>> |   Width * Heigth   |     2 * Width     |
>>>>
>>>> Beyond memory, we also have a minor performance benefit from all
>>>> these changes. Results running the IGT tests `*kms_cursor_crc*`:
>>>>
>>>> |                 Frametime                  |
>>>> |:------------------------------------------:|
>>>> |  Implementation |  Current  |  This commit |
>>>> |:---------------:|:---------:|:------------:|
>>>> | frametime range |  8~22 ms  |    5~18 ms   |
>>>> |     Average     |  10.0 ms  |    7.3 ms    |
>>>>
>>>> Reported-by: kernel test robot <lkp@xxxxxxxxx>
>>>> Signed-off-by: Igor Torrente <igormtorrente@xxxxxxxxx>
>>>> ---
>>>> V2: Improves the performance drastically, by perfoming the operations
>>>>       per-line and not per-pixel(Pekka Paalanen).
>>>>       Minor improvements(Pekka Paalanen).
>>>>
>>>> V3: Changes the code to blend the planes all at once. This improves
>>>>       performance, memory consumption, and removes much of the weirdness
>>>>       of the V2(Pekka Paalanen and me).
>>>>       Minor improvements(Pekka Paalanen and me).
>>>>
>>>> V4: Rebase the code and adapt it to the new NUM_OVERLAY_PLANES constant.
>>>> ---
>>>>    drivers/gpu/drm/vkms/Makefile        |   1 +
>>>>    drivers/gpu/drm/vkms/vkms_composer.c | 335 +++++++++++++--------------
>>>>    drivers/gpu/drm/vkms/vkms_formats.c  | 138 +++++++++++
>>>>    drivers/gpu/drm/vkms/vkms_formats.h  |  31 +++
>>>>    4 files changed, 333 insertions(+), 172 deletions(-)
>>>>    create mode 100644 drivers/gpu/drm/vkms/vkms_formats.c
>>>>    create mode 100644 drivers/gpu/drm/vkms/vkms_formats.h
>>>
>>> Hi Igor,
>>>
>>> I'm really happy to see this, thanks!
>>>
>>> I still have some security/robustness and other comments below.
>>>
>>> I've deleted all the minus lines from the patch to make the new code
>>> more clear.
>>>
>>>>
>>>> diff --git a/drivers/gpu/drm/vkms/Makefile b/drivers/gpu/drm/vkms/Makefile
>>>> index 72f779cbfedd..1b28a6a32948 100644
>>>> --- a/drivers/gpu/drm/vkms/Makefile
>>>> +++ b/drivers/gpu/drm/vkms/Makefile
>>>> @@ -3,6 +3,7 @@ vkms-y := \
>>>>       vkms_drv.o \
>>>>       vkms_plane.o \
>>>>       vkms_output.o \
>>>> +    vkms_formats.o \
>>>>       vkms_crtc.o \
>>>>       vkms_composer.o \
>>>>       vkms_writeback.o
>>>> diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
>>>> index 95029d2ebcac..9f70fcf84fb9 100644
>>>> --- a/drivers/gpu/drm/vkms/vkms_composer.c
>>>> +++ b/drivers/gpu/drm/vkms/vkms_composer.c
>>>> @@ -9,202 +9,210 @@
>>>>    #include <drm/drm_vblank.h>
>>>>
>>>>    #include "vkms_drv.h"
>>>> +#include "vkms_formats.h"
>>>>
>>>> +static u16 pre_mul_blend_channel(u16 src, u16 dst, u16 alpha)
>>>>    {
>>>> +    u32 new_color;
>>>>
>>>> +    new_color = (src * 0xffff + dst * (0xffff - alpha));
>>>>
>>>> +    return DIV_ROUND_UP(new_color, 0xffff);
>>>
>>> Why round-up rather than the usual mathematical rounding?
>>
>> AFAIK, this is the only round that's present in the kernel. And if I
>> understood correctly it is the round toward positive infinity that we are
>> all used to use.
>
> Should be pretty easy to round-up from 0.5 and round-down otherwise.
> Just add a different offset than DIV_ROUND_UP does.
>
> Not having a ready-made macro and habits are not good
> justifications. The justification needs to be mathematical.

Actually, I wasn't trying to argue about anything. Sorry if it sounded
that way.

Also, I had some confusion and I have to correct what I said. The
DIV_ROUND_UP isn't the usual mathematical rounding, it's more
like a round and ceiling.

I don't know why I thought that was the usual rounding.

>
> The problem with DIV_ROUND_UP that I see, is that 0x00000001 gets
> rounded to 0x0001, and anything that is even slightly above 0xfffe0000
> gets rounded to 0xffff. So it seems to me that this adds a bias to the
> result.
>
> Is my intuition right or wrong, I'm not sure. I do know that
>
> 0xffff * 0xffff = 0xfffe0001
>
> so values greater than 0xfffe0001 cannot occur.
>
> That seems to mean that there is exactly one 32-bit value that
> DIV_ROUND_UPs to 0x0000 and exactly one 32-bit value that DIV_ROUND_UPs
> to 0xffff. That doesn't feel right to me.
>
> Would need to compare to how the blending with real numbers would work.

Yeah, I agree, it makes sense to me.

I will change that, thanks!

>
>
>>>>     */
>>>> +static void blend(struct vkms_frame_info *wb_frame_info,
>>>
>>> Using "wb" as short for writeback is... well, it's hard for the me
>>> remember at least. Could this not be named simply "writeback"?
>>
>> IMHO it's better to use wb instead of writeback for consistency. Given that wb
>> is used throughout the vkms code.
>
> Right, so that's a problem for me.
>
> Is any other driver using wb for writeback?

Apparently, yes.

rcar_du_writeback.c:24:struct rcar_du_wb_conn_state {
rcar_du_writeback.c:29:#define to_rcar_wb_conn_state
rcar_du_writeback.c:219:        struct rcar_du_wb_job *rjob; q1
komeda_wb_connector.c:111:komeda_wb_connector_detect(...)
komeda_wb_connector.c:78:static const struct drm_encoder_helper_funcs
komeda_wb_encoder_helper_funcs = {qq
amdgpu_device.c:1095:static void amdgpu_device_wb_fini(...)

>
> I don't mind using wb in local variables, but in type names I would
> personally prefer more descriptive names.
>
>
>>>> +    int h_dst = drm_rect_height(&primary_plane_info->dst);
>>>>       int y_limit = y_src + h_dst;
>>>> +    int y, i;
>>>
>>> It took me a while to understand that all these y-coordinates are CRTC
>>> coordinates. Maybe call them crtc_y, crtc_y_begin, crtc_y_end,
>>> crtc_y_height, etc.
>>>
>>>> +
>>>> +    for (y = y_src; y < y_limit; y++) {
>>>> +            plane_fmt_func[0](primary_plane_info, y, output_buffer);
>>>
>>> This is initializing output_buffer, right? So why do you have the TODO
>>> comment about clearing the primary plane above?
>>>
>>> Is it because the primary plane may not cover the CRTC exactly, the
>>> destination rectangle might be bigger or smaller?
>>>
>>> The output_buffer length should be the CRTC width, right?
>>>
>>> Maybe the special-casing the primary plane in this code is wrong.
>>> crtc_y needs to iterate over the CRTC height starting from zero. Then,
>>> you explicitly clear output_buffer to opaque background color, and
>>> primary plane becomes just another plane in the array of active planes
>>> with no special handling here.
>>>
>>> That will allow you to support overlay planes *below* the primary plane
>>> (as is fairly common in non-PC hardware), and you can even support the
>>> background color KMS property.
>>
>> I thought that the primary would always cover the entire screen exactly.
>
> Nope. Maybe PC hardware has such limitations, but I'm quite sure there
> are display controllers that do not require this. Therefore VKMS should
> support the more generic case, and possible offer a configuration knob
> to reject atomic state where primary plane is not active or not
> covering the whole CRTC in order to be able to test userspace against
> such driver behaviour.
>
> After all, background color KMS property exists.

Ok, makes sense.

>
>>
>> So yeah, my patch code assumes that CRTC is the same size as the primary plane.
>> (and if I'm not mistaken the current version also assumes it).
>>
>> But If this is not the case, where are the CRTC dimensions?
>> Are they in the CRTC properties? drm_mode_config?
>>
>> I couldn't find them.
>
> It's the active area of the current video mode, I believe. How that
> translated to DRM code, I don't know.

I will try to find it.

>
>
>>>> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
>>>> new file mode 100644
>>>> index 000000000000..0d1838d1b835
>>>> --- /dev/null
>>>> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
>>>> @@ -0,0 +1,138 @@
>>>> +/* SPDX-License-Identifier: GPL-2.0+ */
>>>> +
>>>> +#include <drm/drm_rect.h>
>>>> +#include "vkms_formats.h"
>>>> +
>>>> +format_transform_func get_fmt_transform_function(u32 format)
>>>> +{
>>>> +    if (format == DRM_FORMAT_ARGB8888)
>>>> +            return &ARGB8888_to_ARGB16161616;
>>>> +    else
>>>> +            return &XRGB8888_to_ARGB16161616;
>>>
>>> In functions like this you should prepare for caller errors. Use a
>>> switch, and fail any attempt to use a pixel format it doesn't support.
>>> Failing is much better than silently producing garbage or worse: buffer
>>> overruns when bytes-per-pixel is not what you expected.
>>>
>>> What to do on failure depends on whether the failure here is never
>>> supposed to happen (follow the kernel style) e.g. malicious userspace
>>> cannot trigger it, or if you actually use this function to define the
>>> supported for pixel formats.
>>
>> No, I don't use this function to define supported formats, It's defined:
>> - vkms_writeback.c:15
>> - vkms_plane.c:14 and 22
>>
>> And if I'm not mistaken the DRM framework takes care of validation.
>
> Then someone else comes, adds a new pixel format to those files, and
> does not even realize get_fmt_transform_function() exists.
>
> If you know that something must already ensure you cannot get
> unsupported pixel formats in this function, then I guess some kind of
> kernel panic here if you do get an unsupported pixel format would be
> appropriate? Or an oops.
>
> That would tell loud and clear to that other person they overlooked
> something. Assuming they test the code the new format.

Ok, got it. I will change it.

>
>>
>>>
>>> The latter means you'd have a list of all DRM pixel formats and then
>>> you'd ask for each one if this function knows it, and if yes, you add
>>> the format to the list of supported formats advertised to userspace. I
>>> don't know if that would be fine by DRM coding style.
>>>
>>>> +}
>>>> +
>>>> +format_transform_func get_wb_fmt_transform_function(u32 format)
>>>> +{
>>>> +    if (format == DRM_FORMAT_ARGB8888)
>>>> +            return &convert_to_ARGB8888;
>>>> +    else
>>>> +            return &convert_to_XRGB8888;
>>>> +}
>>>
>>> I think you could move the above getter functions to the bottom of the
>>> .c file, and make all the four *_to_* functions static, and remove them
>>> from the header.
>>
>> OK. I will do that.
>>
>> Question, what's the benefits of using static functions?
>
> Making code more contained. When people see that a function is static,
> they know it won't be directly referenced from any other file. This
> makes understanding easier. It's hygiene too: make everything static
> that could be static.
>
> Sometimes it can also have other benefits, like the compiler
> automatically inlining the whole thing, and not even emitting the
> independent function code. It might also speed up linking, as a static
> function cannot be a target.
>
>>>
>>>> +
>>>> +static int pixel_offset(struct vkms_frame_info *frame_info, int x, int y)
>>>> +{
>>>> +    return frame_info->offset + (y * frame_info->pitch)
>>>> +                              + (x * frame_info->cpp);
>>>> +}
>>>> +
>>>> +/*
>>>> + * packed_pixels_addr - Get the pointer to pixel of a given pair of coordinates
>>>> + *
>>>> + * @frame_info: Buffer metadata
>>>> + * @x: The x(width) coordinate of the 2D buffer
>>>> + * @y: The y(Heigth) coordinate of the 2D buffer
>>>> + *
>>>> + * Takes the information stored in the frame_info, a pair of coordinates, and
>>>> + * returns the address of the first color channel.
>>>> + * This function assumes the channels are packed together, i.e. a color channel
>>>> + * comes immediately after another in the memory. And therefore, this function
>>>> + * doesn't work for YUV with chroma subsampling (e.g. YUV420 and NV21).
>>>> + */
>>>> +static void *packed_pixels_addr(struct vkms_frame_info *frame_info, int x, int y)
>>>> +{
>>>> +    int offset = pixel_offset(frame_info, x, y);
>>>> +
>>>> +    return (u8 *)frame_info->map[0].vaddr + offset;
>>>> +}
>>>> +
>>>> +static void *get_packed_src_addr(struct vkms_frame_info *frame_info, int y)
>>>> +{
>>>> +    int x_src = frame_info->src.x1 >> 16;
>>>> +    int y_src = y - frame_info->dst.y1 + (frame_info->src.y1 >> 16);
>>>> +
>>>> +    return packed_pixels_addr(frame_info, x_src, y_src);
>>>> +}
>>>> +
>>>> +void ARGB8888_to_ARGB16161616(struct vkms_frame_info *frame_info, int y,
>>>> +                          struct line_buffer *stage_buffer)
>>>
>>> I'm fairly sure that DRM will one day add exactly ARGB16161616 format.
>
> Oops, I think the format already exists.
>
>>> But that will not be the format you use here (or it might be, but
>>> purely accidentally and depending on machine endianess and whatnot), so
>>> I would suggest inventing a new name. Also use the same name for the
>>> struct to hold a single pixel.
>>>
>>> E.g. struct pixel_argb_u16
>>
>> I'm terrible with names of variables, functions, etc. I will end-up with
>> ARGB8888_to_argb_u16.
>
> Sounds fine.
>
>>
>> I have a patch with the implementation of nv12 and YUV420 formats, but
>> I don't know how to test it because the ".*kms_plane@pixel-format*" igt test
>> doesn't support these formats (And also it isn't working anymore with my hack).
>>
>> Do you know how to test it?
>
> I think the best way would be to teach IGT to test it. Then everyone
> will automatically benefit from it.
>
> I don't really know anything about IGT code.
>
> FWIW, Weston has some YUV testing code I wrote in
> tests/yuv-buffer-test.c for the color conversions, but it's very
> limited scope (only BT.601, only limited range, and ignores chroma
> siting).

Right. I will do it later, As this patch series is already big enough.

Best Regards,
---
Igor Torrente

>
>
> Thanks,
> pq