On Thu, 11 Nov 2021 11:07:21 -0300 Igor Torrente <igormtorrente@xxxxxxxxx> wrote: > Hi Pekka, > > On Thu, Nov 11, 2021 at 6:33 AM Pekka Paalanen <ppaalanen@xxxxxxxxx> wrote: > > > > On Wed, 10 Nov 2021 13:56:54 -0300 > > Igor Torrente <igormtorrente@xxxxxxxxx> wrote: > > > > > On Tue, Nov 9, 2021 at 8:40 AM Pekka Paalanen <ppaalanen@xxxxxxxxx> wrote: > > > > > > > > Hi Igor, > > > > > > > > again, that is a really nice speed-up. Unfortunately, I find the code > > > > rather messy and hard to follow. I hope my comments below help with > > > > re-designing it to be easier to understand. > > > > > > > > > > > > On Tue, 26 Oct 2021 08:34:06 -0300 > > > > Igor Torrente <igormtorrente@xxxxxxxxx> wrote: > > > > > > > > > Currently the blend function only accepts XRGB_8888 and ARGB_8888 > > > > > as a color input. > > > > > > > > > > This patch refactors all the functions related to the plane composition > > > > > to overcome this limitation. > > > > > > > > > > Now the blend function receives a struct `vkms_pixel_composition_functions` > > > > > containing two handlers. > > > > > > > > > > One will generate a buffer of each line of the frame with the pixels > > > > > converted to ARGB16161616. And the other will take this line buffer, > > > > > do some computation on it, and store the pixels in the destination. > > > > > > > > > > Both the handlers have the same signature. They receive a pointer to > > > > > the pixels that will be processed(`pixels_addr`), the number of pixels > > > > > that will be treated(`length`), and the intermediate buffer of the size > > > > > of a frame line (`line_buffer`). > > > > > > > > > > The first function has been totally described previously. > > > > > > > > What does this sentence mean? > > > > > > In the sentence "One will generate...", I give an overview of the two types of > > > handlers. And the overview of the first handler describes the full behavior of > > > it. > > > > > > But it doesn't look clear enough, I will improve it in the future. > > > > > > > > > > > > > > > > > The second is more interesting, as it has to perform two roles depending > > > > > on where it is called in the code. > > > > > > > > > > The first is to convert(if necessary) the data received in the > > > > > `line_buffer` and write in the memory pointed by `pixels_addr`. > > > > > > > > > > The second role is to perform the `alpha_blend`. So, it takes the pixels > > > > > in the `line_buffer` and `pixels_addr`, executes the blend, and stores > > > > > the result back to the `pixels_addr`. > > > > > > > > > > The per-line implementation was chosen for performance reasons. > > > > > The per-pixel functions were having performance issues due to indirect > > > > > function call overhead. > > > > > > > > > > The per-line code trades off memory for execution time. The `line_buffer` > > > > > allows us to diminish the number of function calls. > > > > > > > > > > Results in the IGT test `kms_cursor_crc`: > > > > > > > > > > | Frametime | > > > > > |:---------------:|:---------:|:----------:|:--------:| > > > > > | implmentation | Current | Per-pixel | Per-line | > > > > > | frametime range | 8~22 ms | 32~56 ms | 6~19 ms | > > > > > | Average | 10.0 ms | 35.8 ms | 8.6 ms | > > > > > > > > > > Reported-by: kernel test robot <lkp@xxxxxxxxx> > > > > > Signed-off-by: Igor Torrente <igormtorrente@xxxxxxxxx> > > > > > --- > > > > > V2: Improves the performance drastically, by perfoming the operations > > > > > per-line and not per-pixel(Pekka Paalanen). > > > > > Minor improvements(Pekka Paalanen). > > > > > --- > > > > > drivers/gpu/drm/vkms/vkms_composer.c | 321 ++++++++++++++++----------- > > > > > drivers/gpu/drm/vkms/vkms_formats.h | 155 +++++++++++++ > > > > > 2 files changed, 342 insertions(+), 134 deletions(-) > > > > > create mode 100644 drivers/gpu/drm/vkms/vkms_formats.h > > > > > > > > > > diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c > > > > > index 383ca657ddf7..69fe3a89bdc9 100644 > > > > > --- a/drivers/gpu/drm/vkms/vkms_composer.c > > > > > +++ b/drivers/gpu/drm/vkms/vkms_composer.c ... > > > > > +struct vkms_pixel_composition_functions { > > > > > + void (*get_src_line)(void *pixels_addr, int length, u64 *line_buffer); > > > > > + void (*set_output_line)(void *pixels_addr, int length, u64 *line_buffer); > > > > > > > > I would be a little more comfortable if instead of u64 *line_buffer you > > > > would have something like > > > > > > > > struct line_buffer { > > > > u16 *row; > > > > size_t nelem; > > > > } > > > > > > > > so that the functions to be plugged into these function pointers could > > > > assert that you do not accidentally overflow the array (which would > > > > imply a code bug in kernel). > > > > > > > > One could perhaps go even for: > > > > > > > > struct line_pixel { > > > > u16 r, g, b, a; > > > > }; > > > > > > > > struct line_buffer { > > > > struct line_pixel *row; > > > > size_t npixels; > > > > }; > > > > > > If we decide to follow this representation, would it be possible > > > to calculate the crc in the similar way that is being done currently? > > > > > > Something like that: > > > > > > crc = crc32_le(crc, line_buffer.row, w * sizeof(line_pixel)); > > > > Hi Igor, > > > > yes. I think the CRC calculated does not need to be reproducible in > > userspace, so you can very well compute it from the internal > > intermediate representation. It also does not need to be portable > > between architectures, AFAIU. > > Great! This will make things easier. > > > > > > I mean, If the compiler can decide to put a padding somewhere, it > > > would mess with the crc value. Right? > > > > Padding could mess it up, yes. However, I think in kernel it is a > > convention to define structs (especially UAPI structs but this is not > > one) such that there is no implicit padding. So there must be some > > recommended practises on how to achieve and ensure that. > > > > The size of struct line_pixel as defined above is 8 bytes which is a > > "very round" number, and every field has the same type, so there won't > > be gaps between fields either. So I think the struct should already be > > fine and have no padding, but how to make sure it is, I'm not sure what > > you would do in kernel land. > > > > In userspace I would put a static assert to ensure that > > sizeof(struct line_pixel) = 8. That would be enough, because sizeof > > counts not just internal implicit padding but also the needed size > > extension for alignment in an array of those. The accumulated size of > > the fields individually is 8 bytes, so if the struct size is 8, there > > cannot be padding. > > > > Apparently the kernel uses a compiler extension in a macro to do this > kind of struct packing. > > include/linux/compiler_attributes.h > 265:#define __packed __attribute__((__packed__)) Hi Igor, we do not actually want to force packing, though. If there would be padding without packing, then packing may incur a noticeable speed penalty in accessing the fields. We don't want to risk that. So I think it's better to just assert that no padding exists instead. There would be something quite strange going on if there was padding in this case, but better safe than sorry, because debugging that would be awful. Thanks, pq
Attachment:
pgpTQWWYeokrx.pgp
Description: OpenPGP digital signature