This patchset is the second version of [1]. It is almost a complete rewrite
to use a line-by-line algorithm for the composition.

During the development of this series, Pekka and Arthur found an issue in
drm core. The YUV part of this series depends on the fix [9]. I'll let
Arthur extract it and submit a new independent patch.

It can be divided in five parts:
- PATCH 1 to 4: no functional change is intended, only some formatting and
  documenting (PATCH 2 is taken from [2])
- PATCH 5 to 8: Some preparation work not directly related to the
  line-by-line algorithm
- PATCH 10: main patch for this series, it reintroduces the line-by-line
  algorithm
- PATCH 11 to 15: taken from Arthur's series [2], sometimes adapted to use
  the line-by-line algorithm
- PATCH 16: Introduce the support for DRM_FORMAT_R1/2/4/8

PATCH 10 aims to restore the line-by-line pixel reading algorithm. It was
introduced in 8ba1648567e2 ("drm: vkms: Refactor the plane composer to
accept new formats") but removed in 8ba1648567e2 ("drm: vkms: Refactor the
plane composer to accept new formats") in an over-simplification effort.
At that time, nobody noticed the performance impact of this commit. After
the first iteration of my series, people noticed a performance impact, and
it was indeed the case. Pekka suggested reimplementing the line-by-line
algorithm.

Experiments on my side showed a great improvement with the line-by-line
algorithm, and the performance is the same as with the original
line-by-line algorithm. I focused my effort on making the code work for
all rotations and translations. Using the drm_rect_* helpers avoids
reimplementing existing logic.

The only "complex" part remaining is the clipping of the coordinates to
avoid reading/writing outside of src/dst. I therefore added a lot of
comments to help anyone who wants to add features later (framebuffer
resizing, for example).

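To illustrate the idea only (this is not the code from PATCH 10), here is a
minimal sketch of a per-line reader for XRGB8888. All names in it
(pixel_argb_u16, read_direction, xrgb8888_read_line) are hypothetical and
chosen for this example; it assumes little-endian XRGB8888 and that the
caller has already clipped the coordinates. The point is that the format
is resolved once per line instead of once per pixel, and that the loop is
bounded by an end pointer as described in the v4 changelog.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical intermediate pixel: 16 bits per channel. */
struct pixel_argb_u16 {
	uint16_t a, r, g, b;
};

/* Hypothetical horizontal read directions (illustration only). */
enum read_direction {
	READ_LEFT_TO_RIGHT,
	READ_RIGHT_TO_LEFT,
};

/* Convert one little-endian XRGB8888 pixel; x * 257 widens 8 bits to 16. */
static struct pixel_argb_u16 xrgb8888_to_argb_u16(const uint8_t *src)
{
	struct pixel_argb_u16 out = {
		.a = 0xffff,
		.r = (uint16_t)(src[2] * 257),
		.g = (uint16_t)(src[1] * 257),
		.b = (uint16_t)(src[0] * 257),
	};
	return out;
}

/*
 * Read `count` pixels of one source row into out_pixel[], starting at
 * pixel x_start and walking in direction `dir`. row_width is in pixels.
 */
static void xrgb8888_read_line(const uint8_t *row, size_t row_width,
			       size_t x_start, enum read_direction dir,
			       size_t count, struct pixel_argb_u16 *out_pixel)
{
	const struct pixel_argb_u16 *end = out_pixel + count;
	const ptrdiff_t step = (dir == READ_LEFT_TO_RIGHT) ? 4 : -4;
	ptrdiff_t offset = (ptrdiff_t)x_start * 4; /* byte offset in the row */

	/* Assumption: the caller has already clipped to the row bounds. */
	assert(x_start < row_width);
	if (dir == READ_LEFT_TO_RIGHT)
		assert(count <= row_width - x_start);
	else
		assert(count <= x_start + 1);

	/* Bounded by the end pointer, so no separate counter is needed. */
	while (out_pixel < end) {
		*out_pixel = xrgb8888_to_argb_u16(row + offset);
		out_pixel++;
		offset += step;
	}
}
```

Reading a rotated or reflected plane then only changes x_start and dir for
each line; the conversion loop itself stays the same.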
The YUV part is not mandatory for this series, but as my first effort was
to help the integration of YUV, I decided to rebase Arthur's series on
mine to help. I took [3], [4], [5] and [6] and adapted them to use the
line-by-line reading. They were also updated to use 32.32 fixed-point
values for the YUV conversion instead of 8.8 fixed-point values.

The last patch of this series introduces DRM_FORMAT_R1/2/4/8 to show how
PATCH 7/16 can be used to manage packed pixel formats.

To properly test the rotation algorithm, I had to implement a new IGT
test [8]. This helped to find one issue in the YUV rotation algorithm.

My series was mainly tested with:
- kms_plane (for color conversions)
- kms_rotation_crc (for a subset of rotation and formats)
- kms_rotation (to test all rotation and formats combinations) [8]
- kms_cursor_crc (for translations)

The benchmark used to measure the improvement was done with:
- kms_fb_stress

[1]: https://lore.kernel.org/all/20240201-yuv-v1-0-3ca376f27632@xxxxxxxxxxx
[2]: https://lore.kernel.org/all/20240110-vkms-yuv-v2-0-952fcaa5a193@xxxxxxxxxx/
[3]: https://lore.kernel.org/all/20240110-vkms-yuv-v2-3-952fcaa5a193@xxxxxxxxxx/
[4]: https://lore.kernel.org/all/20240110-vkms-yuv-v2-5-952fcaa5a193@xxxxxxxxxx/
[5]: https://lore.kernel.org/all/20240110-vkms-yuv-v2-6-952fcaa5a193@xxxxxxxxxx/
[6]: https://lore.kernel.org/all/20240110-vkms-yuv-v2-7-952fcaa5a193@xxxxxxxxxx/
[8]: https://lore.kernel.org/r/20240313-new_rotation-v2-0-6230fd5cae59@xxxxxxxxxxx
[9]: https://lore.kernel.org/dri-devel/20240306-louis-vkms-conv-v1-1-5bfe7d129fdd@xxxxxxxxxx/

To: Rodrigo Siqueira <rodrigosiqueiramelo@xxxxxxxxx>
To: Melissa Wen <melissa.srw@xxxxxxxxx>
To: Maíra Canal <mairacanal@xxxxxxxxxx>
To: Haneen Mohammed <hamohammed.sa@xxxxxxxxx>
To: Daniel Vetter <daniel@xxxxxxxx>
To: Maarten Lankhorst <maarten.lankhorst@xxxxxxxxxxxxxxx>
To: Maxime Ripard <mripard@xxxxxxxxxx>
To: Thomas Zimmermann <tzimmermann@xxxxxxx>
To: David Airlie <airlied@xxxxxxxxx>
To: arthurgrillo@xxxxxxxxxx
To: Jonathan Corbet <corbet@xxxxxxx>
To: pekka.paalanen@xxxxxxxxxxxxx
Cc: dri-devel@xxxxxxxxxxxxxxxxxxxxx
Cc: linux-kernel@xxxxxxxxxxxxxxx
Cc: jeremie.dautheribes@xxxxxxxxxxx
Cc: miquel.raynal@xxxxxxxxxxx
Cc: thomas.petazzoni@xxxxxxxxxxx
Cc: seanpaul@xxxxxxxxxx
Cc: marcheu@xxxxxxxxxx
Cc: nicolejadeyee@xxxxxxxxxx
Signed-off-by: Louis Chauvet <louis.chauvet@xxxxxxxxxxx>

Note: after my changes, these tests seem to pass, so [7] may need updating
(I did not check; this may already have been the case):
- kms_cursor_legacy@flip-vs-cursor-atomic
- kms_pipe_crc_basic@nonblocking-crc
- kms_pipe_crc_basic@nonblocking-crc-frame-sequence
- kms_writeback@writeback-pixel-formats
- kms_writeback@writeback-invalid-parameters
- kms_flip@flip-vs-absolute-wf_vblank-interruptible

And these tests pass; I did not investigate why the runners fail:
- kms_flip@flip-vs-expired-vblank-interruptible
- kms_flip@flip-vs-expired-vblank
- kms_flip@plain-flip-fb-recreate
- kms_flip@plain-flip-fb-recreate-interruptible
- kms_flip@plain-flip-ts-check-interruptible
- kms_cursor_legacy@cursorA-vs-flipA-toggle
- kms_pipe_crc_basic@nonblocking-crc
- kms_prop_blob@invalid-get-prop
- kms_flip@flip-vs-absolute-wf_vblank-interruptible
- kms_invalid_mode@zero-hdisplay
- kms_invalid_mode@bad-vtotal
- kms_cursor_crc.* (everything is SUCCEED or SKIP, except for
  rapid_movement)

[7]: https://lore.kernel.org/all/20240201065346.801038-1-vignesh.raman@xxxxxxxxxxxxx/

Changes in v5:
- All patches: fix some formatting issues
- PATCH 4/16: Use the correct formatter for 4cc code
- PATCH 7/16: Update the pixel accessors to also return the pixel position
  inside a block.
- PATCH 8/16: Fix a temporary bug
- PATCH 9/16: Update the get_step_1x1 to get_step_next_block and update
  the documentation
- PATCH 10/16: Update to use the new pixel accessors
- PATCH 11/16: Update to use the new pixel accessors
- PATCH 11/16: Fix a bug in the subsampling offset for inverted reading
  (right to left/bottom to top). Found by [8].
- PATCH 11/16: Apply Arthur's modifications (comments, algorithm
  clarification)
- PATCH 11/16: Use the correct formatter for 4cc code
- PATCH 11/16: Update to use the new get_step_next_block
- PATCH 14/16: Apply Arthur's modification (comments, compilation issue)
- PATCH 15/16: Add Arthur's patch to explain the kunit tests
- PATCH 16/16: Introduce DRM_FORMAT_R* support.
- Link to v4: https://lore.kernel.org/r/20240304-yuv-v4-0-76beac8e9793@xxxxxxxxxxx

Changes in v4:
- PATCH 3/14: Update comments for get_pixel_* functions
- PATCH 4/14: Add WARN when trying to get unsupported pixel_* functions
- PATCH 5/14: Create dummy pixel reader/writer to avoid NULL function
  pointers and kernel OOPS
- PATCH 6/14: Added the usage of const pointers when needed
- PATCH 7/14: Extraction of pixel accessors modification
- PATCH 8/14: Extraction of the blending function modification
- PATCH 9/14: Extraction of the pixel_read_direction enum
- PATCH 10/14: Update direction_for_rotation documentation
- PATCH 10/14: Rename conversion functions to be explicit
- PATCH 10/14: Replace while(count) by while(out_pixel<end) in read_line
  callbacks. It avoids a new variable and an addition in the composition
  hot path.
- PATCH 11/14: Rename conversion functions to be explicit
- PATCH 11/14: Update the documentation for get_subsampling_offset
- PATCH 11/14: Add the matrix_conversion structure to remove a test from
  the hot path.
- PATCH 11/14: Update matrix values to use 32.32 fixed-point values for
  conversion
- PATCH 12/14: Update commit message
- PATCH 14/14: Change kunit expected value
- Link to v3: https://lore.kernel.org/r/20240226-yuv-v3-0-ff662f0994db@xxxxxxxxxxx

Changes in v3:
- Correction of remaining git-rebase artefacts
- Added Pekka in copy of this patch
- Link to v2: https://lore.kernel.org/r/20240223-yuv-v2-0-aa6be2827bb7@xxxxxxxxxxx

Changes in v2:
- Rebased the series on top of drm-misc/drm-misc-net
- Extract the typedef for pixel_read/pixel_write
- Introduce the line-by-line algorithm per pixel format
- Add some documentation for existing and new code
- Port the series [1] to use line-by-line algorithm
- Link to v1: https://lore.kernel.org/r/20240201-yuv-v1-0-3ca376f27632@xxxxxxxxxxx

---
Arthur Grillo (6):
      drm/vkms: Use drm_frame directly
      drm/vkms: Add YUV support
      drm/vkms: Add range and encoding properties to the plane
      drm/vkms: Drop YUV formats TODO
      drm/vkms: Create KUnit tests for YUV conversions
      drm/vkms: Add how to run the Kunit tests

Louis Chauvet (10):
      drm/vkms: Code formatting
      drm/vkms: write/update the documentation for pixel conversion and pixel write functions
      drm/vkms: Add typedef and documentation for pixel_read and pixel_write functions
      drm/vkms: Add dummy pixel_read/pixel_write callbacks to avoid NULL pointers
      drm/vkms: Use const for input pointers in pixel_read an pixel_write functions
      drm/vkms: Update pixels accessor to support packed and multi-plane formats.
      drm/vkms: Avoid computing blending limits inside pre_mul_alpha_blend
      drm/vkms: Introduce pixel_read_direction enum
      drm/vkms: Re-introduce line-per-line composition algorithm
      drm/vkms: Add support for DRM_FORMAT_R*

 Documentation/gpu/vkms.rst                    |  14 +-
 drivers/gpu/drm/vkms/Kconfig                  |  15 +
 drivers/gpu/drm/vkms/Makefile                 |   1 +
 drivers/gpu/drm/vkms/tests/.kunitconfig       |   4 +
 drivers/gpu/drm/vkms/tests/Makefile           |   3 +
 drivers/gpu/drm/vkms/tests/vkms_format_test.c | 230 ++++++
 drivers/gpu/drm/vkms/vkms_composer.c          | 244 +++++--
 drivers/gpu/drm/vkms/vkms_crtc.c              |   6 +-
 drivers/gpu/drm/vkms/vkms_drv.c               |   3 +-
 drivers/gpu/drm/vkms/vkms_drv.h               |  85 ++-
 drivers/gpu/drm/vkms/vkms_formats.c           | 983 ++++++++++++++++++++++----
 drivers/gpu/drm/vkms/vkms_formats.h           |  12 +-
 drivers/gpu/drm/vkms/vkms_plane.c             |  50 +-
 drivers/gpu/drm/vkms/vkms_writeback.c         |   5 -
 14 files changed, 1440 insertions(+), 215 deletions(-)
---
base-commit: ae4928daaf4d7b2012c97c9109f608fcf6c60df3
change-id: 20240201-yuv-1337d90d9576

Best regards,
-- 
Louis Chauvet <louis.chauvet@xxxxxxxxxxx>