Hello, I think there are some bugs in this implementation of multi-planar support (and not mylty-planar, there is a typo in the title), especially for the "new" drm_format_info description which uses block_w and block_h. I will propose two patches [1] solving these issues and hopefully also simplifying a bit the composition. TBH I'm not an expert, it's my first ever contribution in DRM, so please don't hesitate to correct me if you thin I missunderstood something, it actually took me a bit of time to fully understand the whole series and how it interacted with the rest of the vkms driver. > -static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, int y) > +static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, int y, size_t index) > { > struct drm_framebuffer *fb = frame_info->fb; > > - return fb->offsets[0] + (y * fb->pitches[0]) > - + (x * fb->format->cpp[0]); > + return fb->offsets[index] + (y * fb->pitches[index]) > + + (x * fb->format->cpp[index]); > } > > /* > @@ -23,27 +23,25 @@ static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, int > * @frame_info: Buffer metadata > * @x: The x(width) coordinate of the 2D buffer > * @y: The y(Heigth) coordinate of the 2D buffer > + * @index: The index of the plane on the 2D buffer > * > * Takes the information stored in the frame_info, a pair of coordinates, and > - * returns the address of the first color channel. > - * This function assumes the channels are packed together, i.e. a color channel > - * comes immediately after another in the memory. And therefore, this function > - * doesn't work for YUV with chroma subsampling (e.g. YUV420 and NV21). > + * returns the address of the first color channel on the desired index. > */ > static void *packed_pixels_addr(const struct vkms_frame_info *frame_info, > - int x, int y) > + int x, int y, size_t index) > { > - size_t offset = pixel_offset(frame_info, x, y); > + size_t offset = pixel_offset(frame_info, x, y, index); > > return (u8 *)frame_info->map[0].vaddr + offset; > } This implementation of packed_pixels_addr will only work with block_w == block_h == 1. For packed or tiled formats we will need to use x/y information to extract the correct address, and this address will not be a single pixel. See below my explanation. [...] > @@ -130,17 +128,28 @@ void vkms_compose_row(struct line_buffer *stage_buffer, struct vkms_plane_state > { > struct pixel_argb_u16 *out_pixels = stage_buffer->pixels; > struct vkms_frame_info *frame_info = plane->frame_info; > - u8 *src_pixels = get_packed_src_addr(frame_info, y); > + const struct drm_format_info *frame_format = frame_info->fb->format; > int limit = min_t(size_t, drm_rect_width(&frame_info->dst), stage_buffer->n_pixels); > + u8 *src_pixels[DRM_FORMAT_MAX_PLANES]; > > - for (size_t x = 0; x < limit; x++, src_pixels += frame_info->fb->format->cpp[0]) { > + for (size_t i = 0; i < frame_format->num_planes; i++) > + src_pixels[i] = get_packed_src_addr(frame_info, y, i); > + > + for (size_t x = 0; x < limit; x++) { > int x_pos = get_x_position(frame_info, limit, x); > > - if (drm_rotation_90_or_270(frame_info->rotation)) > - src_pixels = get_packed_src_addr(frame_info, x + frame_info->rotated.y1) > - + frame_info->fb->format->cpp[0] * y; > + if (drm_rotation_90_or_270(frame_info->rotation)) { > + for (size_t i = 0; i < frame_format->num_planes; i++) { > + src_pixels[i] = get_packed_src_addr(frame_info, > + x + frame_info->rotated.y1, i); > + src_pixels[i] += frame_format->cpp[i] * y; I find the current rotation management a bit complex to understand. This is not related to your patch, but as I had to understand this to create my second patch, I think this could be significanlty simplified. Please see the below comment about frame_format->cpp, it applies here too. I think the "easy" way here is simply to reuse the method get_packed_src_addr every time you need a new pixel. > + } > + } > > plane->pixel_read(src_pixels, &out_pixels[x_pos]); > + The usage of cpp and pointer to specific pixel only work for non-packed and non-blocked pixels, but for example NV30 or Y0L0 need more informations about the exact location of the pixel to convert and write the correct pixel value (each pixel can't be referenced directly by a pointer). For example NV30 uses 5 bytes to store 3 pixels (10 bits each), so to access the "middle" one you need to read the 5 bytes and do a small computation to extract it's value. I think a simple solution to handle most cases would be to profide two more parameters: the x and y positions of the pixel to copy, using "absolute coordinates" (i.e x=0,y=0 means the first byte of the src buffer, not the first pixel in the `drm_rect src`, this way the method `pixel_read` can extract the correct value). This way it become easy to manage "complex" pixel representations in this loop: simply increment x/y and let the pixel_read method handle everything. The second patch I will send is doing this. And as explained before, it will also simplify a lot the code related to rotation and translation (no more switch case everywhere to add offset to x/y, it simply use drm_rect_* helpers). It's not optimal in term of performance (in some situation it will read the same block multiple time to generate different pixels), but I believe it still is an intersting trade-off. In the future, if performance is actally critical, the whole composition loop will have to be specialized for each pixel formats: some can be treated line by line (as it's done today), but with blocks or packed pixels it's more complex. > + for (size_t i = 0; i < frame_format->num_planes; i++) > + src_pixels[i] += frame_format->cpp[i]; This is likely working with format with block_w != 1, see explanation above. [...] [1]: https://lore.kernel.org/dri-devel/20240201-yuv-v1-0-3ca376f27632@xxxxxxxxxxx/T/#t -- Louis Chauvet, Bootlin Embedded Linux and Kernel engineering https://bootlin.com