Hi Kieran, On Wed, Sep 22, 2021 at 10:37:29PM +0100, Kieran Bingham wrote: > On 30/07/2021 03:05, Laurent Pinchart wrote: > > On R-Car Gen3, the DU uses a separate IP core named VSP to perform DMA > > from memory and composition of planes. The DU hardware then only handles > > the video timings and the interface with the encoders. This differs from > > Gen2, where the DU included a composer with DMA engines. > > > > When sourcing from the VSP, the DU hardware performs no memory access, > > and thus has no requirements on imported dma-buf memory types. The GEM > > CMA helpers however still create a DMA mapping to the DU device, which > > isn't used. The mapping to the VSP is done when processing the atomic > > commits, in the plane .prepare_fb() handler. > > > > When the system uses an IOMMU, the VSP device is attached to it, which > > enables the VSP to use non physically contiguous memory. The DU, as it > > performs no memory access, isn't connected to the IOMMU. The GEM CMA > > drm_gem_cma_prime_import_sg_table() helper will in that case fail to map > > non-contiguous imported dma-bufs, as the DMA mapping to the DU device > > will have multiple entries in its sgtable. The prevents using non > > physically contiguous memory for display. > > > > The DRM PRIME and GEM CMA helpers are designed to create the sgtable > > when the dma-buf is imported. By default, the device referenced by the > > drm_device is used to create the dma-buf attachment. Drivers can use a > > different device by using the drm_gem_prime_import_dev() function. While > > the DU has access to the VSP device, this won't help here, as different > > CRTCs use different VSP instances, connected to different IOMMU > > channels. The driver doesn't know at import time which CRTC a GEM object > > will be used, and thus can't select the right VSP device to pass to > > drm_gem_prime_import_dev(). > > > > To support non-contiguous memory, implement a custom > > .gem_prime_import_sg_table() operation that accepts all imported dma-buf > > regardless of the number of scatterlist entries. The sgtable will be > > mapped to the VSP at .prepare_fb() time, which will reject the > > framebuffer if the VSP isn't connected to an IOMMU. > > Wow - quite a lot to digest there. I tried to make it clear, but there's lots to explain :-S > > Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@xxxxxxxxxxxxxxxx> > > --- > > > > This can arguably be considered as a bit of a hack, as the GEM PRIME > > import still maps the dma-buf attachment to the DU, which isn't > > necessary. This is however not a big issue, as the DU isn't connected to > > any IOMMU, the DMA mapping thus doesn't waste any resource such as I/O > > memory space. Avoiding the mapping creation would require replacing the > > helpers completely, resulting in lots of code duplication. If this type > > of hardware setup was common we could create another set of helpers, but > > I don't think it would be worth it to support a single device. > > > > I have tested this patch with the cam application from libcamera, on a > > R-Car H3 ES2.x Salvator-XS board, importing buffers from the vimc > > driver: > > > > cam -c 'platform/vimc.0 Sensor B' \ > > -s pixelformat=BGR888,width=1440,height=900 \ > > -C -D HDMI-A-1 > > Are VIMC buffers always physically non-contiguous to validate this test? They're not, but with https://lore.kernel.org/linux-media/20210730001939.30769-1-laurent.pinchart+renesas@xxxxxxxxxxxxxxxx/ they can be made to be, with a module parameter. > > A set of patches to add DRM/KMS output support to cam has been posted. > > Until it gets merged (hopefully soon), it can be found at [1]. > > > > As cam doesn't support DRM/KMS scaling and overlay planes yet, the > > camera resolution needs to match the display resolution. Due to a > > peculiarity of the vimc driver, the resolution has to be divisible by 3, > > which may require changes to the resolution above depending on your > > monitor. > > > > A test patch is also needed for the kernel, to enable IOMMU support for > > the VSP, which isn't done by default (yet ?) in mainline. I have pushed > > a branch to [2] if anyone is interested. > > > > [1] https://lists.libcamera.org/pipermail/libcamera-devel/2021-July/022815.html > > [2] git://linuxtv.org/pinchartl/media.git drm/du/devel/gem/contig > > > > --- > > Changes since v1: > > > > - Rewrote commit message to explain issue in more details > > - Duplicate the imported scatter gather table in > > rcar_du_vsp_plane_prepare_fb() > > - Use separate loop counter j to avoid overwritting i > > - Update to latest drm_gem_cma API > > --- > > drivers/gpu/drm/rcar-du/rcar_du_drv.c | 6 +++- > > drivers/gpu/drm/rcar-du/rcar_du_kms.c | 49 +++++++++++++++++++++++++++ > > drivers/gpu/drm/rcar-du/rcar_du_kms.h | 7 ++++ > > drivers/gpu/drm/rcar-du/rcar_du_vsp.c | 36 +++++++++++++++++--- > > 4 files changed, 92 insertions(+), 6 deletions(-) > > > > diff --git a/drivers/gpu/drm/rcar-du/rcar_du_drv.c b/drivers/gpu/drm/rcar-du/rcar_du_drv.c > > index cb34b1e477bc..d1f8d51a10fe 100644 > > --- a/drivers/gpu/drm/rcar-du/rcar_du_drv.c > > +++ b/drivers/gpu/drm/rcar-du/rcar_du_drv.c > > @@ -511,7 +511,11 @@ DEFINE_DRM_GEM_CMA_FOPS(rcar_du_fops); > > > > static const struct drm_driver rcar_du_driver = { > > .driver_features = DRIVER_GEM | DRIVER_MODESET | DRIVER_ATOMIC, > > - DRM_GEM_CMA_DRIVER_OPS_WITH_DUMB_CREATE(rcar_du_dumb_create), > > + .dumb_create = rcar_du_dumb_create, > > + .prime_handle_to_fd = drm_gem_prime_handle_to_fd, > > + .prime_fd_to_handle = drm_gem_prime_fd_to_handle, > > + .gem_prime_import_sg_table = rcar_du_gem_prime_import_sg_table, > > + .gem_prime_mmap = drm_gem_prime_mmap, > > .fops = &rcar_du_fops, > > .name = "rcar-du", > > .desc = "Renesas R-Car Display Unit", > > diff --git a/drivers/gpu/drm/rcar-du/rcar_du_kms.c b/drivers/gpu/drm/rcar-du/rcar_du_kms.c > > index fdb8a0d127ad..7077af0886cf 100644 > > --- a/drivers/gpu/drm/rcar-du/rcar_du_kms.c > > +++ b/drivers/gpu/drm/rcar-du/rcar_du_kms.c > > @@ -19,6 +19,7 @@ > > #include <drm/drm_vblank.h> > > > > #include <linux/device.h> > > +#include <linux/dma-buf.h> > > #include <linux/of_graph.h> > > #include <linux/of_platform.h> > > #include <linux/wait.h> > > @@ -325,6 +326,54 @@ const struct rcar_du_format_info *rcar_du_format_info(u32 fourcc) > > * Frame buffer > > */ > > > > +static const struct drm_gem_object_funcs rcar_du_gem_funcs = { > > + .free = drm_gem_cma_free_object, > > + .print_info = drm_gem_cma_print_info, > > + .get_sg_table = drm_gem_cma_get_sg_table, > > + .vmap = drm_gem_cma_vmap, > > + .mmap = drm_gem_cma_mmap, > > + .vm_ops = &drm_gem_cma_vm_ops, > > +}; > > + > > +struct drm_gem_object *rcar_du_gem_prime_import_sg_table(struct drm_device *dev, > > + struct dma_buf_attachment *attach, > > + struct sg_table *sgt) > > +{ > > + struct rcar_du_device *rcdu = to_rcar_du_device(dev); > > + struct drm_gem_cma_object *cma_obj; > > + struct drm_gem_object *gem_obj; > > + int ret; > > + > > + if (!rcar_du_has(rcdu, RCAR_DU_FEATURE_VSP1_SOURCE)) > > + return drm_gem_cma_prime_import_sg_table(dev, attach, sgt); > > + > > + /* Create a CMA GEM buffer. */ > > + cma_obj = kzalloc(sizeof(*cma_obj), GFP_KERNEL); > > + if (!cma_obj) > > + return ERR_PTR(-ENOMEM); > > + > > + gem_obj = &cma_obj->base; > > + gem_obj->funcs = &rcar_du_gem_funcs; > > + > > + drm_gem_private_object_init(dev, gem_obj, attach->dmabuf->size); > > + cma_obj->map_noncoherent = false; > > + > > + ret = drm_gem_create_mmap_offset(gem_obj); > > + if (ret) { > > + drm_gem_object_release(gem_obj); > > + goto error; > > + } > > + > > + cma_obj->paddr = 0; > > + cma_obj->sgt = sgt; > > + > > + return gem_obj; > > + > > +error: > > + kfree(cma_obj); > > + return ERR_PTR(ret); > > Almost seems overkill to handle this here, rather than inline in the > failure of drm_gem_create_mmap_offset() which would use 2 less lines in > the function .. > > But perhaps you're planning for it to be extended? > > However then wouldn't drm_gem_object_release() need to be handled in > error: as well? > > Personally I'd just inline the kfree in place of the goto error here. You're right, I'll do so. > But as that's the worst I can find in here so far: > > Reviewed-by: Kieran Bingham <kieran.bingham+renesas@xxxxxxxxxxxxxxxx> > > > +} > > + > > int rcar_du_dumb_create(struct drm_file *file, struct drm_device *dev, > > struct drm_mode_create_dumb *args) > > { > > diff --git a/drivers/gpu/drm/rcar-du/rcar_du_kms.h b/drivers/gpu/drm/rcar-du/rcar_du_kms.h > > index 8f5fff176754..789154e19535 100644 > > --- a/drivers/gpu/drm/rcar-du/rcar_du_kms.h > > +++ b/drivers/gpu/drm/rcar-du/rcar_du_kms.h > > @@ -12,10 +12,13 @@ > > > > #include <linux/types.h> > > > > +struct dma_buf_attachment; > > struct drm_file; > > struct drm_device; > > +struct drm_gem_object; > > struct drm_mode_create_dumb; > > struct rcar_du_device; > > +struct sg_table; > > > > struct rcar_du_format_info { > > u32 fourcc; > > @@ -34,4 +37,8 @@ int rcar_du_modeset_init(struct rcar_du_device *rcdu); > > int rcar_du_dumb_create(struct drm_file *file, struct drm_device *dev, > > struct drm_mode_create_dumb *args); > > > > +struct drm_gem_object *rcar_du_gem_prime_import_sg_table(struct drm_device *dev, > > + struct dma_buf_attachment *attach, > > + struct sg_table *sgt); > > + > > #endif /* __RCAR_DU_KMS_H__ */ > > diff --git a/drivers/gpu/drm/rcar-du/rcar_du_vsp.c b/drivers/gpu/drm/rcar-du/rcar_du_vsp.c > > index 23e41c83c875..b7fc5b069cbc 100644 > > --- a/drivers/gpu/drm/rcar-du/rcar_du_vsp.c > > +++ b/drivers/gpu/drm/rcar-du/rcar_du_vsp.c > > @@ -187,17 +187,43 @@ int rcar_du_vsp_map_fb(struct rcar_du_vsp *vsp, struct drm_framebuffer *fb, > > struct sg_table sg_tables[3]) > > { > > struct rcar_du_device *rcdu = vsp->dev; > > - unsigned int i; > > + unsigned int i, j; > > int ret; > > > > for (i = 0; i < fb->format->num_planes; ++i) { > > struct drm_gem_cma_object *gem = drm_fb_cma_get_gem_obj(fb, i); > > struct sg_table *sgt = &sg_tables[i]; > > > > - ret = dma_get_sgtable(rcdu->dev, sgt, gem->vaddr, gem->paddr, > > - gem->base.size); > > - if (ret) > > - goto fail; > > + if (gem->sgt) { > > + struct scatterlist *src; > > + struct scatterlist *dst; > > + > > + /* > > + * If the GEM buffer has a scatter gather table, it has > > + * been imported from a dma-buf and has no physical > > + * address as it might not be physically contiguous. > > + * Copy the original scatter gather table to map it to > > + * the VSP. > > + */ > > + ret = sg_alloc_table(sgt, gem->sgt->orig_nents, > > + GFP_KERNEL); > > + if (ret) > > + goto fail; > > + > > + src = gem->sgt->sgl; > > + dst = sgt->sgl; > > + for (j = 0; j < gem->sgt->orig_nents; ++j) { > > + sg_set_page(dst, sg_page(src), src->length, > > + src->offset); > > + src = sg_next(src); > > + dst = sg_next(dst); > > + } > > + } else { > > + ret = dma_get_sgtable(rcdu->dev, sgt, gem->vaddr, > > + gem->paddr, gem->base.size); > > + if (ret) > > + goto fail; > > + } > > > > ret = vsp1_du_map_sg(vsp->vsp, sgt); > > if (ret) { -- Regards, Laurent Pinchart