On Mon, Mar 10, 2025 at 5:52 PM Mikhail Rudenko <mike.rudenko@xxxxxxxxx> wrote: > > > Hi Nicolas, Tomasz, > > On 2025-03-03 at 10:24 -05, Nicolas Dufresne <nicolas@xxxxxxxxxxxx> wrote: > > > Hi Mikhail, > > > > Le lundi 03 mars 2025 à 14:40 +0300, Mikhail Rudenko a écrit : > >> When support for V4L2_FLAG_MEMORY_NON_CONSISTENT was removed in > >> commit 129134e5415d ("media: media/v4l2: remove > >> V4L2_FLAG_MEMORY_NON_CONSISTENT flag"), > >> vb2_dc_dmabuf_ops_{begin,end}_cpu_access() functions were made > >> no-ops. Later, when support for V4L2_MEMORY_FLAG_NON_COHERENT was > >> introduced in commit c0acf9cfeee0 ("media: videobuf2: handle > >> V4L2_MEMORY_FLAG_NON_COHERENT flag"), the above functions remained > >> no-ops, making cache maintenance for non-coherent dmabufs allocated > >> by > >> dma-contig impossible. > >> > >> Fix this by reintroducing dma_sync_sgtable_for_{cpu,device} and > >> {flush,invalidate}_kernel_vmap_range calls to > >> vb2_dc_dmabuf_ops_{begin,end}_cpu_access() functions for non-coherent > >> buffers. > >> > >> Fixes: c0acf9cfeee0 ("media: videobuf2: handle > >> V4L2_MEMORY_FLAG_NON_COHERENT flag") > >> Cc: stable@xxxxxxxxxxxxxxx > >> Signed-off-by: Mikhail Rudenko <mike.rudenko@xxxxxxxxx> > >> --- > >> .../media/common/videobuf2/videobuf2-dma-contig.c | 22 > >> ++++++++++++++++++++++ > >> 1 file changed, 22 insertions(+) > >> > >> diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c > >> b/drivers/media/common/videobuf2/videobuf2-dma-contig.c > >> index > >> a13ec569c82f6da2d977222b94af32e74c6c6c82..d41095fe5bd21faf815d6b035d7 > >> bc888a84a95d5 100644 > >> --- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c > >> +++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c > >> @@ -427,6 +427,17 @@ static int > >> vb2_dc_dmabuf_ops_begin_cpu_access(struct dma_buf *dbuf, > >> enum dma_data_direction > >> direction) > >> { > >> + struct vb2_dc_buf *buf = dbuf->priv; > >> + struct sg_table *sgt = buf->dma_sgt; > >> + > >> + if (!buf->non_coherent_mem) > >> + return 0; > >> + > >> + if (buf->vaddr) > >> + invalidate_kernel_vmap_range(buf->vaddr, buf->size); > > > > What would make me a lot more confortable with this change is if you > > enable kernel mappings for one test. This will ensure you cover the > > call to "invalidate" in your testing. I'd like to know about the > > performance impact. With this implementation it should be identical to > > the VB2 one. > > > > I have re-run my tests on RK3399, with 1280x720 XRGB capture buffers (1 > plane, 3686400 bytes). Capture process was pinned to "big" cores running > at fixed frequency of 1.8 GHz. Libcamera was modified to request buffers > with V4L2_MEMORY_FLAG_NON_COHERENT flag. DMA_BUF_IOCTL_SYNC ioctls were > used as appropriate. For kernel mapping effect test, vb2_plane_vaddr > call was inserted into rkisp1_vb2_buf_init. > > The timings are as following: > > - memcpy coherent buffer: 7570 +/- 63 us > - memcpy non-coherent buffer: 1120 +/- 34 us > > without kernel mapping: > > - ioctl(fd, DMA_BUF_IOCTL_SYNC, {DMA_BUF_SYNC_START|DMA_BUF_SYNC_READ}): 428 +/- 15 us > - ioctl(fd, DMA_BUF_IOCTL_SYNC, {DMA_BUF_SYNC_END|DMA_BUF_SYNC_READ}): 422 +/- 28 us > > with kernel mapping: > > - ioctl(fd, DMA_BUF_IOCTL_SYNC, {DMA_BUF_SYNC_START|DMA_BUF_SYNC_READ}): 526 +/- 13 us > - ioctl(fd, DMA_BUF_IOCTL_SYNC, {DMA_BUF_SYNC_END|DMA_BUF_SYNC_READ}): 519 +/- 38 us > > So, even with kernel mapping enabled, speedup is 7570 / (1120 + 526 + 519) = 3.5, > and the use of noncoherent buffers is justified -- at least on this platform. Thanks a lot for the additional testing. Acked-by: Tomasz Figa <tfiga@xxxxxxxxxxxx> Best regards, Tomasz