On 2025-01-15 at 23:46 +09, Tomasz Figa <tfiga@xxxxxxxxxxxx> wrote:
> On Wed, Jan 15, 2025 at 10:30 PM Mikhail Rudenko <mike.rudenko@xxxxxxxxx> wrote:
>>
>> Hi Tomasz,
>>
>> On 2025-01-15 at 17:31 +09, Tomasz Figa <tfiga@xxxxxxxxxxxx> wrote:
>>
>> > Hi Mikhail and Laurent,
>> >
>> > On Wed, Jan 15, 2025 at 2:07 AM Mikhail Rudenko <mike.rudenko@xxxxxxxxx> wrote:
>> >>
>> >>
>> >> Hi Laurent,
>> >>
>> >> On 2025-01-03 at 17:23 +02, Laurent Pinchart <laurent.pinchart@xxxxxxxxxxxxxxxx> wrote:
>> >>
>> >> > On Thu, Jan 02, 2025 at 06:35:00PM +0300, Mikhail Rudenko wrote:
>> >> >> Currently, the rkisp1 driver always uses coherent DMA allocations for
>> >> >> video capture buffers. However, on some platforms, using non-coherent
>> >> >> buffers can improve performance, especially when CPU processing of
>> >> >> MMAP'ed video buffers is required.
>> >> >>
>> >> >> For example, on the Rockchip RK3399 running at maximum CPU frequency,
>> >> >> the time to memcpy a frame from a 1280x720 XRGB32 MMAP'ed buffer to a
>> >> >> malloc'ed userspace buffer decreases from 7.7 ms to 1.1 ms when using
>> >> >> non-coherent DMA allocation. CPU usage also decreases accordingly.
>> >> >
>> >> > What's the time taken by the cache management operations ?
>> >>
>> >> Sorry for the late reply, your question turned out a little more
>> >> interesting than I expected initially. :)
>> >>
>> >> When capturing using Yavta with MMAP buffers under the conditions mentioned
>> >> in the commit message, ftrace gives 437.6 +- 1.1 us for
>> >> dma_sync_sgtable_for_cpu and 409 +- 14 us for
>> >> dma_sync_sgtable_for_device. Thus, it looks like using non-coherent
>> >> buffers in this case is more CPU-efficient even when considering cache
>> >> management overhead.
>> >>
>> >> When trying to do the same measurements with libcamera, I failed. In a
>> >> typical libcamera use case when MMAP buffers are allocated from a
>> >> device, exported as dmabufs and then used for capture on the same device
>> >> with DMABUF memory type, cache management in kernel is skipped [1]
>> >> [2]. Also, vb2_dc_dmabuf_ops_{begin,end}_cpu_access are no-ops [3], so
>> >> DMA_BUF_IOCTL_SYNC from userspace does not work either.
>> >
>> > Oops, so I believe this is a bug. When an MMAP buffer is allocated in
>> > the non-coherent mode, those ops should perform proper cache
>> > maintenance.
>>
>> Thanks for pointing this out!
>>
>> > Let me send a patch to fix this in a couple of days unless someone
>> > does it earlier.
>>
>> Now that we know that this is a bug, not an API misuse from my side, I
>> can fix this myself and send a v2. Would this be okay for you?
>
> I'd be more than happy :)

Done, see [1]. A review would be appreciated. :)

[1] https://lore.kernel.org/all/20250115-b4-rkisp-noncoherent-v2-0-0853e1a24012@xxxxxxxxx/

-- 
Best regards,
Mikhail Rudenko