Hi Hans,

> -----Original Message-----
> From: Hans Verkuil <hverkuil-cisco@xxxxxxxxx>
> Sent: Thursday, December 1, 2022 8:33 PM
> To: ishikawa yuji(石川 悠司 ○RDC□AITC○EA開)
> <yuji2.ishikawa@xxxxxxxxxxxxx>; posciak@xxxxxxxxxxxx;
> paul.kocialkowski@xxxxxxxxxxx; mchehab+samsung@xxxxxxxxxx;
> linux-media@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Tomasz Figa
> <tfiga@xxxxxxxxxxxx>
> Subject: Re: Question for an accepted patch: use of DMA-BUF based videobuf2
> capture buffer with no-HW-cache-coherent HW
>
> Hi Yuji,
>
> On 26/10/2022 11:16, yuji2.ishikawa@xxxxxxxxxxxxx wrote:
> > Hi Hans,
> >
> >> -----Original Message-----
> >> From: Hans Verkuil <hverkuil-cisco@xxxxxxxxx>
> >> Sent: Monday, October 24, 2022 4:49 PM
> >> To: ishikawa yuji(石川 悠司 ○RDC□AITC○EA開)
> >> <yuji2.ishikawa@xxxxxxxxxxxxx>; posciak@xxxxxxxxxxxx;
> >> paul.kocialkowski@xxxxxxxxxxx; mchehab+samsung@xxxxxxxxxx;
> >> linux-media@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
> >> Subject: Re: Question for an accepted patch: use of DMA-BUF based
> >> videobuf2 capture buffer with no-HW-cache-coherent HW
> >>
> >> Hi Yuji,
> >>
> >> On 10/24/22 06:02, yuji2.ishikawa@xxxxxxxxxxxxx wrote:
> >>> Hi,
> >>>
> >>> I'm porting a V4L2 capture driver from 4.19.y to 5.10.y [1].
> >>>
> >>> When I test the ported driver, I sometimes find corruption in a
> >>> captured image.
> >>>
> >>> Because the corruption is exactly aligned with cache lines, I started
> >>> my investigation from the map/unmap of the DMA-BUF.
> >>>
> >>> The capture driver uses DMA-BUF for videobuf2.
> >>>
> >>> The capture hardware does not have HW-maintained cache coherency with
> >>> the CPU; that is, explicit map/unmap is essential on QBUF/DQBUF.
> >>>
> >>> After some hours of struggle, I found a patch that removes the cache
> >>> synchronizations on QBUF/DQBUF.
> >>>
> >>> https://patchwork.kernel.org/project/linux-media/patch/20190124095156.21898-1-paul.kocialkowski@xxxxxxxxxxx/
> >>>
> >>> When I removed this patch from my 5.10.y working tree, the driver
> >>> yielded images without any defects.
> >>>
> >>> ***************
> >>>
> >>> Sorry for bringing up a patch released 4 years ago.
> >>>
> >>> The patch removes map/unmap on QBUF/DQBUF to improve the performance
> >>> of V4L2 decoder devices, by reusing previously decoded frames.
> >>>
> >>> However, there seems to be no care or compensation for the modified
> >>> lifecycle of the DMA-BUF, especially on video capture devices.
> >>
> >> I'm not entirely sure what you mean exactly.
> >>
> > My concern is consistency between ioctls and the state transition of
> > capture buffers.
> > Generally, userland handles streaming I/O (DMA-BUF importing) buffers
> > as follows:
> >
> > ioctl(VIDIOC_QBUF) -> /* DMA transfer from HW */ -> ioctl(VIDIOC_DQBUF)
> > -> /* access from CPU */ -> ioctl(VIDIOC_QBUF) -> ...
> >
> > Therefore, the expected semantics are that a buffer is owned by HW after
> > QBUF, and owned by the CPU after DQBUF.
> > In practice, before the patch is applied, ioctl(QBUF) kicks
> > vb2_dc_map_dma_buf() and ioctl(DQBUF) kicks vb2_dc_unmap_dma_buf().
> > This implementation keeps consistency in terms of cache coherency, as
> > the cache clean is done in vb2_dc_map_dma_buf().
> >
> > With the patch applied, ioctl(DQBUF) does not kick unmap_dma() anymore.
> > The same applies to ioctl(QBUF).
> > Therefore, in practice, a buffer is not owned by the CPU just after
> > ioctl(DQBUF).
> > To keep compatibility of buffer operations, there should be a delayed
> > map_dma()/unmap_dma() call just before the DMA transfer/CPU access.
> > However, no one referred to such a function in the v4l2 framework during
> > the review of the patch.
> > Also, there is no advice for individual video device drivers, such as
> > adding map_dma()/unmap_dma() explicitly.
>
> The cache syncing is supposed to happen in __vb2_buf_mem_finish() where the
> 'finish' memop is called.
>
> But for DMABUF it notes that:
>
> /*
>  * DMA exporter should take care of cache syncs, so we can avoid
>  * explicit ->prepare()/->finish() syncs. For other ->memory types
>  * we always need ->prepare() or/and ->finish() cache sync.
>  */

It seems I had misunderstood how DMA-BUF cache syncs are maintained along
with the videobuf2 API calls.
I now understand that cache syncs are expected to be handled before prepare()
and after finish().
The "ownership" transition along QBUF/DQBUF came from my misunderstanding;
please disregard it.

> And here https://docs.kernel.org/driver-api/dma-buf.html I read that userspace
> must call DMA_BUF_IOCTL_SYNC to ensure the caches are synced before
> using the buffer.
>
> Are you calling DMA_BUF_IOCTL_SYNC?

The missing ioctl(DMA_BUF_IOCTL_SYNC) call in userland was exactly the cause.
I read the document, carried out experiments, and found that adding the call
resolves the problem completely.

Very sorry to bother you.

Regards,
Yuji

> I suspect that vb2_dc_unmap_dma_buf() caused a cache sync, so you never
> noticed issues.
>
> Regards,
>
> Hans
>
> >
> >>>
> >>> Would you give me your thoughts on this patch:
> >>>
> >>> * Do well-implemented capture drivers work well even if this patch is
> >>> applied?
> >>
> >> Yes, dmabuf is used extensively and I have not had any reports of issues.
> >
> > Many architectures can avoid this problem.
> > The problem occurs, intermittently, only if the video capture HW does not
> > have HW-maintained cache coherency with the CPU.
> > Does this patch consider such a case?
> >
> >>>
> >>> * How should a video capture driver call V4L2/videobuf2 APIs,
> >>> especially when the hardware does not support cache coherency?
> >>
> >> It should all be handled correctly by the core frameworks.
> >>
> >> I think you need to debug more inside videobuf2-core.c. Some printk's
> >> that show the dmabuf fd when the buffer is mapped and when it is
> >> unmapped + the length it is mapping should hopefully help a bit.
> >
> > I added printk and dump_stack() to several functions.
> > The patched function __prepare_dmabuf() is called on every ioctl(QBUF).
> > Function vb2_dc_map_dmabuf() is called only on the first ioctl(QBUF)
> > for a buffer instance.
> > After that, vb2_dc_map_dmabuf() was never called, as the patch intended.
> >
> > Regards,
> > Yuji
> >
> >>
> >> Regards,
> >>
> >> Hans
> >>
> >>>
> >>> ***************
> >>>
> >>> [1] FYI: the capture driver is not in mainline yet; the candidate is
> >>>
> >>> https://lore.kernel.org/all/20220810132822.32534-1-yuji2.ishikawa@toshiba.co.jp/
> >>>
> >>> Regards,
> >>>
> >>> Yuji Ishikawa
> >>>
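
For readers who land on this thread with the same symptom: the resolution above was to bracket CPU access to the imported DMA-BUF with DMA_BUF_IOCTL_SYNC in userland. The following is only a minimal sketch of that pattern, not code from the driver or application discussed here. The helper names dmabuf_sync() and read_captured_frame() and the consume callback are hypothetical, and the sketch assumes the application has already mmap()ed the DMA-BUF and dequeued the buffer with VIDIOC_DQBUF. Retrying on EINTR/EAGAIN is a common convention for this ioctl and is likewise an assumption here.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/dma-buf.h>

/*
 * Hypothetical helper: issue DMA_BUF_IOCTL_SYNC on a dma-buf fd.
 * 'flags' combines DMA_BUF_SYNC_START or DMA_BUF_SYNC_END with
 * DMA_BUF_SYNC_READ, DMA_BUF_SYNC_WRITE or DMA_BUF_SYNC_RW.
 * Returns 0 on success, -1 with errno set on failure.
 */
int dmabuf_sync(int dmabuf_fd, uint64_t flags)
{
    struct dma_buf_sync sync = { .flags = flags };
    int ret;

    /* Restart the ioctl if it was interrupted (assumed convention). */
    do {
        ret = ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);
    } while (ret == -1 && (errno == EINTR || errno == EAGAIN));

    return ret;
}

/*
 * Hypothetical helper: read a captured frame from an already mmap()ed
 * dma-buf after VIDIOC_DQBUF. CPU access is bracketed by SYNC_START and
 * SYNC_END so the exporter can perform the cache maintenance.
 */
int read_captured_frame(int dmabuf_fd, const void *mapped, size_t len,
                        void (*consume)(const void *, size_t))
{
    if (dmabuf_sync(dmabuf_fd, DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ) < 0)
        return -1;

    consume(mapped, len);   /* CPU reads the frame here */

    return dmabuf_sync(dmabuf_fd, DMA_BUF_SYNC_END | DMA_BUF_SYNC_READ);
}
```

In a capture loop this would sit between ioctl(VIDIOC_DQBUF) and the next ioctl(VIDIOC_QBUF) for the same buffer. Whether the sync call actually performs cache maintenance depends on the DMA-BUF exporter.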