On Thu, Jun 20, 2024 at 3:52 PM Hans Verkuil <hverkuil-cisco@xxxxxxxxx> wrote: > > On 19/06/2024 06:19, Tomasz Figa wrote: > > On Wed, Jun 19, 2024 at 1:24 AM Nicolas Dufresne <nicolas@xxxxxxxxxxxx> wrote: > >> > >> Le mardi 18 juin 2024 à 16:47 +0900, Tomasz Figa a écrit : > >>> Hi TaoJiang, > >>> > >>> On Tue, Jun 18, 2024 at 4:30 PM TaoJiang <tao.jiang_2@xxxxxxx> wrote: > >>>> > >>>> From: Ming Qian <ming.qian@xxxxxxx> > >>>> > >>>> When the memory type is VB2_MEMORY_DMABUF, the v4l2 device can't know > >>>> whether the dma buffer is coherent or synchronized. > >>>> > >>>> The videobuf2-core will skip cache syncs as it think the DMA exporter > >>>> should take care of cache syncs > >>>> > >>>> But in fact it's likely that the client doesn't > >>>> synchronize the dma buf before qbuf() or after dqbuf(). and it's > >>>> difficult to find this type of error directly. > >>>> > >>>> I think it's helpful that videobuf2-core can call > >>>> dma_buf_end_cpu_access() and dma_buf_begin_cpu_access() to handle the > >>>> cache syncs. > >>>> > >>>> Signed-off-by: Ming Qian <ming.qian@xxxxxxx> > >>>> Signed-off-by: TaoJiang <tao.jiang_2@xxxxxxx> > >>>> --- > >>>> .../media/common/videobuf2/videobuf2-core.c | 22 +++++++++++++++++++ > >>>> 1 file changed, 22 insertions(+) > >>>> > >>> > >>> Sorry, that patch is incorrect. I believe you're misunderstanding the > >>> way DMA-buf buffers should be managed in the userspace. It's the > >>> userspace responsibility to call the DMA_BUF_IOCTL_SYNC ioctl [1] to > >>> signal start and end of CPU access to the kernel and imply necessary > >>> cache synchronization. > >>> > >>> [1] https://docs.kernel.org/driver-api/dma-buf.html#dma-buffer-ioctls > >>> > >>> So, really sorry, but it's a NAK. > >> > >> > >> > >> This patch *could* make sense if it was inside UVC Driver as an example, as this > >> driver can import dmabuf, to CPU memcpy, and does omits the required sync calls > >> (unless that got added recently, I can easily have missed it). > > > > Yeah, currently V4L2 drivers don't call the in-kernel > > dma_buf_{begin,end}_cpu_access() when they need to access the buffers > > from the CPU, while my quick grep [1] reveals that we have 68 files > > retrieving plane vaddr by calling vb2_plane_vaddr() (not necessarily a > > 100% guarantee of CPU access being done, but rather likely so). > > > > I also repeated the same thing with VB2_DMABUF [2] and tried to > > attribute both lists to specific drivers (by retaining the path until > > the first - or _ [3]; which seemed to be relatively accurate), leading > > to the following drivers that claim support for DMABUF while also > > retrieving plane vaddr (without proper synchronization - no drivers > > currently call any begin/end CPU access): > > > > i2c/video > > pci/bt8xx/bttv > > pci/cobalt/cobalt > > pci/cx18/cx18 > > pci/tw5864/tw5864 > > pci/tw686x/tw686x > > platform/allegro > > platform/amphion/vpu > > platform/chips > > platform/intel/pxa > > platform/marvell/mcam > > platform/mediatek/jpeg/mtk > > platform/mediatek/vcodec/decoder/mtk > > platform/mediatek/vcodec/encoder/mtk > > platform/nuvoton/npcm > > platform/nvidia/tegra > > platform/nxp/imx > > platform/renesas/rcar > > platform/renesas/vsp1/vsp1 > > platform/rockchip/rkisp1/rkisp1 > > platform/samsung/exynos4 > > platform/samsung/s5p > > platform/st/sti/delta/delta > > platform/st/sti/hva/hva > > platform/verisilicon/hantro > > usb/au0828/au0828 > > usb/cx231xx/cx231xx > > usb/dvb > > usb/em28xx/em28xx > > usb/gspca/gspca.c > > usb/hackrf/hackrf.c > > usb/stk1160/stk1160 > > usb/uvc/uvc > > > > which means we potentially have ~30 drivers which likely don't handle > > imported DMABUFs correctly (there is still a chance that DMABUF is > > advertised for one queue, while vaddr is used for another). > > > > I think we have two options: > > 1) add vb2_{begin/end}_cpu_access() helpers, carefully audit each > > driver and add calls to those > > I actually started on that 9 (!) years ago: > > https://git.linuxtv.org/hverkuil/media_tree.git/log/?h=vb2-cpu-access > > If memory serves, the main problem was that there were some drivers where > it wasn't clear what should be done. In the end I never continued this > work since nobody complained about it. > > This patch series adds vb2_plane_begin/end_cpu_access() functions, > replaces all calls to vb2_plane_vaddr() in drivers to the new functions, > and at the end removes vb2_plane_vaddr() altogether. > > > 2) take a heavy gun approach and just call vb2_begin_cpu_access() > > whenever vb2_plane_vaddr() is called and then vb2_end_cpu_access() > > whenever vb2_buffer_done() is called (if begin was called before). > > > > The latter has the disadvantage of drivers not having control over the > > timing of the cache sync, so could end up with less than optimal > > performance. Also there could be some more complex cases, where the > > driver needs to mix DMA and CPU accesses to the buffer, so the fixed > > sequence just wouldn't work for them. (But then they just wouldn't > > work today either.) > > > > Hans, Marek, do you have any thoughts? (I'd personally just go with 2 > > and if any driver in the future needs something else, they could call > > begin/end CPU access manually.) > > I prefer 1. If nothing else, that makes it easy to identify drivers > that do such things. > > But perhaps a mix is possible: if a VB2 flag is set by the driver, then > approach 2 is used. That might help with the drivers where it isn't clear > what they should do. Although perhaps this can all be done in the driver > itself: instead of vb2_plane_vaddr they call vb2_begin_cpu_access for the > whole buffer, and at buffer_done time they call vb2_end_cpu_access. Should > work just as well for the very few drivers that need this. That's a good point. I guess we don't really need to dig so much into those drivers in this case. Just mechanically do the same for all of them (+/- maybe checking for some obvious corner cases which don't need the extra calls). Let me see if I can give it a stab. Best, Tomasz > > Regards, > > Hans > > > > > [1] git grep vb2_plane_vaddr | cut -d":" -f 1 | sort | uniq > > [2] git grep VB2_DMABUF | cut -d":" -f 1 | sort | uniq > > [3] by running [1] and [2] through | cut -d"-" -f 1 | cut -d"_" -f 1 | uniq > > > > Best, > > Tomasz > > > >> > >> But generally speaking, bracketing all driver with CPU access synchronization > >> does not make sense indeed, so I second the rejection. > >> > >> Nicolas > >> > >>> > >>> Best regards, > >>> Tomasz > >>> > >>>> diff --git a/drivers/media/common/videobuf2/videobuf2-core.c b/drivers/media/common/videobuf2/videobuf2-core.c > >>>> index 358f1fe42975..4734ff9cf3ce 100644 > >>>> --- a/drivers/media/common/videobuf2/videobuf2-core.c > >>>> +++ b/drivers/media/common/videobuf2/videobuf2-core.c > >>>> @@ -340,6 +340,17 @@ static void __vb2_buf_mem_prepare(struct vb2_buffer *vb) > >>>> vb->synced = 1; > >>>> for (plane = 0; plane < vb->num_planes; ++plane) > >>>> call_void_memop(vb, prepare, vb->planes[plane].mem_priv); > >>>> + > >>>> + if (vb->memory != VB2_MEMORY_DMABUF) > >>>> + return; > >>>> + for (plane = 0; plane < vb->num_planes; ++plane) { > >>>> + struct dma_buf *dbuf = vb->planes[plane].dbuf; > >>>> + > >>>> + if (!dbuf) > >>>> + continue; > >>>> + > >>>> + dma_buf_end_cpu_access(dbuf, vb->vb2_queue->dma_dir); > >>>> + } > >>>> } > >>>> > >>>> /* > >>>> @@ -356,6 +367,17 @@ static void __vb2_buf_mem_finish(struct vb2_buffer *vb) > >>>> vb->synced = 0; > >>>> for (plane = 0; plane < vb->num_planes; ++plane) > >>>> call_void_memop(vb, finish, vb->planes[plane].mem_priv); > >>>> + > >>>> + if (vb->memory != VB2_MEMORY_DMABUF) > >>>> + return; > >>>> + for (plane = 0; plane < vb->num_planes; ++plane) { > >>>> + struct dma_buf *dbuf = vb->planes[plane].dbuf; > >>>> + > >>>> + if (!dbuf) > >>>> + continue; > >>>> + > >>>> + dma_buf_begin_cpu_access(dbuf, vb->vb2_queue->dma_dir); > >>>> + } > >>>> } > >>>> > >>>> /* > >>>> -- > >>>> 2.43.0-rc1 > >>>> > >> > > >