On 19/06/2024 06:19, Tomasz Figa wrote: > On Wed, Jun 19, 2024 at 1:24 AM Nicolas Dufresne <nicolas@xxxxxxxxxxxx> wrote: >> >> Le mardi 18 juin 2024 à 16:47 +0900, Tomasz Figa a écrit : >>> Hi TaoJiang, >>> >>> On Tue, Jun 18, 2024 at 4:30 PM TaoJiang <tao.jiang_2@xxxxxxx> wrote: >>>> >>>> From: Ming Qian <ming.qian@xxxxxxx> >>>> >>>> When the memory type is VB2_MEMORY_DMABUF, the v4l2 device can't know >>>> whether the dma buffer is coherent or synchronized. >>>> >>>> The videobuf2-core will skip cache syncs as it think the DMA exporter >>>> should take care of cache syncs >>>> >>>> But in fact it's likely that the client doesn't >>>> synchronize the dma buf before qbuf() or after dqbuf(). and it's >>>> difficult to find this type of error directly. >>>> >>>> I think it's helpful that videobuf2-core can call >>>> dma_buf_end_cpu_access() and dma_buf_begin_cpu_access() to handle the >>>> cache syncs. >>>> >>>> Signed-off-by: Ming Qian <ming.qian@xxxxxxx> >>>> Signed-off-by: TaoJiang <tao.jiang_2@xxxxxxx> >>>> --- >>>> .../media/common/videobuf2/videobuf2-core.c | 22 +++++++++++++++++++ >>>> 1 file changed, 22 insertions(+) >>>> >>> >>> Sorry, that patch is incorrect. I believe you're misunderstanding the >>> way DMA-buf buffers should be managed in the userspace. It's the >>> userspace responsibility to call the DMA_BUF_IOCTL_SYNC ioctl [1] to >>> signal start and end of CPU access to the kernel and imply necessary >>> cache synchronization. >>> >>> [1] https://docs.kernel.org/driver-api/dma-buf.html#dma-buffer-ioctls >>> >>> So, really sorry, but it's a NAK. >> >> >> >> This patch *could* make sense if it was inside UVC Driver as an example, as this >> driver can import dmabuf, to CPU memcpy, and does omits the required sync calls >> (unless that got added recently, I can easily have missed it). > > Yeah, currently V4L2 drivers don't call the in-kernel > dma_buf_{begin,end}_cpu_access() when they need to access the buffers > from the CPU, while my quick grep [1] reveals that we have 68 files > retrieving plane vaddr by calling vb2_plane_vaddr() (not necessarily a > 100% guarantee of CPU access being done, but rather likely so). > > I also repeated the same thing with VB2_DMABUF [2] and tried to > attribute both lists to specific drivers (by retaining the path until > the first - or _ [3]; which seemed to be relatively accurate), leading > to the following drivers that claim support for DMABUF while also > retrieving plane vaddr (without proper synchronization - no drivers > currently call any begin/end CPU access): > > i2c/video > pci/bt8xx/bttv > pci/cobalt/cobalt > pci/cx18/cx18 > pci/tw5864/tw5864 > pci/tw686x/tw686x > platform/allegro > platform/amphion/vpu > platform/chips > platform/intel/pxa > platform/marvell/mcam > platform/mediatek/jpeg/mtk > platform/mediatek/vcodec/decoder/mtk > platform/mediatek/vcodec/encoder/mtk > platform/nuvoton/npcm > platform/nvidia/tegra > platform/nxp/imx > platform/renesas/rcar > platform/renesas/vsp1/vsp1 > platform/rockchip/rkisp1/rkisp1 > platform/samsung/exynos4 > platform/samsung/s5p > platform/st/sti/delta/delta > platform/st/sti/hva/hva > platform/verisilicon/hantro > usb/au0828/au0828 > usb/cx231xx/cx231xx > usb/dvb > usb/em28xx/em28xx > usb/gspca/gspca.c > usb/hackrf/hackrf.c > usb/stk1160/stk1160 > usb/uvc/uvc > > which means we potentially have ~30 drivers which likely don't handle > imported DMABUFs correctly (there is still a chance that DMABUF is > advertised for one queue, while vaddr is used for another). > > I think we have two options: > 1) add vb2_{begin/end}_cpu_access() helpers, carefully audit each > driver and add calls to those I actually started on that 9 (!) years ago: https://git.linuxtv.org/hverkuil/media_tree.git/log/?h=vb2-cpu-access If memory serves, the main problem was that there were some drivers where it wasn't clear what should be done. In the end I never continued this work since nobody complained about it. This patch series adds vb2_plane_begin/end_cpu_access() functions, replaces all calls to vb2_plane_vaddr() in drivers to the new functions, and at the end removes vb2_plane_vaddr() altogether. > 2) take a heavy gun approach and just call vb2_begin_cpu_access() > whenever vb2_plane_vaddr() is called and then vb2_end_cpu_access() > whenever vb2_buffer_done() is called (if begin was called before). > > The latter has the disadvantage of drivers not having control over the > timing of the cache sync, so could end up with less than optimal > performance. Also there could be some more complex cases, where the > driver needs to mix DMA and CPU accesses to the buffer, so the fixed > sequence just wouldn't work for them. (But then they just wouldn't > work today either.) > > Hans, Marek, do you have any thoughts? (I'd personally just go with 2 > and if any driver in the future needs something else, they could call > begin/end CPU access manually.) I prefer 1. If nothing else, that makes it easy to identify drivers that do such things. But perhaps a mix is possible: if a VB2 flag is set by the driver, then approach 2 is used. That might help with the drivers where it isn't clear what they should do. Although perhaps this can all be done in the driver itself: instead of vb2_plane_vaddr they call vb2_begin_cpu_access for the whole buffer, and at buffer_done time they call vb2_end_cpu_access. Should work just as well for the very few drivers that need this. Regards, Hans > > [1] git grep vb2_plane_vaddr | cut -d":" -f 1 | sort | uniq > [2] git grep VB2_DMABUF | cut -d":" -f 1 | sort | uniq > [3] by running [1] and [2] through | cut -d"-" -f 1 | cut -d"_" -f 1 | uniq > > Best, > Tomasz > >> >> But generally speaking, bracketing all driver with CPU access synchronization >> does not make sense indeed, so I second the rejection. >> >> Nicolas >> >>> >>> Best regards, >>> Tomasz >>> >>>> diff --git a/drivers/media/common/videobuf2/videobuf2-core.c b/drivers/media/common/videobuf2/videobuf2-core.c >>>> index 358f1fe42975..4734ff9cf3ce 100644 >>>> --- a/drivers/media/common/videobuf2/videobuf2-core.c >>>> +++ b/drivers/media/common/videobuf2/videobuf2-core.c >>>> @@ -340,6 +340,17 @@ static void __vb2_buf_mem_prepare(struct vb2_buffer *vb) >>>> vb->synced = 1; >>>> for (plane = 0; plane < vb->num_planes; ++plane) >>>> call_void_memop(vb, prepare, vb->planes[plane].mem_priv); >>>> + >>>> + if (vb->memory != VB2_MEMORY_DMABUF) >>>> + return; >>>> + for (plane = 0; plane < vb->num_planes; ++plane) { >>>> + struct dma_buf *dbuf = vb->planes[plane].dbuf; >>>> + >>>> + if (!dbuf) >>>> + continue; >>>> + >>>> + dma_buf_end_cpu_access(dbuf, vb->vb2_queue->dma_dir); >>>> + } >>>> } >>>> >>>> /* >>>> @@ -356,6 +367,17 @@ static void __vb2_buf_mem_finish(struct vb2_buffer *vb) >>>> vb->synced = 0; >>>> for (plane = 0; plane < vb->num_planes; ++plane) >>>> call_void_memop(vb, finish, vb->planes[plane].mem_priv); >>>> + >>>> + if (vb->memory != VB2_MEMORY_DMABUF) >>>> + return; >>>> + for (plane = 0; plane < vb->num_planes; ++plane) { >>>> + struct dma_buf *dbuf = vb->planes[plane].dbuf; >>>> + >>>> + if (!dbuf) >>>> + continue; >>>> + >>>> + dma_buf_begin_cpu_access(dbuf, vb->vb2_queue->dma_dir); >>>> + } >>>> } >>>> >>>> /* >>>> -- >>>> 2.43.0-rc1 >>>> >> >