Hi Javier, On Tue, Aug 16, 2016 at 05:26:31PM -0400, Javier Martinez Canillas wrote: > Hello Sakari, > > On 08/16/2016 05:13 PM, Sakari Ailus wrote: > > Hi Javier, > > > > Javier Martinez Canillas wrote: > >> Hello Sakari, > >> > >> On 08/16/2016 04:47 PM, Sakari Ailus wrote: > >>> Hi Javier, > >>> > >>> Javier Martinez Canillas wrote: > >>>> Hello Hans, > >>>> > >>>> Thanks a lot for your feedback. > >>>> > >>>> On 08/13/2016 09:47 AM, Hans Verkuil wrote: > >>>>> On 07/20/2016 08:22 PM, Javier Martinez Canillas wrote: > >>>>>> Currently the dma-buf is unmapped when the buffer is dequeued by userspace > >>>>>> but it's not used anymore after the driver finished processing the buffer. > >>>>>> > >>>>>> So instead of doing the dma-buf unmapping in __vb2_dqbuf(), it can be made > >>>>>> in vb2_buffer_done() after the driver notified that buf processing is done. > >>>>>> > >>>>>> Decoupling the buffer dequeue from the dma-buf unmapping has also the side > >>>>>> effect of making possible to add dma-buf fence support in the future since > >>>>>> the buffer could be dequeued even before the driver has finished using it. > >>>>>> > >>>>>> Signed-off-by: Javier Martinez Canillas <javier@xxxxxxxxxxxxxxx> > >>>>>> > >>>>>> --- > >>>>>> Hello, > >>>>>> > >>>>>> I've tested this patch doing DMA buffer sharing between a > >>>>>> vivid input and output device with both v4l2-ctl and gst: > >>>>>> > >>>>>> $ v4l2-ctl -d0 -e1 --stream-dmabuf --stream-out-mmap > >>>>>> $ v4l2-ctl -d0 -e1 --stream-mmap --stream-out-dmabuf > >>>>>> $ gst-launch-1.0 v4l2src device=/dev/video0 io-mode=dmabuf ! v4l2sink device=/dev/video1 io-mode=dmabuf-import > >>>>>> > >>>>>> And I didn't find any issues but more testing will be appreciated. > >>>>>> > >>>>>> Best regards, > >>>>>> Javier > >>>>>> > >>>>>> drivers/media/v4l2-core/videobuf2-core.c | 34 +++++++++++++++++++++----------- > >>>>>> 1 file changed, 22 insertions(+), 12 deletions(-) > >>>>>> > >>>>>> diff --git a/drivers/media/v4l2-core/videobuf2-core.c b/drivers/media/v4l2-core/videobuf2-core.c > >>>>>> index 7128b09810be..973331efaf79 100644 > >>>>>> --- a/drivers/media/v4l2-core/videobuf2-core.c > >>>>>> +++ b/drivers/media/v4l2-core/videobuf2-core.c > >>>>>> @@ -958,6 +958,22 @@ void *vb2_plane_cookie(struct vb2_buffer *vb, unsigned int plane_no) > >>>>>> EXPORT_SYMBOL_GPL(vb2_plane_cookie); > >>>>>> > >>>>>> /** > >>>>>> + * __vb2_unmap_dmabuf() - unmap dma-buf attached to buffer planes > >>>>>> + */ > >>>>>> +static void __vb2_unmap_dmabuf(struct vb2_buffer *vb) > >>>>>> +{ > >>>>>> + int i; > >>>>>> + > >>>>>> + for (i = 0; i < vb->num_planes; ++i) { > >>>>>> + if (!vb->planes[i].dbuf_mapped) > >>>>>> + continue; > >>>>>> + call_void_memop(vb, unmap_dmabuf, > >>>>>> + vb->planes[i].mem_priv); > >>>>> > >>>>> Does unmap_dmabuf work in interrupt context? Since vb2_buffer_done can be called from > >>>>> an irq handler this is a concern. > >>>>> > >>>> > >>>> Good point, I believe it shouldn't be called from atomic context since both > >>>> the dma_buf_vunmap() and dma_buf_unmap_attachment() functions can sleep. > >>>> > >>>>> That said, vb2_buffer_done already calls call_void_memop(vb, finish, vb->planes[plane].mem_priv); > >>>>> to sync buffers, and that can take a long time as well. So it is not a good idea to > >>>>> have this in vb2_buffer_done. > >>>>> > >>>> > >>>> I see. > >>>> > >>>>> What I would like to see is to have vb2 handle this finish() call and the vb2_unmap_dmabuf > >>>>> in some workthread or equivalent. > >>>>> > >>>>> It would complicate matters somewhat in vb2, but it would simplify drivers since these > >>>>> actions would not longer take place in interrupt context. > >>>>> > >>>>> I think this patch makes sense, but I would prefer that this is moved out of the interrupt > >>>>> context. > >>>>> > >>>> > >>>> Ok, I can take a look to this and handle the finish() and unmap_dmabuf() > >>>> out of interrupt context as you suggested. > >>> > >>> I have a patch doing the former which is a part of my cache management > >>> fix patchset: > >>> > >>> <URL:http://git.retiisi.org.uk/?p=~sailus/linux.git;a=commitdiff;h=b57f937627abda158ada01a3297dbb0f0a57b515> > >>> <URL:http://git.retiisi.org.uk/?p=~sailus/linux.git;a=shortlog;h=refs/heads/vb2-dc-noncoherent> > >>> > >> > >> Interesting, thanks for the links. > >> > >>> There were a few drivers doing nasty things with memory that I couldn't > >>> quite fix back then. Just FYI. > >>> > >> > >> Did you mean that there were issues with moving finish mem op call to DQBUF? > >> > >> Do you recall what these drivers were or what were doing that caused problems? > > > > Not any particular drivers --- the problem is that flushing the cache > > Ah, you were explaining the rationale of the change, not that you had > issues after doing the mentioned change, sorry for my confusion. > > > simply takes a lot of time, often milliseconds depending on the machine. > > There's also no reason to do it in interrupt context. It kills realtime > > performance, too. > > > > Yes, I understand why calling finish() in vb2_buffer_done() is bad :) > > >> > >> In any case, what Hans proposed AFAIU is not to change when the finish call > >> happens but to split the vb2_buffer_done() function and defer part of it to > >> a workqueue or kthread. I'll give a try to that approach probably tomorrow. > > > > There's also the context of the user space process calling DQBUF, too. > > Why not to use that one instead? > > > > The idea of $SUBJECT was to do the operations as soon as possible instead of > waiting for these to happen when user-space calls DQBUF. There isn't a reason > to wait until DQBUF to call finish() AFAICT, besides making vb2 more simpler > of course (and this is also true for dma-buf unmap, it can be done sooner). My apologies for the late reply... I intended to reply earlier but then forgot about it. :-i Presumably, if the user space is interested in low latency, it is ready to call VIDIOC_DQBUF IOCTL either based on poll(2) results or it is sleeping in the IOCTL. Adding two extra context switchest to this sequence does not improve it -- quite the opposite. > > The reason why I posted this patch is that I'm exploring the possibility to > add dma-buf implicit fence support to vb2, and one option is to use the fence > as sync point instead of requiring user-space to call DQBUF and QBUF. So in > that case, it would be good for the finish and dma-buf unmap operations to > happen as soon as the driver is done processing the buffer. This is an interesting use case. If the buffer is passed between different hardware blocks, is there a need to flush the cache? I have to admit I haven't been following up the dma-buf fence development. :-I I wonder if this could be done based on whether fences are being used or not --- if there's a need to. There are also cases when the hardware devices would be able to share buffers without help from software. -- Kind regards, Sakari Ailus e-mail: sakari.ailus@xxxxxx XMPP: sailus@xxxxxxxxxxxxxx -- To unsubscribe from this list: send the line "unsubscribe linux-media" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html