Re: [RFC PATCH v2] dmabuf-sync: Introduce buffer synchronization framework

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





2013/6/21 Lucas Stach <l.stach@xxxxxxxxxxxxxx>
Am Donnerstag, den 20.06.2013, 20:15 +0900 schrieb Inki Dae:
[...]
> > > > You already need some kind of IPC between the two tasks, as I suspect
> > > > even in your example it wouldn't make much sense to queue the buffer
> > > > over and over again in task B without task A writing anything to it.
> > So
> > > > task A has to signal task B there is new data in the buffer to be
> > > > processed.
> > > >
> > > > There is no need to share the buffer over and over again just to get
> > the
> > > > two processes to work together on the same thing. Just share the fd
> > > > between both and then do out-of-band completion signaling, as you need
> > > > this anyway. Without this you'll end up with unpredictable behavior.
> > > > Just because sync allows you to access the buffer doesn't mean it's
> > > > valid for your use-case. Without completion signaling you could easily
> > > > end up overwriting your data from task A multiple times before task B
> > > > even tries to lock the buffer for processing.
> > > >
> > > > So the valid flow is (and this already works with the current APIs):
> > > > Task A                                    Task B
> > > > ------                                    ------
> > > > CPU access buffer
> > > >          ----------completion signal--------->
> > > >                                           qbuf (dragging buffer into
> > > >                                           device domain, flush caches,
> > > >                                           reserve buffer etc.)
> > > >                                                     |
> > > >                                           wait for device operation to
> > > >                                           complete
> > > >                                                     |
> > > >                                           dqbuf (dragging buffer back
> > > >                                           into CPU domain, invalidate
> > > >                                           caches, unreserve)
> > > >         <---------completion signal------------
> > > > CPU access buffer
> > > >
> > >
> > > Correct. In case that data flow goes from A to B, it needs some kind
> > > of IPC between the two tasks every time as you said. Then, without
> > > dmabuf-sync, how do think about the case that two tasks share the same
> > > buffer but these tasks access the buffer(buf1) as write, and data of
> > > the buffer(buf1) isn't needed to be shared?
> > >
> > Sorry, I don't see the point you are trying to solve here. If you share
> > a buffer and want its content to be clearly defined at every point in
> > time you have to synchronize the tasks working with the buffer, not just
> > the buffer accesses itself.
> >
> > Easiest way to do so is doing sync through userspace with out-of-band
> > IPC, like in the example above.
>
> In my opinion, that's not definitely easiest way. What I try to do is
> to avoid using *the out-of-band IPC*. As I mentioned in document file,
> the conventional mechanism not only makes user application
> complicated-user process needs to understand how the device driver is
> worked-but also may incur performance overhead by using the
> out-of-band IPC. The above my example may not be enough to you but
> there would be other cases able to use my approach efficiently.
>

Yeah, you'll some knowledge and understanding about the API you are
working with to get things right. But I think it's not an unreasonable
thing to expect the programmer working directly with kernel interfaces
to read up on how things work.

Second thing: I'll rather have *one* consistent API for every subsystem,
even if they differ from each other than having to implement this
syncpoint thing in every subsystem. Remember: a single execbuf in DRM
might reference both GEM objects backed by dma-buf as well native SHM or
CMA backed objects. The dma-buf-mgr proposal already allows you to
handle dma-bufs much the same way during validation than native GEM
objects.
 
Actually, at first I had implemented a fence helper framework based on reservation and dma fence to provide easy-use-interface for device drivers. However, that was wrong implemention: I had not only customized the dma fence but also not considered dead lock issue. After that, I have reimplemented it as dmabuf sync to solve dead issue, and at that time, I realized that we first need to concentrate on the most basic thing: the fact CPU and CPU, CPU and DMA, or DMA and DMA can access a same buffer, And the fact simple is the best, and the fact we need not only kernel side but also user side interfaces. After that, I collected what is the common part for all subsystems, and I have devised this dmabuf sync framework for it. I'm not really specialist in Desktop world. So question. isn't the execbuf used only for the GPU? the gpu has dedicated video memory(VRAM) so it needs migration mechanism between system memory and the dedicated video memory, and also to consider ordering issue while be migrated.
 

And to get back to my original point: if you have more than one task
operating together on a buffer you absolutely need some kind of real IPC
to sync them up and do something useful. Both you syncpoints and the
proposed dma-fences only protect the buffer accesses to make sure
different task don't stomp on each other. There is nothing in there to
make sure that the output of your pipeline is valid. You have to take
care of that yourself in userspace. I'll reuse your example to make it
clear what I mean:

Task A                                         Task B
------                                         -------
dma_buf_sync_lock(buf1)
CPU write buf1
dma_buf_sync_unlock(buf1)
          ---------schedule Task A again-------
dma_buf_sync_lock(buf1)
CPU write buf1
dma_buf_sync_unlock(buf1)
            ---------schedule Task B---------
                                               qbuf(buf1)
                                                  dma_buf_sync_lock(buf1)
                                               ....

This is what can happen if you don't take care of proper syncing. Task A
writes something to the buffer in expectation that Task B will take care
of it, but before Task B even gets scheduled Task A overwrites the
buffer again. Not what you wanted, isn't it?
 
Exactly wrong example. I had already mentioned about that. "In case that data flow goes from A to B, it needs some kind of IPC between the two tasks every time"  So again, your example would have no any problem in case that *two tasks share the same buffer but these tasks access the buffer(buf1) as write, and data of the buffer(buf1) isn't needed to be shared*.  They just need to use the buffer as *storage*. So All they want is to avoid stomping on the buffer in this case.
 

So to make sure the output of a pipeline of some kind is what you expect
you have to do syncing with IPC
 
So not true.
 
. And once you do CPU access it is a
synchronous thing in the stream of events. I see that you might want to
have some kind of bracketed CPU access even for the fallback mmap case
for things like V4L2 that don't provide explicit sync by their own, but
in no way I can see why we would need a user/kernel shared syncpoint for
this.

> > A more advanced way to achieve this
> > would be using cross-device fences to avoid going through userspace for
> > every syncpoint.
> >
>
> Ok, maybe there is something I missed. So question. What is the
> cross-device fences? dma fence?. And how we can achieve the
> synchronization mechanism without going through user space for every
> syncpoint; CPU and DMA share a same buffer?. And could you explain it
> in detail as long as possible like I did?
>
Yeah I'm talking about the proposed dma-fences. They would allow you to
just queue things into the kernel without waiting for a device operation
to finish. But you still have to make sure that your commands have the
right order and don't go wild. So for example you could do something
like this:

Userspace                                   Kernel
---------                                   ------
1. build DRM command stream
rendering into buf1
2. queue command stream with execbuf
                                            1. validate command stream
                                             1.1 reference buf1 for writing
                                                 through dma-buf-mgr
                                            2. kick off GPU processing
3. qbuf buf1 into V4L2
                                            3. reference buf1 for reading
                                             3.1 wait for fence from GPU to
                                                 signal
                                            4. kick off V4L2 processing

 
That seems like very specific to Desktop GPU. isn't it?
 
So you don't need to wait in userspace and potentially avoid some
context switches,
 
Also not true.
 
but you still have to make sure that GPU commands are
queued before you queue the V4L2 operation to make sure things get
operated on in the right order.

 
 
 
I'd like to say you that my approach is not perfact so may definietly have many problems and addition works - actually, I found some problems and are solving them, in addition, the implememtation to generic user side interfacing mechanism is in progress for the destination - , and thanks to your comments. However, I think we can try to do for more better something. Lastly, I'll look forward to keeping up your good advices.
 
Thanks,
Inki Dae
 
 
Regards,
Lucas

--
Pengutronix e.K.                           | Lucas Stach                 |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-5076 |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |


_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/dri-devel

[Index of Archives]     [Linux DRI Users]     [Linux Intel Graphics]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [XFree86]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux