Am Donnerstag, den 20.06.2013, 20:15 +0900 schrieb Inki Dae: [...] > > > > You already need some kind of IPC between the two tasks, as I suspect > > > > even in your example it wouldn't make much sense to queue the buffer > > > > over and over again in task B without task A writing anything to it. > > So > > > > task A has to signal task B there is new data in the buffer to be > > > > processed. > > > > > > > > There is no need to share the buffer over and over again just to get > > the > > > > two processes to work together on the same thing. Just share the fd > > > > between both and then do out-of-band completion signaling, as you need > > > > this anyway. Without this you'll end up with unpredictable behavior. > > > > Just because sync allows you to access the buffer doesn't mean it's > > > > valid for your use-case. Without completion signaling you could easily > > > > end up overwriting your data from task A multiple times before task B > > > > even tries to lock the buffer for processing. > > > > > > > > So the valid flow is (and this already works with the current APIs): > > > > Task A Task B > > > > ------ ------ > > > > CPU access buffer > > > > ----------completion signal---------> > > > > qbuf (dragging buffer into > > > > device domain, flush caches, > > > > reserve buffer etc.) > > > > | > > > > wait for device operation to > > > > complete > > > > | > > > > dqbuf (dragging buffer back > > > > into CPU domain, invalidate > > > > caches, unreserve) > > > > <---------completion signal------------ > > > > CPU access buffer > > > > > > > > > > Correct. In case that data flow goes from A to B, it needs some kind > > > of IPC between the two tasks every time as you said. Then, without > > > dmabuf-sync, how do think about the case that two tasks share the same > > > buffer but these tasks access the buffer(buf1) as write, and data of > > > the buffer(buf1) isn't needed to be shared? > > > > > Sorry, I don't see the point you are trying to solve here. If you share > > a buffer and want its content to be clearly defined at every point in > > time you have to synchronize the tasks working with the buffer, not just > > the buffer accesses itself. > > > > Easiest way to do so is doing sync through userspace with out-of-band > > IPC, like in the example above. > > In my opinion, that's not definitely easiest way. What I try to do is > to avoid using *the out-of-band IPC*. As I mentioned in document file, > the conventional mechanism not only makes user application > complicated-user process needs to understand how the device driver is > worked-but also may incur performance overhead by using the > out-of-band IPC. The above my example may not be enough to you but > there would be other cases able to use my approach efficiently. > Yeah, you'll some knowledge and understanding about the API you are working with to get things right. But I think it's not an unreasonable thing to expect the programmer working directly with kernel interfaces to read up on how things work. Second thing: I'll rather have *one* consistent API for every subsystem, even if they differ from each other than having to implement this syncpoint thing in every subsystem. Remember: a single execbuf in DRM might reference both GEM objects backed by dma-buf as well native SHM or CMA backed objects. The dma-buf-mgr proposal already allows you to handle dma-bufs much the same way during validation than native GEM objects. And to get back to my original point: if you have more than one task operating together on a buffer you absolutely need some kind of real IPC to sync them up and do something useful. Both you syncpoints and the proposed dma-fences only protect the buffer accesses to make sure different task don't stomp on each other. There is nothing in there to make sure that the output of your pipeline is valid. You have to take care of that yourself in userspace. I'll reuse your example to make it clear what I mean: Task A Task B ------ ------- dma_buf_sync_lock(buf1) CPU write buf1 dma_buf_sync_unlock(buf1) ---------schedule Task A again------- dma_buf_sync_lock(buf1) CPU write buf1 dma_buf_sync_unlock(buf1) ---------schedule Task B--------- qbuf(buf1) dma_buf_sync_lock(buf1) .... This is what can happen if you don't take care of proper syncing. Task A writes something to the buffer in expectation that Task B will take care of it, but before Task B even gets scheduled Task A overwrites the buffer again. Not what you wanted, isn't it? So to make sure the output of a pipeline of some kind is what you expect you have to do syncing with IPC. And once you do CPU access it is a synchronous thing in the stream of events. I see that you might want to have some kind of bracketed CPU access even for the fallback mmap case for things like V4L2 that don't provide explicit sync by their own, but in no way I can see why we would need a user/kernel shared syncpoint for this. > > A more advanced way to achieve this > > would be using cross-device fences to avoid going through userspace for > > every syncpoint. > > > > Ok, maybe there is something I missed. So question. What is the > cross-device fences? dma fence?. And how we can achieve the > synchronization mechanism without going through user space for every > syncpoint; CPU and DMA share a same buffer?. And could you explain it > in detail as long as possible like I did? > Yeah I'm talking about the proposed dma-fences. They would allow you to just queue things into the kernel without waiting for a device operation to finish. But you still have to make sure that your commands have the right order and don't go wild. So for example you could do something like this: Userspace Kernel --------- ------ 1. build DRM command stream rendering into buf1 2. queue command stream with execbuf 1. validate command stream 1.1 reference buf1 for writing through dma-buf-mgr 2. kick off GPU processing 3. qbuf buf1 into V4L2 3. reference buf1 for reading 3.1 wait for fence from GPU to signal 4. kick off V4L2 processing So you don't need to wait in userspace and potentially avoid some context switches, but you still have to make sure that GPU commands are queued before you queue the V4L2 operation to make sure things get operated on in the right order. Regards, Lucas -- Pengutronix e.K. | Lucas Stach | Industrial Linux Solutions | http://www.pengutronix.de/ | Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-5076 | Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 | -- To unsubscribe from this list: send the line "unsubscribe linux-media" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html