On 07/20/16 09:26, Robert Jarzmik wrote:
> Peter Ujfalusi <peter.ujfalusi@xxxxxx> writes:
>
>> On 07/18/16 13:34, Russell King - ARM Linux wrote:
>>> On Thu, Jul 14, 2016 at 03:42:37PM +0300, Peter Ujfalusi wrote:
>>>> Before looking for the next descriptor to start, complete the just finished
>>>> cookie.
>>>
>>> This change will reduce performance as we no longer have an overlap
>>> between the next request starting to be dealt with in the hardware
>>> vs the previous request being completed.
>>
>> vchan_cookie_complete() will only mark the cookie completed, adds the vd to
>> the desc_completed list (it was deleted from desc_issued list when it was
>> started by omap_dma_start_desc) and schedule the tasklet to deal with the real
>> completion later.
>> Marking the just finished descriptor/cookie done first then looking for
>> possible descriptors in the queue to start feels like a better sequence.
>>
>> After a quick grep in the kernel source: only omap-dma.c was starting the next
>> transfer before marking the current completed descriptor/cookie done.
>
> Euh actually I think it's done in other drivers as well :
> - Documentation/dmaengine/pxa_dma.txt (chapter "Transfers hot-chaining")
> - drivers/dma/pxa_dma.c
> => look for pxad_try_hotchain() and its impact on pxad_chan_handler() which
> will mark the completion while the next transfer is already pumped by the
> hardware.

If I understand it correctly, the 'hot-chaining' is a bit different from what omap-dma is doing: when the DMA is running and a new request comes in, the driver appends the new transfer to the list used by the HW. This way no stop and restart is needed, and the DMA keeps running without interruption.

> Speaking of which, from a purely design point of view, as long as you think
> beforehand what is your sequence, ie. what is the sequence of your link
> chaining, completion handling, etc ..., both marking before or after next tx
> start should be fine IMHO.
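To make the two interrupt-handler orderings concrete, here is a minimal user-space sketch. It assumes a hypothetical, heavily simplified channel model: `struct chan`, `mark_complete()`, `start_next()`, `run_tasklet()` and the logical tick counter are all invented for illustration and are not the kernel's virt-dma API. `mark_complete()` only stands in for what vchan_cookie_complete() is described as doing above (record the cookie, queue the descriptor for the deferred tasklet); no client callback runs in the handler.

```c
#include <assert.h>

#define MAX_DESC 8

/* Hypothetical, simplified model of a virt-dma channel (NOT struct
 * virt_dma_chan). It only mimics the bookkeeping discussed in the thread. */
struct chan {
	int issued[MAX_DESC];    /* cookies waiting to be started (FIFO) */
	int n_issued;
	int head;                /* index of the next cookie to start */
	int completed[MAX_DESC]; /* cookies marked done, awaiting the "tasklet" */
	int n_completed;
	int running;             /* cookie owned by the "hardware", 0 = idle */
	int last_completed;      /* most recently completed cookie */
	int clock;               /* logical time: one tick per driver action */
	int hw_restart_at;       /* tick at which the next transfer was started */
	int cookie_done_at;      /* tick at which the finished cookie was marked */
};

/* Stand-in for vchan_cookie_complete(): mark the cookie and queue it for
 * deferred completion; the client callback is NOT run here. */
static void mark_complete(struct chan *c, int cookie)
{
	assert(c->n_completed < MAX_DESC);
	c->last_completed = cookie;
	c->completed[c->n_completed++] = cookie;
	c->cookie_done_at = ++c->clock;
}

/* Stand-in for picking the next issued descriptor and programming the HW. */
static void start_next(struct chan *c)
{
	c->running = (c->head < c->n_issued) ? c->issued[c->head++] : 0;
	c->hw_restart_at = ++c->clock;
}

/* Ordering proposed by the patch: complete first, then start the next one. */
static void irq_complete_first(struct chan *c)
{
	int finished = c->running;

	mark_complete(c, finished);
	start_next(c);
}

/* Ordering omap-dma used before the patch: restart the HW first. */
static void irq_start_first(struct chan *c)
{
	int finished = c->running;

	start_next(c);
	mark_complete(c, finished);
}

/* Stand-in for the virt-dma tasklet: the real completion work happens
 * here, later, whichever order the interrupt handler used. Returns the
 * number of descriptors it drained. */
static int run_tasklet(struct chan *c)
{
	int n = c->n_completed;

	c->n_completed = 0;
	return n;
}

/* Cookie 1 is running, cookies 2 and 3 are queued behind it. */
static struct chan setup(void)
{
	struct chan c = { .issued = { 2, 3 }, .n_issued = 2, .running = 1 };

	return c;
}
```

Running both handlers on the same `setup()` state shows the trade-off in this toy model: irq_start_first() restarts the "hardware" one tick earlier, irq_complete_first() marks cookie 1 one tick earlier, and in both cases the descriptor is only really completed when run_tasklet() gets to run.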
Yes, it might be slightly better from a performance point of view to first start the pending descriptor (if there is one) and then call vchan_cookie_complete(). On the other hand, if we care more about latency and accuracy, we should complete the transfer first and then look for pending descriptors. But since virt_dma uses a tasklet for the real completion, the completion latency is ultimately determined by when the tasklet gets the chance to execute.

> So in your quest for the "better sequence" the pxa driver's one might give you
> some perspective :)

I did think about similar 'hot-chaining' for TI's eDMA and sDMA. eDMA especially would benefit from it, but so far I see too many race conditions to overcome to be brave enough to write something to test it, and I don't have time for it atm ;)

--
Péter