Hello,

Here are the notes I took during the session. As I was taking part in the
discussion I'm sure I have missed a few points, so please add any missing
information or correct any mistakes I may have made.

- Allocation of TX async descriptors

It would be useful to decouple allocation from prepare, as prepare needs to be
called from interrupt context, which requires allocating from the interrupt
pool. The room reached a consensus that this is a real problem that should be
addressed by a new API.

This is also related to runtime PM, which is currently painful with the
existing DMA engine API. (TBD: how ?)

- Error reporting

Some UART implementations don't provide an interrupt on timeout. The DMA
engine provides that interrupt, and we need to get the residue at timeout
time.

This is solved by

commit f067025bc676ba8d18fba5f959598339e39b86db
Author: Dave Jiang <dave.jiang@xxxxxxxxx>
Date:   Wed Jul 20 13:13:50 2016 -0700

    dmaengine: add support to provide error result from a DMA transation

The commit also supports DMA engine based network or USB devices, which don't
know in advance how much data will be transferred in the receive direction. In
that case the slave driver has to prepare a large-enough transaction, and the
completion handler reports the actual packet size through the residue
(assuming the hardware supports early termination of transactions when packet
reception completes).

- Per-channel capabilities

Most DMA engines have identical, interchangeable channels, and the API has
been designed around that assumption. However, some DMA engines have different
capabilities per channel. This isn't exposed by the current API.

One possible solution would be to register multiple DMA engines, each with a
set of identical channels. The capabilities API (dma_get_slave_caps) is
channel-based, so it could also be refactored in the core to expose
per-channel capabilities.
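To illustrate the first option (one registered engine per set of identical
channels), here is a minimal user-space sketch of the grouping step. It is not
kernel code and was not discussed in the session: struct chan_caps, its fields,
MAX_CHANS and group_channels() are all hypothetical names made up for the
example, and a real implementation would compare whatever capabilities the
driver actually reports.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CHANS 16	/* hypothetical upper bound for this sketch */

/* Hypothetical per-channel capability description (user-space model). */
struct chan_caps {
	uint32_t bus_widths;	/* bitmask of supported bus widths */
	bool cyclic;		/* channel supports cyclic transfers */
	bool residue_burst;	/* residue reported at burst granularity */
};

static bool caps_equal(const struct chan_caps *a, const struct chan_caps *b)
{
	return a->bus_widths == b->bus_widths &&
	       a->cyclic == b->cyclic &&
	       a->residue_burst == b->residue_burst;
}

/*
 * Assign a group index to every channel so that channels with identical
 * capabilities share a group. Each group could then be registered as a
 * separate DMA engine. Returns the number of groups found.
 */
static size_t group_channels(const struct chan_caps *caps, size_t n,
			     size_t *group)
{
	size_t rep[MAX_CHANS];	/* index of the first channel in each group */
	size_t ngroups = 0;

	assert(n <= MAX_CHANS);

	for (size_t i = 0; i < n; i++) {
		size_t g;

		/* Look for an existing group with the same capabilities. */
		for (g = 0; g < ngroups; g++) {
			if (caps_equal(&caps[i], &caps[rep[g]]))
				break;
		}

		/* None found: channel i starts a new group. */
		if (g == ngroups)
			rep[ngroups++] = i;

		group[i] = g;
	}

	return ngroups;
}
```

With four channels where only the third differs, this yields two groups, i.e.
two engines to register. The alternative discussed above (per-channel
capabilities exposed by the core) would avoid the grouping altogether.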

- API refactoring

The transaction prepare API is possibly too high-level, in the sense that it
combines multiple concepts (e.g. cyclic + physically contiguous, non-cyclic +
sglist based, ...). Many DMA engines could support other combinations. We
could use a lower-level API that doesn't combine those concepts, thus enabling
more use cases.

We would need to handle more in core code. We currently leave the whole
implementation to drivers, which leads to slightly different behaviours
between drivers. As recent DMA engines tend to use small segments and chain
them, the framework could handle splitting transactions into small segments,
and ask drivers to allocate those segments and chain them. When CPU-based
chaining is needed, the framework could implement helper functions for that
purpose. This would lead to more consistent behaviour across drivers.

With an API where drivers only handle small segments, residue calculation
could be handled mostly by the core, with a single driver operation to
retrieve the DMA pointer value for the current segment.

-- 
Regards,

Laurent Pinchart