On Wed, Oct 31, 2018 at 12:00:12AM +0200, Laurent Pinchart wrote: > As discussed before, we're clearly missing a proper non-coherent memory > allocation API. As much as I would like to see a volunteer for this, I don't > think it's a reason to block the performance improvement we get from this > patch. > > This being said, I'm a bit concerned about the allocation of 16kB blocks from > kmalloc(), and believe that the priority of the non-coherent memory allocation > API implementation should be increased. Christoph, you mentioned in a recent > discussion on this topic that you are working on removing the existing non- > coherent DMA allocation API, what is your opinion on how we should gllobally > solve the problem that this patch addresses ? I hope to address this on the dma-mapping side for this merge window. My current idea is to add (back) add dma_alloc_noncoherent-like API (name to be determindes). This would be very similar to to the DMA_ATTR_NON_CONSISTENT to dma_alloc_attrs with the following differences: - it must actually be implemented by every dma_map_ops instance, no falling back to dma_alloc_coherent like semantics. For all actually coherent ops this is trivial as there is no difference in semantics and we can fall back to the 'coherent' semantics, for non-coherent direct mappings it also is mostly trivial as we generally can use dma_direct_alloc. The only instances that will need real work are IOMMUs that support non-coherent access, but there is only about a handfull of those. - instead of using the only vaguely defined dma_cache_sync for ownership transfers we'll use dma_sync_single_* which are well defined and available everywhere I'll try to prioritise this to get done early in the merge window, but I'll need someone else do the USB side. > > + dma_sync_single_for_cpu(&urb->dev->dev, > > + urb->transfer_dma, > > + urb->transfer_buffer_length, > > + DMA_FROM_DEVICE); > > + > > As explained before as well, I think we need dma_sync_single_for_device() > calls, and I know they would degrade performances until we fix the problem on > the DMA mapping API side. This is not a reason to block the patch either. I > would appreciate, however, if a comment could be added to the place where > dma_sync_single_for_device() should be called, to explain the problem. Yes, as a rule of thumb every dma_sync_single_for_cpu call needs to pair with a previous dma_sync_single_for_device call. (Exceptions like selective use of DMA_ATTR_SKIP_CPU_SYNC proove the rule)