> > - Previously, only USB transfers to unaddressable memory needed
> >   to go through the bounce buffer, now all of them do, which may
> >   impact runtime performance for USB endpoints that do a lot of
> >   transfers.
> >
> > On the upside, the local_mem support uses write-combining buffers,
> > which should be a bit faster for transfers to the device compared to
> > normal uncached coherent memory as used in dmabounce.
> >
>
> Talking from past experience using this trick on a NXP ARM9 SoC ~10
> years ago, using on-chip SRAM for USB DMA likely results in a
> significant performance boost, even without write combining, although
> the exact scenario obviously matters.

Right, that makes sense, but it won't help here because there is no SRAM.

One detail I noticed is that the localmem pool normally gets mapped as WC,
which is what I did in the new code as well, but
dma_alloc_flags(..., DMA_ATTR_WRITE_COMBINE) does not always honor
this flag. I think it will do it here because a GFP_KERNEL allocation
should be served by the remap_allocator, while GFP_ATOMIC allocations
would be served by pool_allocator_alloc(), which ignores the flag.

       Arnd
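To illustrate the allocator point above, here is a minimal userspace sketch of the dispatch as described in this mail. It is not kernel code: every model_* and MODEL_* name is hypothetical, and it only covers the two cases mentioned (a blocking GFP_KERNEL request going to the remap allocator, an atomic request going to the premapped pool). Other paths such as CMA or coherent devices are deliberately left out, and the real behaviour should be checked against the architecture's DMA allocation code.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for illustration only; none of these are kernel API. */
enum model_gfp { MODEL_GFP_KERNEL, MODEL_GFP_ATOMIC };
enum model_mapping { MODEL_MAP_UNCACHED, MODEL_MAP_WRITE_COMBINE };
enum model_allocator { MODEL_REMAP_ALLOCATOR, MODEL_POOL_ALLOCATOR };

/* Models a "please map this write-combining" attribute bit. */
#define MODEL_ATTR_WRITE_COMBINE (1UL << 0)

/*
 * Assumption taken from the mail: allocations that may block are served
 * by the remap allocator, atomic ones by the preallocated pool.
 */
enum model_allocator model_pick_allocator(enum model_gfp gfp)
{
	return gfp == MODEL_GFP_KERNEL ? MODEL_REMAP_ALLOCATOR
				       : MODEL_POOL_ALLOCATOR;
}

/*
 * Returns the mapping type the caller ends up with. The pool is modelled
 * as already mapped uncached, so the write-combine request is ignored
 * there; the remap path creates a fresh mapping and can honor it.
 */
enum model_mapping model_resulting_mapping(enum model_gfp gfp,
					   unsigned long attrs)
{
	bool want_wc = (attrs & MODEL_ATTR_WRITE_COMBINE) != 0;

	if (model_pick_allocator(gfp) == MODEL_POOL_ALLOCATOR)
		return MODEL_MAP_UNCACHED;

	return want_wc ? MODEL_MAP_WRITE_COMBINE : MODEL_MAP_UNCACHED;
}
```

In this model, model_resulting_mapping(MODEL_GFP_KERNEL, MODEL_ATTR_WRITE_COMBINE) yields a write-combining mapping, while the same request with MODEL_GFP_ATOMIC silently falls back to an uncached one, which is the asymmetry to watch for if the local memory pool were ever set up from atomic context.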