On Mon, Apr 11, 2016 at 02:08:27PM +0100, Mel Gorman wrote:
> On Mon, Apr 11, 2016 at 02:26:39PM +0200, Jesper Dangaard Brouer wrote:
> > On arches like PowerPC, the DMA API is the bottleneck.  To work around
> > the cost of DMA calls, NIC drivers allocate large order (compound) pages.
> > (dma_map compound page, hand out page-fragments for RX ring, and later
> > dma_unmap when the last RX page-fragment is seen).
>
> So, IMO only holding onto the DMA pages is all that is justified but not a
> recycle of order-0 pages built on top of the core allocator.  For DMA pages,
> it would take a bit of legwork but the per-cpu allocator could be split
> and converted to hold arbitrary sized pages with a constructor/destructor
> to do the DMA coherency step when pages are taken from or handed back to
> the core allocator.  I'm not volunteering to do that unfortunately but I
> estimate it'd be a few days' work unless it needs to be per-CPU and NUMA
> aware in which case the memory footprint will be high.

Have "we" tried to accelerate the DMA calls in PowerPC?  For example, the
DMA layer could hold onto a cache of recently used mappings and recycle
them if that still works.  It trades off a bit of security (a device can
continue to DMA after the memory should no longer be accessible to it) for
speed, but then so does the per-driver hack of keeping pages around still
mapped.
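
To make the mapping-cache idea concrete, here is a rough userspace sketch, not kernel code and not anything that exists in the PowerPC DMA implementation.  Every name in it (slow_dma_map, slow_dma_unmap, cached_dma_map, cached_dma_unmap, map_cache_flush, MAP_CACHE_SLOTS, dma_handle_t) is made up for illustration; the two slow_* functions merely stand in for the expensive platform map/unmap calls and count how often they run.  An "unmap" parks the still-live mapping in a small cache, and a later "map" of the same address reuses it instead of paying for a new one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAP_CACHE_SLOTS 4            /* hypothetical cache size */

typedef uint64_t dma_handle_t;       /* stand-in for the kernel's dma_addr_t */

/* Counters so the effect of the cache can be observed. */
static unsigned long slow_map_calls;
static unsigned long slow_unmap_calls;

/* Stand-in for the expensive platform mapping call (an assumption). */
static dma_handle_t slow_dma_map(void *cpu_addr)
{
    slow_map_calls++;
    return (dma_handle_t)(uintptr_t)cpu_addr + 0x1000;
}

/* Stand-in for the expensive platform unmapping call (an assumption). */
static void slow_dma_unmap(dma_handle_t handle)
{
    (void)handle;
    slow_unmap_calls++;
}

struct map_cache_entry {
    void *cpu_addr;                  /* NULL means the slot is empty */
    dma_handle_t handle;
};

/* Holds only released mappings that are still live in the "IOMMU". */
struct map_cache {
    struct map_cache_entry slot[MAP_CACHE_SLOTS];
    unsigned int next_victim;        /* round-robin eviction cursor */
};

/* Map: reuse a parked mapping for this address if one exists. */
static dma_handle_t cached_dma_map(struct map_cache *c, void *cpu_addr)
{
    assert(cpu_addr != NULL);
    for (size_t i = 0; i < MAP_CACHE_SLOTS; i++) {
        if (c->slot[i].cpu_addr == cpu_addr) {
            dma_handle_t h = c->slot[i].handle;
            c->slot[i].cpu_addr = NULL;      /* hand the mapping back out */
            return h;
        }
    }
    return slow_dma_map(cpu_addr);           /* miss: pay the full cost */
}

/*
 * Unmap: park the mapping instead of tearing it down.  This is the
 * security trade-off: the device can still reach the memory until the
 * entry is evicted or the cache is flushed.
 */
static void cached_dma_unmap(struct map_cache *c, void *cpu_addr,
                             dma_handle_t handle)
{
    struct map_cache_entry *e = NULL;

    assert(cpu_addr != NULL);
    for (size_t i = 0; i < MAP_CACHE_SLOTS; i++) {
        if (c->slot[i].cpu_addr == NULL) {
            e = &c->slot[i];
            break;
        }
    }
    if (e == NULL) {                         /* cache full: evict one entry */
        e = &c->slot[c->next_victim];
        c->next_victim = (c->next_victim + 1) % MAP_CACHE_SLOTS;
        slow_dma_unmap(e->handle);
    }
    e->cpu_addr = cpu_addr;
    e->handle = handle;
}

/* Really unmap everything that is parked, closing the exposure window. */
static void map_cache_flush(struct map_cache *c)
{
    for (size_t i = 0; i < MAP_CACHE_SLOTS; i++) {
        if (c->slot[i].cpu_addr != NULL) {
            slow_dma_unmap(c->slot[i].handle);
            c->slot[i].cpu_addr = NULL;
        }
    }
}
```

A driver-side caller would pair cached_dma_map() with cached_dma_unmap() exactly as it pairs the real calls today, and something would have to call map_cache_flush() whenever the memory really must become inaccessible to the device.  A real version would also need locking (or per-CPU caches) and a keying scheme that accounts for device, size and direction, none of which is attempted here.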