Hey All,
  So this is another revision of my patch series of performance
optimizations to the dma-buf system heap.

Unfortunately, in working these up, I realized the heap-helpers
infrastructure we tried to add to minimize code duplication is
not as generic as we intended. For some heaps it makes sense to
deal with page lists, while for other heaps it makes more sense
to track things with sgtables.

So this series reworks the system heap to use sgtables, and then
consolidates the pagelist method from the heap-helpers into the
CMA heap. After that, the heap-helpers logic is removed (as it
is unused). I'd still like to find a better way to avoid some of
the logic duplication in implementing the entire dma_buf_ops
handlers per heap. But unfortunately that code is tied somewhat
to how the buffer's memory is tracked.

After this, the series introduces an optimization that
Ørjan Eide implemented for ION that avoids calling sync on
attachments that don't have a mapping.

Next, an optimization to use larger order pages for the system
heap. This change brings us closer to the current performance
of the ION code.

Unfortunately, after submitting the last round, I realized that
part of the reason the page-pooling patch I had included was
providing such great performance numbers was that the network
page-pool implementation doesn't zero pages that it pulls from
the cache. This is very inappropriate for buffers we pass to
userland and was what gave it an unfair advantage (almost
constant time performance) relative to ION's allocation
performance numbers. I added some patches to zero the buffers
manually similar to how ION does it, but I found this resulted
in basically no performance improvement over the standard page
allocator. Thus I've dropped that patch in this series for now.

Unfortunately this means we still have a performance delta from
the ION system heap as measured by my microbenchmark, and this
delta comes from ION system_heap's use of deferred freeing of
pages.
So less work is done in the measured interval of the
microbenchmark. I'll be looking at adding similar code
eventually but I don't want to hold the rest of the patches up
on this, as it is still a good improvement over the current
code.

I've updated the chart I shared earlier with current numbers
(including with the unsubmitted net pagepool implementation, and
with a different unsubmitted pagepool implementation borrowed
from ION) here:
https://docs.google.com/spreadsheets/d/1-1C8ZQpmkl_0DISkI6z4xelE08MlNAN7oEu34AnO4Ao/edit?usp=sharing

I did add to this series a reworked version of my uncached
system heap implementation I was submitting a few weeks back.
Since it duplicated a lot of the now reworked system heap code,
I realized it would be much simpler to add the functionality to
the system_heap implementation itself.

While not improving the core allocation performance, the
uncached heap allocations do result in *much* improved
performance on HiKey960 as they avoid a lot of flushing and
invalidating of buffers that the CPU doesn't touch often.

Feedback on these would be great!

thanks
-john

New in v3:
* Dropped page-pool patches as after correcting the code to zero
  buffers, they provided no net performance gain.
* Added system-uncached implementation on top of reworked
  system-heap.
* Use the new sgtable mapping functions, in the system and cma
  code as Suggested-by: Daniel Mentz <danielmentz@xxxxxxxxxx>
* Cleanup: Use page_size() rather than open-coding it

Cc: Sumit Semwal <sumit.semwal@xxxxxxxxxx>
Cc: Liam Mark <lmark@xxxxxxxxxxxxxx>
Cc: Laura Abbott <labbott@xxxxxxxxxx>
Cc: Brian Starkey <Brian.Starkey@xxxxxxx>
Cc: Hridya Valsaraju <hridya@xxxxxxxxxx>
Cc: Suren Baghdasaryan <surenb@xxxxxxxxxx>
Cc: Sandeep Patil <sspatil@xxxxxxxxxx>
Cc: Daniel Mentz <danielmentz@xxxxxxxxxx>
Cc: Chris Goldsworthy <cgoldswo@xxxxxxxxxxxxxx>
Cc: Ørjan Eide <orjan.eide@xxxxxxx>
Cc: Robin Murphy <robin.murphy@xxxxxxx>
Cc: Ezequiel Garcia <ezequiel@xxxxxxxxxxxxx>
Cc: Simon Ser <contact@xxxxxxxxxxx>
Cc: James Jones <jajones@xxxxxxxxxx>
Cc: linux-media@xxxxxxxxxxxxxxx
Cc: dri-devel@xxxxxxxxxxxxxxxxxxxxx

John Stultz (7):
  dma-buf: system_heap: Rework system heap to use sgtables instead of
    pagelists
  dma-buf: heaps: Move heap-helper logic into the cma_heap
    implementation
  dma-buf: heaps: Remove heap-helpers code
  dma-buf: heaps: Skip sync if not mapped
  dma-buf: system_heap: Allocate higher order pages if available
  dma-buf: dma-heap: Keep track of the heap device struct
  dma-buf: system_heap: Add a system-uncached heap re-using the system
    heap

 drivers/dma-buf/dma-heap.c           |  33 +-
 drivers/dma-buf/heaps/Makefile       |   1 -
 drivers/dma-buf/heaps/cma_heap.c     | 327 +++++++++++++++---
 drivers/dma-buf/heaps/heap-helpers.c | 271 ---------------
 drivers/dma-buf/heaps/heap-helpers.h |  53 ---
 drivers/dma-buf/heaps/system_heap.c  | 480 ++++++++++++++++++++++++---
 include/linux/dma-heap.h             |   9 +
 7 files changed, 741 insertions(+), 433 deletions(-)
 delete mode 100644 drivers/dma-buf/heaps/heap-helpers.c
 delete mode 100644 drivers/dma-buf/heaps/heap-helpers.h

-- 
2.17.1
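
For anyone skimming the series, here is a rough userspace sketch of
the idea behind the "Allocate higher order pages if available" patch:
cover the requested size with as few chunks as possible by always
trying the largest candidate order that still fits. This is only an
illustration, not the code from the patch. The 4 KiB page size, the
candidate orders (8, 4, 0) and every sketch_* name are assumptions
made up for the example, and it does not model the fallback to a
smaller order that a real allocator needs when a high-order
allocation fails.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values for illustration only; not taken from the patches. */
#define SKETCH_PAGE_SIZE  4096UL
#define SKETCH_NUM_ORDERS 3

/* Hypothetical candidate orders, largest first. */
static const unsigned int sketch_orders[SKETCH_NUM_ORDERS] = { 8, 4, 0 };

/*
 * Return the largest candidate order that is no bigger than max_order
 * and whose chunk (PAGE_SIZE << order) fits in `remaining` bytes.
 * Falls back to order 0 if nothing larger fits.
 */
static unsigned int sketch_pick_order(size_t remaining, unsigned int max_order)
{
	for (int i = 0; i < SKETCH_NUM_ORDERS; i++) {
		if (sketch_orders[i] > max_order)
			continue;
		if (remaining >= (SKETCH_PAGE_SIZE << sketch_orders[i]))
			return sketch_orders[i];
	}
	return 0;
}

/*
 * Count how many chunks are needed to cover `size` bytes (rounded up
 * to a whole page) when each step takes the largest order that fits.
 */
static size_t sketch_count_chunks(size_t size)
{
	size_t remaining = (size + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
	unsigned int max_order = sketch_orders[0];
	size_t chunks = 0;

	while (remaining > 0) {
		unsigned int order = sketch_pick_order(remaining, max_order);
		size_t chunk = SKETCH_PAGE_SIZE << order;

		/* remaining is page aligned, so an order-0 chunk always fits */
		assert(remaining >= chunk);
		remaining -= chunk;
		/* once a smaller order is needed, never go back up */
		max_order = order;
		chunks++;
	}
	return chunks;
}
```

With these assumed orders a 1 MiB buffer is covered by a single
order-8 chunk rather than 256 order-0 pages, which means far fewer
allocator calls and far fewer sgtable entries to walk.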