Hello, This patch series updates and extends the previously posted "[PATCH 1/7] omapdrm: Fix GEM objects DMA unmapping" series. Compared to v1, the series now includes an extra cleanup (06/10), a cache handling performance improvement (09/10) and a dmabuf refcount fix (10/10). Memory leaks have been reported when allocating a cached omap_bo (with OMAP_BO_CACHED). Investigation showed that this can only come from the DMA mapping debug layer, as on ARM32 the non-coherent, non-IOMMU DMA mapping code doesn't allocate memory to map a page (and the kmemcheck facility is only available on x86, so it can't be a source of memory leaks either on ARM). The DMA debug layer pre-allocates DMA debugging entries and stores them in a list. As the omapdrm driver maps cached buffer page by page, the list of 4096 pre-allocated entries is starved very soon. However, raising the number of DMA mapping debug entries to 32 * 4096 (through the dma_debug_entries kernel command line argument) led to more interesting results. The number of entries being large enough to handle all the pages mapped by kmstest, monitoring the DMA mapping statistics through /sys/kernel/debug/dma-api/ showed that the number of free entries dropped significantly when kmstest was started and didn't raise when it was stopped. In particular, running kmstest without flipping resulting in a drop of 1266 free entries, which corresponds to one 1440x900 framebuffer in XR24. That proved that the pages backing the framebuffer, while freed when the framebuffer was destroyed, were not unmapped. I've thus started investigating the driver GEM implementation. After a few confusing moments that resulted in the 01/10 to 07/10 cleanup patches, I wrote patch 08/10 that fixes the issue. No memory leak is now noticed through /sys/kernel/debug/dma-api/ or /proc/meminfo running 'kmstest --flip' for 300 frames in a loop for one hour with cached buffers. Test patches for kms++ and kmstest to support omapdrm cached mappings are available at git://git.ideasonboard.com/renesas/kmsxx.git omap/cache Patch 09/10 then improves performances by mapping pages for DMA in the correct direction. As the device never writes to memory, DMA_TO_DEVICE can be used instead of DMA_BIDIRECTIONAL. Cache handling is still slow, but I don't think major performance improvements can easily be achieved. Cache handling is all about trade-offs. The current driver implementation tracks dirty pages and cleans the cache for them only. When only a small part of the buffer has been touched this results in a performance gain compared to cleaning the cache for the whole buffer. In particular, when the buffer has not been touched by the CPU at all (which is common when displaying frames captured from a camera device with buffers shared through dma-buf), no cache clean operation is performed. This has been verified using the kmscapture application from kms++ from the above branch. On the other hand, when most of the buffer has been touched, the extra effort to track dirty pages results in a performance loss. However, measurements with perf and ftrace showed that the cost of dirty page tracking is small compared to cache handling. One optimization I tried resulted in a very large performance improvement though. Replacing selective cache cleaning (through the DMA mapping API) by a whole L1+L2 cache clean raised the frame rate from 30 fps to 60 fps when running 'kmstest --flip' on my Pandaboard. However, as that requires calling flush_cache_all(); local_irq_save(flags); outer_flush_all(); local_irq_restore(flags); from the omapdrm driver, there is no way this will be accepted upstream as-is. We could explore the option of adding heuristics to the ARM32 DMA mapping implementation to clean the whole cache when the buffer is large, but that belongs to a separate patch series. To close the patch series, patch 10/10 fixes a GEM object reference count issue related to dma-buf. It turns out that GEM object dma-buf export has suffered from a refcount underflow since kernel v3.5 without anyone noticing. Laurent Pinchart (10): drm: omapdrm: Remove remap argument to omap_gem_get_paddr() drm: omapdrm: Rename occurrences of paddr to dma_addr drm: omapdrm: Rename omap_gem_(get|put)_paddr() to omap_gem_(un)pin() drm: omapdrm: Lower indentation level in omap_gem_dma_sync_buffer() drm: omapdrm: Rename the omap_gem_object addrs field to dma_addrs drm: omapdrm: Rename GEM DMA sync functions drm: omapdrm: Fix incorrect usage of the term 'cache coherency' drm: omapdrm: DMA-unmap pages for all buffer types when freeing buffers drm: omapdrm: Map pages for DMA in DMA_TO_DEVICE direction drm: omapdrm: Take GEM obj and DRM dev references when exporting dmabuf drivers/gpu/drm/omapdrm/omap_drv.h | 13 +- drivers/gpu/drm/omapdrm/omap_fb.c | 27 ++-- drivers/gpu/drm/omapdrm/omap_fbdev.c | 15 +-- drivers/gpu/drm/omapdrm/omap_gem.c | 209 ++++++++++++++++-------------- drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c | 39 +++--- 5 files changed, 163 insertions(+), 140 deletions(-) -- Regards, Laurent Pinchart _______________________________________________ dri-devel mailing list dri-devel@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/dri-devel