Hi Bingbu,

On Fri, Aug 16, 2024 at 11:31:21AM +0800, bingbu.cao@xxxxxxxxx wrote:
> From: Bingbu Cao <bingbu.cao@xxxxxxxxx>
>
> ipu6_mmu_map() and ipu6_mmu_unmap() operated on a per-page basis,
> leading to frequent spin_lock/unlock and clflush_cache_range() calls
> for each page. This causes inefficiencies, especially when handling
> large dma-bufs with hundreds of pages.
>
> This change enhances ipu6_mmu_map()/ipu6_mmu_unmap() to batch the
> processing of multiple contiguous pages. This significantly reduces
> the number of spin_lock/unlock and clflush_cache_range() calls and
> improves performance.

Obtaining a spinlock and flushing the cache for a single page should be
fairly unnoticeable operations from a performance viewpoint in memory
mapping, and the result appears quite a bit more complicated than the
original code. Do you have data on the performance benefits of the
change?

The old code was loosely based on the arm DMA mapping implementation,
AFAIR.

-- 
Kind regards,

Sakari Ailus