Sakari,

------------------------------------------------------------------------
BRs,
Bingbu Cao

>-----Original Message-----
>From: Sakari Ailus <sakari.ailus@xxxxxxxxxxxxxxx>
>Sent: Thursday, October 3, 2024 7:16 PM
>To: Cao, Bingbu <bingbu.cao@xxxxxxxxx>
>Cc: linux-media@xxxxxxxxxxxxxxx; Dai, Jianhui J
><jianhui.j.dai@xxxxxxxxx>; tfiga@xxxxxxxxxxxx;
>bingbu.cao@xxxxxxxxxxxxxxx
>Subject: Re: [PATCH v2] media: intel/ipu6: optimize the IPU6 MMU
>mapping and unmapping flow
>
>Hi Bingbu,
>
>On Fri, Aug 16, 2024 at 11:31:21AM +0800, bingbu.cao@xxxxxxxxx wrote:
>> From: Bingbu Cao <bingbu.cao@xxxxxxxxx>
>>
>> ipu6_mmu_map() and ipu6_mmu_unmap() operated on a per-page basis,
>> leading to frequent calls to spin_locks/unlocks and
>> clflush_cache_range for each page. This will cause inefficiencies,
>> especially when handling large dma-bufs with hundreds of pages.
>>
>> This change enhances ipu6_mmu_map()/ipu6_mmu_unmap() with batching
>> process multiple contiguous pages. This significantly reduces calls
>> for spin_lock/unlock and clflush_cache_range() and improve the
>> performance.
>
>Obtaining spinlocks and flushing the cache for a page should be
>rather unnoticeable operations from performance viewpoint in memory
>mapping.

Some buffers may contain a large number of pages if the IOMMU did not
coalesce them.

>
>The result appears quite a bit more complicated than the original
>code.
>Do you have data on the benefits of the change in terms of
>performance?

I don't have the full performance data. From one of Jianhui's tests,
CPU usage went down from 3.7% to 1.7%, and clflush() went down from
2.3% to 0.89%.

>
>The old code was loosely based on arm DMA mapping implementation
>AFAIR.

The DMA mapping is based on that and has changed a lot; the MMU part
is not.

>
>--
>Kind regards,
>
>Sakari Ailus
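
To make the per-page versus batched trade-off discussed above concrete,
here is a minimal user-space sketch in C. It is not the ipu6 driver code:
the names (struct cost, map_per_page, map_batched), the flat page-table
array, the fake PTE encoding and the 1024-entry table size are all
assumptions chosen for illustration. It only counts how many lock pairs
and cache flushes each strategy would issue for the same range of pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed number of entries per L2 page table (hypothetical value). */
#define PTES_PER_TABLE 1024u

struct cost {
    size_t locks;    /* simulated spin_lock/spin_unlock pairs */
    size_t flushes;  /* simulated clflush_cache_range() calls */
};

/* Hypothetical PTE encoding: page index shifted by 12, bit 0 = valid. */
static uint32_t fake_pte(size_t idx)
{
    return ((uint32_t)idx << 12) | 1u;
}

/* Per-page model: every page costs one lock pair and one flush. */
static struct cost map_per_page(uint32_t *pt, size_t first, size_t npages)
{
    struct cost c = {0, 0};

    assert(pt != NULL);
    for (size_t i = first; i < first + npages; i++) {
        c.locks++;            /* lock ... unlock around a single entry */
        pt[i] = fake_pte(i);
        c.flushes++;          /* flush the one entry just written */
    }
    return c;
}

/*
 * Batched model: one lock pair for the whole call, and one flush per
 * L2 table touched, covering the contiguous run written in that table.
 */
static struct cost map_batched(uint32_t *pt, size_t first, size_t npages)
{
    struct cost c = {0, 0};
    size_t i = first;
    size_t end = first + npages;

    assert(pt != NULL);
    if (npages == 0)
        return c;

    c.locks++;                /* held across the whole range */
    while (i < end) {
        /* End of the table that contains entry i. */
        size_t table_end = (i / PTES_PER_TABLE + 1) * PTES_PER_TABLE;
        size_t run_end = end < table_end ? end : table_end;

        for (; i < run_end; i++)
            pt[i] = fake_pte(i);
        c.flushes++;          /* one flush for the run in this table */
    }
    return c;
}
```

Under these assumptions, mapping 300 pages starting at entry 1000 costs
300 lock pairs and 300 flushes in the per-page model, and 1 lock pair and
2 flushes in the batched model (the range crosses one table boundary).
Whether that difference is visible in practice depends on the workload,
which is what the profile numbers above are meant to indicate.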