Hi Matthew,

On 6/15/2024 9:32 AM, Matthew Wilcox wrote:
> On Sat, Jun 15, 2024 at 03:45:20AM +0530, Shivank Garg wrote:
>
> You haven't measured the important thing though -- what's the cost
> _to userspace_? When the CPU does the copy, the data is now
> cache-hot in that CPU's cache. When the DMA engine does the copy,
> it's not cache-hot in any CPU.
>
> Now, this may not be a big problem. I don't think we do anything to
> ensure that the CPU that is going to access the folio in userspace
> is the one which does the copy.
>
> But your methodology is wrong.

You're right about the importance of measuring the cost to userspace.
I initially focused on analyzing the folio_copy overhead within
migrate_pages to identify potential optimization opportunities using
DMA hardware accelerators.

To address this, I'm planning to extend my experiments to measure the
cost to userspace, specifically the part related to cache-hotness.
This will involve accessing the migrated pages after the migration is
complete and measuring the resulting read/write latency.

DMA offloading could possibly help in scenarios that involve bulk data
copying where the workload size is much larger than the cache capacity,
or that incur a large shootdown overhead.

The userspace cost analysis will provide a more comprehensive picture
of page migration using the CPU vs. DMA offloading.

I appreciate your feedback.

Shivank