Hesham Almatary <hesham.almatary@xxxxxxxxxx> writes:

> On 9/27/2022 12:21 PM, haoxin wrote:
>> Hi, Huang
>>
>> On 2022/9/21 2:06, Huang Ying wrote:
>>> From: "Huang, Ying" <ying.huang@xxxxxxxxx>
>>>
>>> Now, migrate_pages() migrate pages one by one, like the fake code as
>>> follows,
>>>
>>>   for each page
>>>     unmap
>>>     flush TLB
>>>     copy
>>>     restore map
>>>
>>> If multiple pages are passed to migrate_pages(), there are
>>> opportunities to batch the TLB flushing and copying. That is, we can
>>> change the code to something as follows,
>>>
>>>   for each page
>>>     unmap
>>>   for each page
>>>     flush TLB
>>>   for each page
>>>     copy
>>>   for each page
>>>     restore map
>>>
>>> The total number of TLB flushing IPI can be reduced considerably. And
>>> we may use some hardware accelerator such as DSA to accelerate the
>>> page copying.
>>>
>>> So in this patch, we refactor the migrate_pages() implementation and
>>> implement the TLB flushing batching. Base on this, hardware
>>> accelerated page copying can be implemented.
>>>
>>> If too many pages are passed to migrate_pages(), in the naive batched
>>> implementation, we may unmap too many pages at the same time. The
>>> possibility for a task to wait for the migrated pages to be mapped
>>> again increases. So the latency may be hurt. To deal with this
>>> issue, the max number of pages be unmapped in batch is restricted to
>>> no more than HPAGE_PMD_NR. That is, the influence is at the same
>>> level of THP migration.
>>>
>>> We use the following test to measure the performance impact of the
>>> patchset,
>>>
>>> On a 2-socket Intel server,
>>>
>>> - Run pmbench memory accessing benchmark
>>>
>>> - Run `migratepages` to migrate pages of pmbench between node 0 and
>>>   node 1 back and forth.
>>>
>> As the pmbench can not run on arm64 machine, so i use lmbench instead.
>> I test case like this: (i am not sure whether it is reasonable,
>> but it seems worked)
>> ./bw_mem -N10000 10000m rd &
>> time migratepages pid node0 node1
>>
> FYI, I have ported pmbench to AArch64 [1]. The project seems to be
> abandoned on bitbucket,
>
> I wonder if it makes sense to fork it elsewhere and push the pending PRs there.
>
>
> [1] https://bitbucket.org/jisooy/pmbench/pull-requests/5

Maybe try contacting the original author by email first?

Best Regards,
Huang, Ying

>> o/patch                      w/patch
>> real 0m0.035s                real 0m0.024s
>> user 0m0.000s                user 0m0.000s
>> sys 0m0.035s                 sys 0m0.024s
>>
>> the migratepages time is reduced above 32%.
>>
>> But there has a problem, i see the batch flush is called by
>> migrate_pages_batch
>>     try_to_unmap_flush
>>         arch_tlbbatch_flush(&tlb_ubc->arch); // there batch flush really work.
>>
>> But in arm64, the arch_tlbbatch_flush are not supported, because it
>> not support CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH yet.
>>
>> So, the tlb batch flush means no any flush is did, it is a empty func.
>>
>> Maybe this patch can help solve this problem.
>> https://lore.kernel.org/linux-arm-kernel/20220921084302.43631-1-yangyicong@xxxxxxxxxx/T/
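
For readers following the thread, the difference between the two loops in
the quoted patch description can be sketched with a toy model. This is
Python, not kernel code, and it only counts flush operations. The function
names migrate_one_by_one() and migrate_batched() are hypothetical, the
value 512 for HPAGE_PMD_NR assumes 4 KiB base pages with 2 MiB PMD-sized
huge pages, and the model assumes a single flush call covers a whole batch,
which is how the "batch the TLB flushing" wording reads.

```python
# Toy model of the two migration flows from the patch description.
# Nothing here touches real page tables; it only counts TLB flushes.

# Assumption: 4 KiB base pages and 2 MiB PMD-sized huge pages.
HPAGE_PMD_NR = 512


def migrate_one_by_one(pages):
    """Per-page flow: unmap, flush TLB, copy, restore map."""
    flushes = 0
    migrated = []
    for page in pages:
        # unmap(page) would happen here
        flushes += 1           # one TLB flush per page
        migrated.append(page)  # copy, then restore the mapping
    return migrated, flushes


def migrate_batched(pages, max_batch=HPAGE_PMD_NR):
    """Batched flow: unmap a batch, flush once, copy, restore map.

    The batch size is capped so that only a bounded number of pages
    is unmapped at the same time.
    """
    flushes = 0
    migrated = []
    for start in range(0, len(pages), max_batch):
        batch = pages[start:start + max_batch]
        unmapped = list(batch)   # for each page: unmap
        flushes += 1             # assumed: one flush covers the batch
        copied = list(unmapped)  # for each page: copy
        migrated.extend(copied)  # for each page: restore map
    return migrated, flushes


if __name__ == "__main__":
    pages = list(range(1300))
    _, per_page = migrate_one_by_one(pages)
    _, batched = migrate_batched(pages)
    print("per-page flushes:", per_page)
    print("batched flushes:", batched)
```

With 1300 pages the per-page flow issues 1300 flushes, while the batched
flow issues one per batch of at most HPAGE_PMD_NR pages. On an architecture
where the batched flush is not implemented, as described for arm64 above,
this model does not apply as written.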