haoxin <xhao@xxxxxxxxxxxxxxxxx> writes:

> Hi, Huang
>
> On 2022/9/21 2:06, Huang Ying wrote:
>> From: "Huang, Ying" <ying.huang@xxxxxxxxx>
>>
>> Now, migrate_pages() migrates pages one by one, as in the following
>> pseudocode,
>>
>>   for each page
>>     unmap
>>     flush TLB
>>     copy
>>     restore map
>>
>> If multiple pages are passed to migrate_pages(), there are
>> opportunities to batch the TLB flushing and copying. That is, we can
>> change the code to something like the following,
>>
>>   for each page
>>     unmap
>>   for each page
>>     flush TLB
>>   for each page
>>     copy
>>   for each page
>>     restore map
>>
>> The total number of TLB flushing IPIs can be reduced considerably. And
>> we may use a hardware accelerator such as DSA to accelerate the
>> page copying.
>>
>> So in this patch, we refactor the migrate_pages() implementation and
>> implement TLB flush batching. Based on this, hardware-accelerated
>> page copying can be implemented.
>>
>> If too many pages are passed to migrate_pages(), the naive batched
>> implementation may unmap too many pages at the same time. This
>> increases the chance that a task has to wait for the migrated pages
>> to be mapped again, which may hurt latency. To deal with this
>> issue, the max number of pages to be unmapped in one batch is
>> restricted to HPAGE_PMD_NR. That is, the impact is at the same
>> level as THP migration.
>>
>> We use the following test to measure the performance impact of the
>> patchset,
>>
>> On a 2-socket Intel server,
>>
>> - Run pmbench memory accessing benchmark
>>
>> - Run `migratepages` to migrate pages of pmbench between node 0 and
>>   node 1 back and forth.
>>
> As pmbench cannot run on arm64 machines, I use lmbench instead.
> I tested it like this (I am not sure whether it is reasonable, but it
> seems to work):
>
>   ./bw_mem -N10000 10000m rd &
>   time migratepages pid node0 node1
>
>           w/o patch   w/ patch
>   real    0m0.035s    0m0.024s
>   user    0m0.000s    0m0.000s
>   sys     0m0.035s    0m0.024s
>
> The migratepages time is reduced by about 31%.
>
> But there is a problem. I see the batch flush is called by
>
>   migrate_pages_batch
>     try_to_unmap_flush
>       arch_tlbbatch_flush(&tlb_ubc->arch); // the batch flush really happens here
>
> But on arm64, arch_tlbbatch_flush() is not supported, because arm64
> does not support CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH yet.
>
> So the TLB batch flush does not flush anything; it is an empty function.

Yes. And should_defer_flush() will always return false too. That is,
the TLB will still be flushed, but will not be batched.

> Maybe this patch can help solve this problem.
> https://lore.kernel.org/linux-arm-kernel/20220921084302.43631-1-yangyicong@xxxxxxxxxx/T/

Yes. This will bring TLB flush batching to ARM64.

Best Regards,
Huang, Ying