haoxin <xhao@xxxxxxxxxxxxxxxxx> writes:

> Hi Huang,
>
> This is an exciting change, but on ARM64 machines TLB flushing is not
> done through IPIs; it depends on the 'vale1is' instruction. So I'm
> wondering whether there's also a benefit on arm64, and I'm going to
> test it on an ARM64 machine.

We have no arm64 machine to test on, and I know very little about
arm64. Thanks for the information and testing.

Best Regards,
Huang, Ying

> On 2022/9/21 11:47 PM, Zi Yan wrote:
>> On 21 Sep 2022, at 2:06, Huang Ying wrote:
>>
>>> From: "Huang, Ying" <ying.huang@xxxxxxxxx>
>>>
>>> Now, migrate_pages() migrates pages one by one, as in the following
>>> pseudo-code:
>>>
>>>   for each page
>>>     unmap
>>>     flush TLB
>>>     copy
>>>     restore map
>>>
>>> If multiple pages are passed to migrate_pages(), there are
>>> opportunities to batch the TLB flushing and copying. That is, we can
>>> restructure the code as follows:
>>>
>>>   for each page
>>>     unmap
>>>   for each page
>>>     flush TLB
>>>   for each page
>>>     copy
>>>   for each page
>>>     restore map
>>>
>>> The total number of TLB flushing IPIs can be reduced considerably.
>>> And we may use a hardware accelerator such as DSA to accelerate the
>>> page copying.
>>>
>>> So in this patch, we refactor the migrate_pages() implementation and
>>> implement batched TLB flushing. Based on this, hardware-accelerated
>>> page copying can be implemented.
>>>
>>> If too many pages are passed to migrate_pages(), a naive batched
>>> implementation may unmap too many pages at the same time. This
>>> increases the likelihood that a task has to wait for pages under
>>> migration to be mapped again, so latency may suffer. To deal with
>>> this issue, the maximum number of pages unmapped in one batch is
>>> restricted to no more than HPAGE_PMD_NR. That is, the influence is
>>> at the same level as THP migration.
>>>
>>> We use the following test to measure the performance impact of the
>>> patchset:
>>>
>>> On a 2-socket Intel server,
>>>
>>> - Run the pmbench memory accessing benchmark.
>>>
>>> - Run `migratepages` to migrate pages of pmbench between node 0 and
>>>   node 1 back and forth.
>>>
>>> With the patch, the number of TLB flushing IPIs is reduced by 99.1%
>>> during the test, and the number of pages migrated successfully per
>>> second increases by 291.7%.
>> Thank you for the patchset. Batching page migration will definitely
>> improve its throughput, based on my past experiments [1], and
>> starting with TLB flushing is a good first step.
>>
>> BTW, what is the rationale behind the increased number of pages
>> migrated successfully per second?
>>
>>> This patchset is based on v6.0-rc5 and the following patchset:
>>>
>>> [PATCH -V3 0/8] migrate_pages(): fix several bugs in error path
>>> https://lore.kernel.org/lkml/20220817081408.513338-1-ying.huang@xxxxxxxxx/
>>>
>>> The migrate_pages() related code is being converted to folios now,
>>> so this patchset cannot be applied to the recent akpm/mm-unstable
>>> branch. This patchset is meant to check the basic idea. If it is
>>> OK, I will rebase it on top of the folio changes.
>>>
>>> Best Regards,
>>> Huang, Ying
>>
>> [1] https://lwn.net/Articles/784925/
>>
>> --
>> Best Regards,
>> Yan, Zi
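
To make the loop restructuring above concrete, here is a minimal,
self-contained userspace sketch; it is not the actual kernel code.
The unmap/flush/copy/restore functions are stand-in stubs, and
BATCH_MAX stands in for HPAGE_PMD_NR (512 with 4 KB base pages on
x86_64). Only the control flow and the resulting flush count model
the idea in the patchset:

    /*
     * Toy model of the batching refactor described above -- NOT the
     * real migrate_pages(). The per-page operations are empty stubs;
     * only the loop structure and the flush count are meaningful.
     */
    #include <stdio.h>

    #define NR_PAGES  2048
    #define BATCH_MAX 512   /* stand-in for HPAGE_PMD_NR on x86_64 */

    static unsigned long nr_tlb_flushes;

    static void unmap_page(int page)     { (void)page; }
    static void flush_tlb(void)          { nr_tlb_flushes++; }
    static void copy_page_data(int page) { (void)page; }
    static void restore_map(int page)    { (void)page; }

    /* Old scheme: every page pays for its own TLB flush (often an IPI). */
    static void migrate_one_by_one(int nr)
    {
            for (int i = 0; i < nr; i++) {
                    unmap_page(i);
                    flush_tlb();
                    copy_page_data(i);
                    restore_map(i);
            }
    }

    /*
     * New scheme: unmap a whole batch, then flush once for all of it.
     * The batch is capped so no more than BATCH_MAX pages are unmapped
     * at any one time, bounding how long a task may wait for a page
     * under migration to be mapped again.
     */
    static void migrate_batched(int nr)
    {
            for (int start = 0; start < nr; start += BATCH_MAX) {
                    int end = start + BATCH_MAX < nr ? start + BATCH_MAX : nr;

                    for (int i = start; i < end; i++)
                            unmap_page(i);
                    flush_tlb();            /* one flush covers the batch */
                    for (int i = start; i < end; i++)
                            copy_page_data(i);
                    for (int i = start; i < end; i++)
                            restore_map(i);
            }
    }

    int main(void)
    {
            nr_tlb_flushes = 0;
            migrate_one_by_one(NR_PAGES);
            printf("one-by-one: %lu flushes\n", nr_tlb_flushes);

            nr_tlb_flushes = 0;
            migrate_batched(NR_PAGES);
            printf("batched:    %lu flushes\n", nr_tlb_flushes);
            return 0;
    }

Running this prints 2048 flushes for the one-by-one scheme versus 4
for the batched one, which is the mechanism behind the reported 99.1%
IPI reduction; the real savings depend on workload and architecture,
as the arm64 discussion above notes.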