On Thu, Nov 09, 2023 at 01:20:29PM +0800, Huang, Ying wrote:
> Byungchul Park <byungchul@xxxxxx> writes:
>
> > Hi everyone,
> >
> > While working with CXL memory, I have been facing migration overhead,
> > especially TLB shootdown on promotion or demotion between different
> > tiers. Most TLB shootdowns on migration through hinting faults can
> > already be avoided thanks to Huang Ying's work, commit 4d4b6d66db
> > ("mm,unmap: avoid flushing TLB in batch if PTE is inaccessible").
> >
> > However, that covers only migrations triggered by hinting faults. It
> > would be much better to have a general mechanism that reduces the
> > number of TLB flushes and TLB misses and that applies to any type of
> > migration, though for now I have tried it only for tiering migration.
> >
> > I'm suggesting a mechanism that reduces TLB flushes by keeping both
> > the source and destination folios of a migration around until all the
> > required TLB flushes have been done, but only if those folios are not
> > mapped by any PTE with write permission. This work is based on
> > v6.6-rc5.
> >
> > Believe it or not: with the workload I tested, XSBench, the number of
> > full TLB flushes dropped by about 80%, iTLB misses dropped by about
> > 50%, and wall-clock performance consistently improved by at least 1%.
> > I believe it would help even more with other or real-world workloads.
> > Please let me know if I'm missing something.
>
> Can you help to test the effect of commit 7e12beb8ca2a ("migrate_pages:
> batch flushing TLB") for your test case? To test it, you can revert it
> and compare the performance before and after the reverting.

I will.

> And, how do you trigger migration when testing XSBench? Use a tiered
> memory system, and migrate pages between DRAM and CXL memory back and
> forth? If so, how many pages will you migrate for each migration?

Honestly, I've been focusing on the number of migrations and TLB
flushes so far. I will get back to you on this.

	Byungchul

> --
> Best Regards,
> Huang, Ying
>
> >
> > Byungchul
> >
> > ---
> >
> > Changes from v3:
> >
> >	1. Drop the kconfig option CONFIG_MIGRC and remove the sysctl
> >	   knob migrc_enable. (feedback from Nadav)
> >	2. Remove the optimization that skipped CPUs which had already
> >	   performed the required TLB flushes for other reasons before
> >	   migrc flushes, since I could not measure any performance
> >	   difference with and without it. (feedback from Nadav)
> >	3. Minimize arch-specific code. While at it, move all the migrc
> >	   declarations and inline functions from include/linux/mm.h
> >	   to mm/internal.h. (feedback from Dave Hansen, Nadav)
> >	4. Separate the part that pauses migrc under high memory
> >	   pressure into its own patch. (feedback from Nadav)
> >	5. Rename:
> >	   a. arch_tlbbatch_clean() to arch_tlbbatch_clear(),
> >	   b. tlb_ubc_nowr to tlb_ubc_ro,
> >	   c. migrc_try_flush_free_folios() to migrc_flush_free_folios(),
> >	   d. migrc_stop to migrc_pause.
> >	   (feedback from Nadav)
> >	6. Use the ->lru list_head instead of introducing a new
> >	   llist_head. (feedback from Nadav)
> >	7. Use non-atomic page-flag operations where it's safe.
> >	   (feedback from Nadav)
> >	8. Use the stack instead of keeping a pointer to 'struct
> >	   migrc_req' in struct task_struct when manipulating it
> >	   locally. (feedback from Nadav)
> >	9. Replace many small functions with inline functions placed
> >	   in a header, mm/internal.h. (feedback from Nadav)
> >	10. Add sufficient comments. (feedback from Nadav)
> >	11. Remove a lot of wrapper functions. (feedback from Nadav)
> >
> > Changes from RFC v2:
> >
> >	1. Remove the additional space occupied in struct page. To do
> >	   that, migrc's list is unioned with the lru field, and a page
> >	   flag is added. I know a new page flag is unwelcome, but
> >	   there was no other way for migrc to distinguish folios under
> >	   its control from others. To mitigate the cost, migrc is
> >	   enabled only on 64-bit systems.
> >	2. Remove the internal object allocator that I had introduced
> >	   to minimize the impact on the system; a ton of tests showed
> >	   it made no difference.
> >	3. Stop migrc from working when the system is under high memory
> >	   pressure, e.g. about to perform direct reclaim. When the
> >	   swap mechanism was heavily used, the system suffered a
> >	   regression without this control.
> >	4. Exclude folios with pte_dirty() == true from migrc's
> >	   interest so that migrc can work more simply.
> >	5. Combine several tightly coupled patches into one.
> >	6. Add sufficient comments for better review.
> >	7. Manage migrc's requests per node (instead of globally).
> >	8. Add the TLB miss improvement to the commit message.
> >	9. Test with more CPUs (4 -> 16) to show a bigger improvement.
> >
> > Changes from RFC:
> >
> >	1. Fix a bug triggered when a destination folio of a previous
> >	   migration becomes a source folio of the next migration
> >	   before the folio has been handled properly, so that the
> >	   folio can take part in another migration. There was an
> >	   inconsistency in the folio's state; fixed it.
> >	2. Split the patch set into more pieces for easier review.
> >	   (feedback from Nadav Amit)
> >	3. Fix a wrong usage of barriers, e.g. smp_mb__after_atomic().
> >	   (feedback from Nadav Amit)
> >	4. Add sufficient comments to explain the patch set better.
> >	   (feedback from Nadav Amit)
> >
> > Byungchul Park (3):
> >   mm/rmap: Recognize read-only TLB entries during batched TLB flush
> >   mm: Defer TLB flush by keeping both src and dst folios at migration
> >   mm: Pause migrc mechanism at high memory pressure
> >
> >  arch/x86/include/asm/tlbflush.h |   3 +
> >  arch/x86/mm/tlb.c               |  11 ++
> >  include/linux/mm_types.h        |  21 +++
> >  include/linux/mmzone.h          |   9 ++
> >  include/linux/page-flags.h      |   4 +
> >  include/linux/sched.h           |   7 +
> >  include/trace/events/mmflags.h  |   3 +-
> >  mm/internal.h                   |  78 ++++++++++
> >  mm/memory.c                     |  11 ++
> >  mm/migrate.c                    | 266 ++++++++++++++++++++++++++++++++
> >  mm/page_alloc.c                 |  30 +++-
> >  mm/rmap.c                       |  35 ++++-
> >  12 files changed, 475 insertions(+), 3 deletions(-)
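
To make the mechanism described above concrete, here is a minimal
sketch of how read-only PTEs could be routed into a separate flush
batch during the rmap unmap walk (patch 1). It is illustrative only:
tlb_ubc_ro comes from the rename list above, while
set_tlb_ubc_ro_flush_pending() and the dispatch logic below are my
assumptions, not the actual patch.

/*
 * Sketch, in the context of mm/rmap.c's batched-unmap path.
 * set_tlb_ubc_ro_flush_pending() is a hypothetical counterpart of
 * set_tlb_ubc_flush_pending() that accumulates into tlb_ubc_ro.
 */
static void migrc_batch_pte(struct mm_struct *mm, pte_t pteval,
			    unsigned long uaddr)
{
	if (pte_write(pteval) || pte_dirty(pteval)) {
		/*
		 * A writable or dirty mapping may hold data newer than
		 * the migration copy, so its flush cannot be deferred;
		 * use the normal batch, flushed before migration.
		 */
		set_tlb_ubc_flush_pending(mm, pteval, uaddr);
		return;
	}

	/*
	 * Read-only and clean: a stale TLB entry can only serve reads
	 * of unchanged data, so the flush can safely be deferred until
	 * after the migration completes (accumulated in tlb_ubc_ro).
	 */
	set_tlb_ubc_ro_flush_pending(mm, pteval, uaddr);
}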
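
Likewise, a sketch of the deferred free from patch 2, where the source
folios are kept alive until a single batched flush has run. struct
migrc_req and migrc_flush_free_folios() are named in this cover letter,
but the fields and body below are assumptions for illustration.

struct migrc_req {
	/* deferred source folios, linked via folio->lru */
	struct list_head folios;
	/* CPUs that still need their stale entries flushed */
	struct arch_tlbflush_unmap_batch arch;
};

static void migrc_flush_free_folios(struct migrc_req *req)
{
	struct folio *folio, *tmp;

	/* One batched flush covers every migration deferred so far. */
	arch_tlbbatch_flush(&req->arch);

	/*
	 * Only now are the stale read-only translations gone, so the
	 * source folios can safely go back to the page allocator.
	 */
	list_for_each_entry_safe(folio, tmp, &req->folios, lru) {
		list_del(&folio->lru);
		folio_put(folio);
	}
}

Since the requests are managed per node (see "Changes from RFC v2",
item 7), one such flush can retire all pending migrations on that node
at once, which is where the reduction in full flushes comes from.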