Hi,

While working with CXL memory, I have been seeing migration overhead,
especially TLB shootdowns on promotion and demotion between tiers. Most
TLB shootdowns for migration through hinting faults can already be
avoided thanks to Huang Ying's work, commit 4d4b6d66db ("mm,unmap: avoid
flushing TLB in batch if PTE is inaccessible"), but that only covers
migrations triggered by hinting faults.

It would be better to have a general mechanism that reduces the number
of TLB flushes and can be applied to any type of migration; for now I
have tried it only for tiering migration.

This series suggests a mechanism that reduces TLB flushes by keeping
both the source and destination folios participating in a migration
around until all the required TLB flushes have been done, but only if
none of those folios are mapped by any writable PTE entry.

With the workload I tested, XSBench, the number of full TLB flushes
dropped by more than 50% and performance improved slightly, though not
by much. I believe other, more realistic workloads would benefit more.
Please let me know if I am missing something.

	Byungchul

Byungchul Park (2):
  mm/rmap: Recognize non-writable TLB entries during TLB batch flush
  mm: Defer TLB flush by keeping both src and dst folios at migration

 arch/x86/include/asm/tlbflush.h |   9 +
 arch/x86/mm/tlb.c               |  59 +++++++
 include/linux/mm.h              |  30 ++++
 include/linux/mm_types.h        |  34 ++++
 include/linux/mm_types_task.h   |   4 +-
 include/linux/mmzone.h          |   6 +
 include/linux/sched.h           |   5 +
 init/Kconfig                    |  12 ++
 mm/internal.h                   |  14 ++
 mm/memory.c                     |   9 +-
 mm/migrate.c                    | 287 +++++++++++++++++++++++++++++++-
 mm/mm_init.c                    |   1 +
 mm/page_alloc.c                 |  16 ++
 mm/rmap.c                       | 121 +++++++++++++-
 14 files changed, 595 insertions(+), 12 deletions(-)

--
2.17.1
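
P.S. For readers unfamiliar with the idea, below is a minimal,
userspace-only C sketch of the bookkeeping the cover letter describes:
source pages of migrated, read-only-mapped folios are kept on a pending
list tagged with the TLB flush "generation" they need, and are only
released once a batched flush has covered that generation. Every name
here (pending_src, tlb_flush_gen, defer_src_release, ...) is
illustrative only and is not the kernel API introduced by the patches.

/*
 * Conceptual sketch, not kernel code: defer releasing the source page
 * of a migration until a batched TLB flush makes it safe, assuming the
 * folio was never mapped writable (so stale read-only TLB entries
 * cannot lose writes in the meantime).
 */
#include <stdio.h>
#include <stdlib.h>

struct pending_src {
	void *src_page;            /* source page kept alive after the copy */
	unsigned long req_gen;     /* flush generation that must complete   */
	struct pending_src *next;
};

static unsigned long tlb_flush_gen;    /* generation already flushed    */
static unsigned long tlb_pending_gen;  /* generation requested so far   */
static struct pending_src *pending;

/* Called instead of an immediate flush when unmapping a read-only folio. */
static void defer_src_release(void *src_page)
{
	struct pending_src *p = malloc(sizeof(*p));

	p->src_page = src_page;
	p->req_gen = ++tlb_pending_gen;  /* needs a flush after this point */
	p->next = pending;
	pending = p;
}

/* One batched flush satisfies every deferral queued before it. */
static void batched_tlb_flush(void)
{
	struct pending_src **pp = &pending;

	tlb_flush_gen = tlb_pending_gen;   /* model a single full flush */
	printf("full flush #%lu\n", tlb_flush_gen);

	while (*pp) {
		struct pending_src *p = *pp;

		if (p->req_gen <= tlb_flush_gen) {
			*pp = p->next;
			free(p->src_page);  /* safe: no stale writable entry */
			free(p);
		} else {
			pp = &p->next;
		}
	}
}

int main(void)
{
	/* Three migrations share one flush instead of flushing three times. */
	for (int i = 0; i < 3; i++)
		defer_src_release(malloc(4096));
	batched_tlb_flush();
	return 0;
}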