On Thu, Nov 09, 2023 at 01:20:29PM +0800, Huang, Ying wrote:
> Byungchul Park <byungchul@xxxxxx> writes:
>
> > Hi everyone,
> >
> > While working with CXL memory, I have been facing migration overhead,
> > especially TLB shootdown on promotion or demotion between different
> > tiers. Most TLB shootdowns on migration through hinting faults can
> > already be avoided thanks to Huang Ying's work, commit 4d4b6d66db
> > ("mm,unmap: avoid flushing TLB in batch if PTE is inaccessible").
> >
> > However, that covers only migrations triggered by hinting faults. It
> > would be much better to have a general mechanism that reduces the
> > number of TLB flushes and TLB misses and that applies to any type of
> > migration, though for now I have tried it only for tiering migration.
> >
> > I'm suggesting a mechanism that reduces TLB flushes by keeping both
> > the source and destination folios of a migration around until all the
> > required TLB flushes have been done, but only if those folios are not
> > mapped by any PTE with write permission. This work is based on
> > v6.6-rc5.
> >
> > Believe it or not: with the workload I tested, XSBench, the number of
> > full TLB flushes dropped by about 80%, iTLB misses dropped by about
> > 50%, and wall-clock performance consistently improved by at least 1%.
> > I believe it would help even more with other or real-world workloads.
> > Please let me know if I'm missing something.
>
> Can you help to test the effect of commit 7e12beb8ca2a ("migrate_pages:
> batch flushing TLB") for your test case? To test it, you can revert it
> and compare the performance before and after the reverting.

I will.

> And, how do you trigger migration when testing XSBench? Use a tiered
> memory system, and migrate pages between DRAM and CXL memory back and
> forth? If so, how many pages will you migrate for each migration?

Honestly, I've been focusing on the number of migrations and TLB
flushes so far. I will get back to you on this.

	Byungchul

> --
> Best Regards,
> Huang, Ying
>
> >
> > Byungchul
> >
> > ---
> >
> > Changes from v3:
> >
> >	1. Drop the kconfig option CONFIG_MIGRC and remove the sysctl
> >	   knob migrc_enable. (feedback from Nadav)
> >	2. Remove the optimization that skipped CPUs which had already
> >	   performed the required TLB flushes for other reasons before
> >	   migrc flushes, since I could not measure any performance
> >	   difference with and without it. (feedback from Nadav)
> >	3. Minimize arch-specific code. While at it, move all the migrc
> >	   declarations and inline functions from include/linux/mm.h
> >	   to mm/internal.h. (feedback from Dave Hansen, Nadav)
> >	4. Separate the part that pauses migrc under high memory
> >	   pressure into its own patch. (feedback from Nadav)
> >	5. Rename:
> >	   a. arch_tlbbatch_clean() to arch_tlbbatch_clear(),
> >	   b. tlb_ubc_nowr to tlb_ubc_ro,
> >	   c. migrc_try_flush_free_folios() to migrc_flush_free_folios(),
> >	   d. migrc_stop to migrc_pause.
> >	   (feedback from Nadav)
> >	6. Use the ->lru list_head instead of introducing a new
> >	   llist_head. (feedback from Nadav)
> >	7. Use non-atomic page-flag operations where it's safe.
> >	   (feedback from Nadav)
> >	8. Use the stack instead of keeping a pointer to 'struct
> >	   migrc_req' in struct task_struct when manipulating it
> >	   locally. (feedback from Nadav)
> >	9. Replace many small functions with inline functions placed
> >	   in a header, mm/internal.h. (feedback from Nadav)
> >	10. Add sufficient comments. (feedback from Nadav)
> >	11. Remove a lot of wrapper functions. (feedback from Nadav)
> >
> > Changes from RFC v2:
> >
> >	1. Remove the additional space occupied in struct page. To do
> >	   that, migrc's list is unioned with the lru field, and a page
> >	   flag is added. I know a new page flag is unwelcome, but
> >	   there was no other way for migrc to distinguish folios under
> >	   its control from others. To mitigate the cost, migrc is
> >	   enabled only on 64-bit systems.
> >	2. Remove the internal object allocator that I had introduced
> >	   to minimize the impact on the system; a ton of tests showed
> >	   it made no difference.
> >	3. Stop migrc from working when the system is under high memory
> >	   pressure, e.g. about to perform direct reclaim. When the
> >	   swap mechanism was heavily used, the system suffered a
> >	   regression without this control.
> >	4. Exclude folios with pte_dirty() == true from migrc's
> >	   interest so that migrc can work more simply.
> >	5. Combine several tightly coupled patches into one.
> >	6. Add sufficient comments for better review.
> >	7. Manage migrc's requests per node (instead of globally).
> >	8. Add the TLB miss improvement to the commit message.
> >	9. Test with more CPUs (4 -> 16) to show a bigger improvement.
> >
> > Changes from RFC:
> >
> >	1. Fix a bug triggered when a destination folio of a previous
> >	   migration becomes a source folio of the next migration
> >	   before the folio has been handled properly, so that the
> >	   folio can take part in another migration. There was an
> >	   inconsistency in the folio's state; fixed it.
> >	2. Split the patch set into more pieces for easier review.
> >	   (feedback from Nadav Amit)
> >	3. Fix a wrong usage of barriers, e.g. smp_mb__after_atomic().
> >	   (feedback from Nadav Amit)
> >	4. Add sufficient comments to explain the patch set better.
> >	   (feedback from Nadav Amit)
> >
> > Byungchul Park (3):
> >   mm/rmap: Recognize read-only TLB entries during batched TLB flush
> >   mm: Defer TLB flush by keeping both src and dst folios at migration
> >   mm: Pause migrc mechanism at high memory pressure
> >
> >  arch/x86/include/asm/tlbflush.h |   3 +
> >  arch/x86/mm/tlb.c               |  11 ++
> >  include/linux/mm_types.h        |  21 +++
> >  include/linux/mmzone.h          |   9 ++
> >  include/linux/page-flags.h      |   4 +
> >  include/linux/sched.h           |   7 +
> >  include/trace/events/mmflags.h  |   3 +-
> >  mm/internal.h                   |  78 ++++++++++
> >  mm/memory.c                     |  11 ++
> >  mm/migrate.c                    | 266 ++++++++++++++++++++++++++++++++
> >  mm/page_alloc.c                 |  30 +++-
> >  mm/rmap.c                       |  35 ++++-
> >  12 files changed, 475 insertions(+), 3 deletions(-)
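
To make the mechanism described above concrete, here is a minimal
sketch of how read-only PTEs could be routed into a separate flush
batch during the rmap unmap walk (patch 1). It is illustrative only:
tlb_ubc_ro comes from the rename list above, while
set_tlb_ubc_ro_flush_pending() and the dispatch logic below are my
assumptions, not the actual patch.

/*
 * Sketch, in the context of mm/rmap.c's batched-unmap path.
 * set_tlb_ubc_ro_flush_pending() is a hypothetical counterpart of
 * set_tlb_ubc_flush_pending() that accumulates into tlb_ubc_ro.
 */
static void migrc_batch_pte(struct mm_struct *mm, pte_t pteval,
			    unsigned long uaddr)
{
	if (pte_write(pteval) || pte_dirty(pteval)) {
		/*
		 * A writable or dirty mapping may hold data newer than
		 * the migration copy, so its flush cannot be deferred;
		 * use the normal batch, flushed before migration.
		 */
		set_tlb_ubc_flush_pending(mm, pteval, uaddr);
		return;
	}

	/*
	 * Read-only and clean: a stale TLB entry can only serve reads
	 * of unchanged data, so the flush can safely be deferred until
	 * after the migration completes (accumulated in tlb_ubc_ro).
	 */
	set_tlb_ubc_ro_flush_pending(mm, pteval, uaddr);
}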
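
Likewise, a sketch of the deferred free from patch 2, where the source
folios are kept alive until a single batched flush has run. struct
migrc_req and migrc_flush_free_folios() are named in this cover letter,
but the fields and body below are assumptions for illustration.

struct migrc_req {
	/* deferred source folios, linked via folio->lru */
	struct list_head folios;
	/* CPUs that still need their stale entries flushed */
	struct arch_tlbflush_unmap_batch arch;
};

static void migrc_flush_free_folios(struct migrc_req *req)
{
	struct folio *folio, *tmp;

	/* One batched flush covers every migration deferred so far. */
	arch_tlbbatch_flush(&req->arch);

	/*
	 * Only now are the stale read-only translations gone, so the
	 * source folios can safely go back to the page allocator.
	 */
	list_for_each_entry_safe(folio, tmp, &req->folios, lru) {
		list_del(&folio->lru);
		folio_put(folio);
	}
}

Since the requests are managed per node (see "Changes from RFC v2",
item 7), one such flush can retire all pending migrations on that node
at once, which is where the reduction in full flushes comes from.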