On Thu, Jul 6, 2023 at 12:21 AM Aneesh Kumar K.V
<aneesh.kumar@xxxxxxxxxxxxx> wrote:
>
> This patchset avoids building changes added by commit bd74fdaea146 ("mm:
> multi-gen LRU: support page table walks") on platforms that don't support
> hardware atomic updates of access bits.
>
> Aneesh Kumar K.V (5):
>   mm/mglru: Create a new helper iterate_mm_list_walk
>   mm/mglru: Move Bloom filter code around
>   mm/mglru: Move code around to make future patch easy
>   mm/mglru: move iterate_mm_list_walk Helper
>   mm/mglru: Don't build multi-gen LRU page table walk code on
>     architecture not supported
>
>  arch/Kconfig               |   3 +
>  arch/arm64/Kconfig         |   1 +
>  arch/x86/Kconfig           |   1 +
>  include/linux/memcontrol.h |   2 +-
>  include/linux/mm_types.h   |  10 +-
>  include/linux/mmzone.h     |  12 +-
>  kernel/fork.c              |   2 +-
>  mm/memcontrol.c            |   2 +-
>  mm/vmscan.c                | 955 +++++++++++++++++++------------------
>  9 files changed, 528 insertions(+), 460 deletions(-)

1. There is no need for a new Kconfig -- the condition is simply
   defined(CONFIG_LRU_GEN) && !defined(arch_has_hw_pte_young)
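For example (a sketch with a made-up guard name, not the attached
patch): arm64 and x86 #define arch_has_hw_pte_young in their
asm/pgtable.h to override the generic fallback, so the preprocessor
alone can tell the two cases apart wherever pgtable.h is visible:

#if defined(CONFIG_LRU_GEN) && !defined(arch_has_hw_pte_young)
/* no hardware-set accessed bit: stub out the page table walk code */
#define LRU_GEN_NO_MM_WALK      /* hypothetical guard macro */
#endif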
2. The best practice to disable static functions is not by macros but:

static int const_cond(void)
{
        return 1;
}

int main(void)
{
        int a = const_cond();

        if (a)
                return 0;

        /* the compiler doesn't generate code for static funcs below */
        static_func_1();
        ...
        static_func_N();
}

LTO also optimizes away external functions. But not everyone uses it.
So we still need macros for those, and of course for data structures.
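In this series' case the constant condition already exists: on
architectures that don't override it, arch_has_hw_pte_young() is a
generic static inline in include/linux/pgtable.h that returns false.
A sketch of how that feeds the pattern above (try_to_walk_mm() is a
made-up name and walk_mm() stands in for the real walk entry point;
this is not the attached patch):

static bool try_to_walk_mm(struct lruvec *lruvec, struct mm_struct *mm,
                           struct lru_gen_mm_walk *walk)
{
        /*
         * Constant false on architectures without hardware-set
         * accessed bits, so the compiler drops the call below and,
         * with it, every static function only walk_mm() references.
         */
        if (!arch_has_hw_pte_young())
                return false;

        walk_mm(lruvec, mm, walk);
        return true;
}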
3. In 4/5, you have:

@@ -461,6 +461,7 @@ enum {
 struct lru_gen_mm_state {
        /* set to max_seq after each iteration */
        unsigned long seq;
+#ifdef CONFIG_LRU_TASK_PAGE_AGING
        /* where the current iteration continues after */
        struct list_head *head;
        /* where the last iteration ended before */
@@ -469,6 +470,11 @@ struct lru_gen_mm_state {
        unsigned long *filters[NR_BLOOM_FILTERS];
        /* the mm stats for debugging */
        unsigned long stats[NR_HIST_GENS][NR_MM_STATS];
+#else
+       /* protect the seq update above */
+       /* May be we can use lruvec->lock? */
+       spinlock_t lock;
+#endif
 };

The answer is yes, and not only that, we don't need lru_gen_mm_state
at all.
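A simplified sketch of that shape (the real function takes more
arguments and does the generation rotation; struct and field names as
in the current mm/vmscan.c): the seq check just moves under the
existing lruvec->lru_lock, which is exactly what the disassembly
further down shows:

static bool try_to_inc_max_seq(struct lruvec *lruvec, unsigned long max_seq)
{
        bool success;
        struct lru_gen_folio *lrugen = &lruvec->lrugen;

        spin_lock_irq(&lruvec->lru_lock);

        /* another reclaimer may have already advanced max_seq */
        success = max_seq == lrugen->max_seq;
        if (success) {
                /* ...rotate the generations, update timestamps, etc... */
                WRITE_ONCE(lrugen->max_seq, max_seq + 1);
        }

        spin_unlock_irq(&lruvec->lru_lock);

        return success;
}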
I'm attaching a patch that fixes all above. If you want to post it,
please feel free -- fully test it please, since I didn't. Otherwise I
can ask TJ to help make this work for you.

$ git diff --stat
 include/linux/memcontrol.h |   2 +-
 include/linux/mm_types.h   |  12 +-
 include/linux/mmzone.h     |   2 +
 kernel/bounds.c            |   6 +-
 kernel/fork.c              |   2 +-
 mm/vmscan.c                | 169 +++++++++++++++++++--------
 6 files changed, 137 insertions(+), 56 deletions(-)

On x86:

$ ./scripts/bloat-o-meter mm/vmscan.o.old mm/vmscan.o
add/remove: 24/34 grow/shrink: 2/7 up/down: 966/-8716 (-7750)
Function                                     old     new   delta
...
should_skip_vma                              206       -    -206
get_pte_pfn                                  261       -    -261
lru_gen_add_mm                               323       -    -323
lru_gen_seq_show                            1710    1370    -340
lru_gen_del_mm                               432       -    -432
reset_batch_size                             572       -    -572
try_to_inc_max_seq                          2947    1635   -1312
walk_pmd_range_locked                       1508       -   -1508
walk_pud_range                              3238       -   -3238
Total: Before=99449, After=91699, chg -7.79%

$ objdump -S mm/vmscan.o | grep -A 20 "<try_to_inc_max_seq>:"
000000000000a350 <try_to_inc_max_seq>:
{
    a350:       e8 00 00 00 00          call   a355 <try_to_inc_max_seq+0x5>
    a355:       55                      push   %rbp
    a356:       48 89 e5                mov    %rsp,%rbp
    a359:       41 57                   push   %r15
    a35b:       41 56                   push   %r14
    a35d:       41 55                   push   %r13
    a35f:       41 54                   push   %r12
    a361:       53                      push   %rbx
    a362:       48 83 ec 70             sub    $0x70,%rsp
    a366:       41 89 d4                mov    %edx,%r12d
    a369:       49 89 f6                mov    %rsi,%r14
    a36c:       49 89 ff                mov    %rdi,%r15
        spin_lock_irq(&lruvec->lru_lock);
    a36f:       48 8d 5f 50             lea    0x50(%rdi),%rbx
    a373:       48 89 df                mov    %rbx,%rdi
    a376:       e8 00 00 00 00          call   a37b <try_to_inc_max_seq+0x2b>
        success = max_seq == lrugen->max_seq;
    a37b:       49 8b 87 88 00 00 00    mov    0x88(%r15),%rax
    a382:       4c 39 f0                cmp    %r14,%rax
Attachment: no_mm_walk.patch