Yu Zhao <yuzhao@xxxxxxxxxx> writes:

[snip]

> +/* Main function used by foreground, background and user-triggered aging. */
> +static bool walk_mm_list(struct lruvec *lruvec, unsigned long next_seq,
> +                         struct scan_control *sc, int swappiness)
> +{
> +        bool last;
> +        struct mm_struct *mm = NULL;
> +        int nid = lruvec_pgdat(lruvec)->node_id;
> +        struct mem_cgroup *memcg = lruvec_memcg(lruvec);
> +        struct lru_gen_mm_list *mm_list = get_mm_list(memcg);
> +
> +        VM_BUG_ON(next_seq > READ_ONCE(lruvec->evictable.max_seq));
> +
> +        /*
> +         * For each walk of the mm list of a memcg, we decrement the priority
> +         * of its lruvec. For each walk of memcgs in kswapd, we increment the
> +         * priorities of all lruvecs.
> +         *
> +         * So if this lruvec has a higher priority (smaller value), it means
> +         * other concurrent reclaimers (global or memcg reclaim) have walked
> +         * its mm list. Skip it for this priority to balance the pressure on
> +         * all memcgs.
> +         */
> +#ifdef CONFIG_MEMCG
> +        if (!mem_cgroup_disabled() && !cgroup_reclaim(sc) &&
> +            sc->priority > atomic_read(&lruvec->evictable.priority))
> +                return false;
> +#endif
> +
> +        do {
> +                last = get_next_mm(lruvec, next_seq, swappiness, &mm);
> +                if (mm)
> +                        walk_mm(lruvec, mm, swappiness);
> +
> +                cond_resched();
> +        } while (mm);

It appears that this loop needs to scan the whole address space of
multiple processes?

If so, I have some concerns about the duration of the function.  Do you
have some numbers on the distribution of the function's duration?  And
maybe the number of mm_structs walked and the number of pages scanned.

In comparison, in the traditional LRU algorithm, each round scans only a
small subset of the whole physical memory (see the rough sketch in the
P.S. below).

Best Regards,
Huang, Ying

> +
> +        if (!last) {
> +                /* foreground aging prefers not to wait unless "necessary" */
> +                if (!current_is_kswapd() && sc->priority < DEF_PRIORITY - 2)
> +                        wait_event_killable(mm_list->nodes[nid].wait,
> +                                            next_seq < READ_ONCE(lruvec->evictable.max_seq));
> +
> +                return next_seq < READ_ONCE(lruvec->evictable.max_seq);
> +        }
> +
> +        VM_BUG_ON(next_seq != READ_ONCE(lruvec->evictable.max_seq));
> +
> +        inc_max_seq(lruvec);
> +
> +#ifdef CONFIG_MEMCG
> +        if (!mem_cgroup_disabled())
> +                atomic_add_unless(&lruvec->evictable.priority, -1, 0);
> +#endif
> +
> +        /* order against inc_max_seq() */
> +        smp_mb();
> +        /* either we see any waiters or they will see updated max_seq */
> +        if (waitqueue_active(&mm_list->nodes[nid].wait))
> +                wake_up_all(&mm_list->nodes[nid].wait);
> +
> +        wakeup_flusher_threads(WB_REASON_VMSCAN);
> +
> +        return true;
> +}
> +

[snip]

Best Regards,
Huang, Ying
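
P.S. To make the comparison with the traditional LRU concrete: the
existing reclaim path (get_scan_count() in mm/vmscan.c) roughly derives
its per-round scan target by shifting the LRU size right by
sc->priority, so at DEF_PRIORITY only a tiny fraction of each list is
looked at per round.  The snippet below is only an illustrative,
stand-alone sketch of that shift; scan_budget() is a hypothetical
helper with simplified names, not the upstream code.

#include <stdio.h>

#define DEF_PRIORITY 12

/* Hypothetical helper: pages to scan this round from an LRU of lru_size pages. */
static unsigned long scan_budget(unsigned long lru_size, int priority)
{
        /* The budget doubles each time the priority drops by one. */
        return lru_size >> priority;
}

int main(void)
{
        unsigned long lru_size = 1UL << 20;     /* 1M pages, ~4GB with 4KB pages */
        int priority;

        for (priority = DEF_PRIORITY; priority >= 0; priority--)
                printf("priority %2d -> scan %lu pages\n",
                       priority, scan_budget(lru_size, priority));
        return 0;
}

At priority 12 this sketch scans 256 of the 1M pages; only when reclaim
keeps failing and the priority drops toward 0 does it approach the whole
list.  That is the contrast with walking entire address spaces on every
aging round.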