On Sun, Feb 13, 2022 at 06:04:17PM +0800, Hillf Danton wrote:

Hi Hillf,

> On Tue, 8 Feb 2022 01:18:55 -0700 Yu Zhao wrote:
> > +
> > +/******************************************************************************
> > + *                          the aging
> > + ******************************************************************************/
> > +
> > +static int folio_inc_gen(struct lruvec *lruvec, struct folio *folio, bool reclaiming)
> > +{
> > +	unsigned long old_flags, new_flags;
> > +	int type = folio_is_file_lru(folio);
> > +	struct lru_gen_struct *lrugen = &lruvec->lrugen;
> > +	int new_gen, old_gen = lru_gen_from_seq(lrugen->min_seq[type]);
> > +
> > +	do {
> > +		new_flags = old_flags = READ_ONCE(folio->flags);
> > +		VM_BUG_ON_FOLIO(!(new_flags & LRU_GEN_MASK), folio);
> > +
> > +		new_gen = ((new_flags & LRU_GEN_MASK) >> LRU_GEN_PGOFF) - 1;
>
> Is the chance of a deadloop zero if new_gen != old_gen?

No, because the counter is only cleared during isolation, and here it's
protected against isolation (under the LRU lock, which is asserted in the
lru_gen_balance_size() -> lru_gen_update_size() path, as sketched further
below).

> > +		new_gen = (old_gen + 1) % MAX_NR_GENS;
> > +
> > +		new_flags &= ~LRU_GEN_MASK;
> > +		new_flags |= (new_gen + 1UL) << LRU_GEN_PGOFF;
> > +		new_flags &= ~(LRU_REFS_MASK | LRU_REFS_FLAGS);
> > +		/* for folio_end_writeback() */
>
> 	/* for folio_end_writeback() and sort_folio() */ in terms of
> 	reclaiming?

Right.

> > +		if (reclaiming)
> > +			new_flags |= BIT(PG_reclaim);
> > +	} while (cmpxchg(&folio->flags, old_flags, new_flags) != old_flags);
> > +
> > +	lru_gen_balance_size(lruvec, folio, old_gen, new_gen);
> > +
> > +	return new_gen;
> > +}

...

> > +/******************************************************************************
> > + *                          the eviction
> > + ******************************************************************************/
> > +
> > +static bool sort_folio(struct lruvec *lruvec, struct folio *folio, int tier_idx)
> > +{
>
> Nit: the 80-column format is preferred.

Will do.
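For context, a minimal sketch of the assertion mentioned above. This is not
the patch itself: the helper names follow the quoted code, the body is
elided, and the placement of the lockdep check is an assumption.

/*
 * Minimal sketch, not the actual patch: the only point is that the
 * size-update path runs under the LRU lock, so folio_inc_gen() cannot
 * race with an isolation clearing the generation counter.
 */
static void lru_gen_update_size(struct lruvec *lruvec, struct folio *folio,
				int old_gen, int new_gen)
{
	/* reached via lru_gen_balance_size() with lruvec->lru_lock held */
	lockdep_assert_held(&lruvec->lru_lock);

	/* ... move folio_nr_pages(folio) pages from old_gen to new_gen ... */
}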
> > +	bool success;
> > +	int gen = folio_lru_gen(folio);
> > +	int type = folio_is_file_lru(folio);
> > +	int zone = folio_zonenum(folio);
> > +	int tier = folio_lru_tier(folio);
> > +	int delta = folio_nr_pages(folio);
> > +	struct lru_gen_struct *lrugen = &lruvec->lrugen;
> > +
> > +	VM_BUG_ON_FOLIO(gen >= MAX_NR_GENS, folio);
> > +
> > +	if (!folio_evictable(folio)) {
> > +		success = lru_gen_del_folio(lruvec, folio, true);
> > +		VM_BUG_ON_FOLIO(!success, folio);
> > +		folio_set_unevictable(folio);
> > +		lruvec_add_folio(lruvec, folio);
> > +		__count_vm_events(UNEVICTABLE_PGCULLED, delta);
> > +		return true;
> > +	}
> > +
> > +	if (type && folio_test_anon(folio) && folio_test_dirty(folio)) {
> > +		success = lru_gen_del_folio(lruvec, folio, true);
> > +		VM_BUG_ON_FOLIO(!success, folio);
> > +		folio_set_swapbacked(folio);
> > +		lruvec_add_folio_tail(lruvec, folio);
> > +		return true;
> > +	}
> > +
> > +	if (tier > tier_idx) {
> > +		int hist = lru_hist_from_seq(lrugen->min_seq[type]);
> > +
> > +		gen = folio_inc_gen(lruvec, folio, false);
> > +		list_move_tail(&folio->lru, &lrugen->lists[gen][type][zone]);
> > +
> > +		WRITE_ONCE(lrugen->promoted[hist][type][tier - 1],
> > +			   lrugen->promoted[hist][type][tier - 1] + delta);
> > +		__mod_lruvec_state(lruvec, WORKINGSET_ACTIVATE_BASE + type, delta);
> > +		return true;
> > +	}
> > +
> > +	if (folio_test_locked(folio) || folio_test_writeback(folio) ||
> > +	    (type && folio_test_dirty(folio))) {
> > +		gen = folio_inc_gen(lruvec, folio, true);
> > +		list_move(&folio->lru, &lrugen->lists[gen][type][zone]);
> > +		return true;
>
> Make the cold dirty page cache younger instead of writing it out in the
> background reclaimer context, and the question arising is whether laundry
> is deferred until the flusher threads are woken up in the following
> patches.

This is a good point. In contrast to the active/inactive LRU, MGLRU
doesn't write out dirty file pages from kswapd or direct reclaimers --
that is writeback's job, and it should be better at doing it. In fact,
commit 21b4ee7029 ("xfs: drop ->writepage completely") has disabled dirty
file page writeouts in the reclaim path completely.

Reclaim indirectly wakes up writeback after clean file pages drop below a
threshold (the dirty ratio). However, dirty pages might be undercounted on
a system that uses a large number of mmapped file pages. MGLRU optimizes
this by calling folio_mark_dirty() on pages mapped by dirty PTEs when
scanning page tables (why not, since it's already looking at the accessed
bit); see the sketch at the end of this reply.

The commit above explains this design choice from the performance aspect.
From the implementation aspect, it also creates a boundary between reclaim
and writeback. This simplifies things, e.g., the PageWriteback() check in
shrink_page_list() is no longer relevant for MGLRU, and neither is the top
half of the PageDirty() check.
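For illustration, a minimal sketch of the dirty-bit transfer described
above. The helper name is hypothetical; in the patchset the logic sits
inside the page table walk that harvests the accessed bit.

/*
 * Hypothetical helper, for illustration only: transfer the dirty bit from
 * a PTE seen during the accessed-bit walk, so that pages dirtied via
 * mmap() are visible to writeback's dirty accounting before unmap.
 */
static void sketch_transfer_pte_dirty(pte_t pte, struct folio *folio)
{
	/*
	 * Skip anon pages not in the swap cache: they have no mapping to
	 * account the dirtying against.
	 */
	if (pte_dirty(pte) && !folio_test_dirty(folio) &&
	    !(folio_test_anon(folio) && folio_test_swapbacked(folio) &&
	      !folio_test_swapcache(folio)))
		folio_mark_dirty(folio);
}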