On Fri, 30 Aug 2024, Usama Arif wrote:

> From: Yu Zhao <yuzhao@xxxxxxxxxx>
>
> If a tail page has only two references left, one inherited from the
> isolation of its head and the other from lru_add_page_tail() which we
> are about to drop, it means this tail page was concurrently zapped.
> Then we can safely free it and save page reclaim or migration the
> trouble of trying it.
>
> Signed-off-by: Yu Zhao <yuzhao@xxxxxxxxxx>
> Tested-by: Shuang Zhai <zhais@xxxxxxxxxx>
> Acked-by: Johannes Weiner <hannes@xxxxxxxxxxx>
> Signed-off-by: Usama Arif <usamaarif642@xxxxxxxxx>

I'm sorry, but I think this patch (just this 1/6) needs to be dropped:
it is only an optimization, and unless a persuasive performance case
can be made to extend it, it ought to go (perhaps revisited later).

The problem I kept hitting was that all my work, requiring compaction
and reclaim, got (killably) stuck in or repeatedly calling
reclaim_throttle(): because nr_isolated_anon had grown high - and
remained high even when the load had all been killed.

Bisection led to the 2/6 (remap to shared zeropage), but I'd say this
1/6 is the one to blame. I was intending to send this patch to "fix" it:

--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3295,6 +3295,8 @@ static void __split_huge_page(struct pag
 			folio_clear_active(new_folio);
 			folio_clear_unevictable(new_folio);
 			list_del(&new_folio->lru);
+			node_stat_sub_folio(folio, NR_ISOLATED_ANON +
+						folio_is_file_lru(folio));
 			if (!folio_batch_add(&free_folios, new_folio)) {
 				mem_cgroup_uncharge_folios(&free_folios);
 				free_unref_folios(&free_folios);

And that ran nicely, until I terminated the run and did

    grep nr_isolated /proc/sys/vm/stat_refresh /proc/vmstat

at the end: stat_refresh kindly left a pr_warn in dmesg to say

    nr_isolated_anon -334013737

My patch is not good enough. IIUC, some split_huge_pagers (reclaim?)
know how many pages they isolated and decremented the stats by, and
increment by that same number at the end; whereas other split_huge_pagers
(migration?) decrement one by one as they go through the list afterwards.

I've run out of time (I'm about to take a break): I gave up researching
who needs what, and was already feeling this optimization does too much
second guessing of what's needed (and its array of VM_WARN_ON_ONCE_FOLIOs
rather admits to that).

And I don't think it's as simple as moving the node_stat_sub_folio()
into 2/6 where the zero pte is substituted: that would probably handle
the vast majority of cases, but aren't there others which pass the
folio_ref_freeze(new_folio, 2) test - the title's zapped tail pages,
or racily truncated now that the folio has been unlocked, for example?

Hugh
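
The accounting mismatch described above can be illustrated with a toy
userspace model. This is only a sketch under stated assumptions, not
kernel code: nr_isolated stands in for a counter like nr_isolated_anon,
and toy_split(), toy_batch_caller() and toy_per_folio_caller() are
hypothetical names invented here. It assumes one kind of caller settles
the counter in a single batch using the count it took at isolation, while
another settles it one folio at a time for whatever is still on its list
after the split.

```c
/* Toy stand-in for a per-node isolation counter (assumption, not kernel code). */
static long nr_isolated;

/*
 * Hypothetical split step: `freed` tail folios are freed early.
 * If sub_in_split is nonzero, the counter is adjusted here for each of
 * them, which is what the attempted "fix" does in spirit.
 */
static void toy_split(int freed, int sub_in_split)
{
	if (sub_in_split)
		nr_isolated -= freed;
}

/* Batch-style caller: adds nr at isolation, removes the same nr at the end. */
static long toy_batch_caller(int nr, int freed, int sub_in_split)
{
	nr_isolated = 0;
	nr_isolated += nr;
	toy_split(freed, sub_in_split);
	nr_isolated -= nr;	/* unaware that some folios were already freed */
	return nr_isolated;
}

/* Per-folio caller: removes one for each folio still on its list afterwards. */
static long toy_per_folio_caller(int nr, int freed, int sub_in_split)
{
	int i;

	nr_isolated = 0;
	nr_isolated += nr;
	toy_split(freed, sub_in_split);
	for (i = 0; i < nr - freed; i++)
		nr_isolated -= 1;	/* freed tails are never visited here */
	return nr_isolated;
}
```

In this model, with 8 pages isolated and 3 tails freed during the split,
the per-folio caller leaves the counter stuck at 3 unless the split
subtracts, while the batch caller ends at -3 once the split does subtract.
No single placement of the subtraction inside the split satisfies both
styles, which is consistent with the counter first staying high and then
going negative.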