[ Editing down to just the bare-bones problem cases ] On Sun, Nov 6, 2022 at 1:06 PM Hugh Dickins <hughd@xxxxxxxxxx> wrote: > > anon_vma (bad) > -------------- > > See folio_lock_anon_vma_read(): folio_mapped() plays a key role in > establishing the continued validity of an anon_vma. See comments > above folio_get_anon_vma(), some by me but most by PeterZ IIRC. > > I believe what has happened is that your patchset has, very intentionally, > kept the page as "folio_mapped" until after free_pgtables() does its > unlink_anon_vmas(); but that is telling folio_lock_anon_vma_read() > that the anon_vma is safe to use when actually it has been freed. > (It looked like a page table when I peeped at it.) > > I'm not certain, but I think that you made page_zap_pte_rmap() handle > anon as well as file, just for the righteous additional simplification; > but I'm afraid that (without opening a huge anon_vma refcounting can of > worms) that unification has to be reverted, and anon left to go the > same old way it did before. Indeed. I made them separate initially, just because the only case that mattered for the dirty bit was the file-mapped case. But then the two functions ended up being basically the identical function, so I unified them again. But the anonvma lifetime issue looks very real, and so doing the "delay rmap only for file mappings" seems sane. In fact, I wonder if we should delay it only for *dirty* file mappings, since it doesn't matter for the clean case. Hmm. I already threw away my branch (since Andrew picked the patches up), so a question for Andrew: do you want me to re-do the branch entirely, or do you want me to just send you an incremental patch? To make for minimal changes, I'd drop the 're-unification' patch, and then small updates to the zap_pte_range() code to keep the anon (and possibly non-dirty) case synchronous. And btw, this one is interesting: for anonymous (and non-dirty file-mapped) patches, we actually can end up delaying the final page free (and the rmap zapping) all the way to "tlb_finish_mmu()". Normally we still have the vma's all available, but yes, free_pgtables() can and does happen before the final TLB flush. The file-mapped dirty case doesn't have that issue - not just because it doesn't have an anonvma at all, but because it also does that "force_flush" thing that just measn that the page freeign never gets delayed that far in the first place. > mm-unstable (bad) > ----------------- > Aside from that PageAnon issue, mm-unstable is in an understandably bad > state because you could not have foreseen my subpages_mapcount addition > to page_remove_rmap(). page_zap_pte_rmap() now needs to handle the > PageCompound (but not the "compound") case too. I rushed you and akpm > an emergency patch for that on Friday night, but you, let's say, had > reservations about it. So I haven't posted it, and while the PageAnon > issue remains, I think your patchset has to be removed from mm-unstable > and linux-next anyway. So I think I'm fine with your patch, I just want to move the memcg accounting to outside of it. I can re-do my series on top of mm-unstable, I guess. That's probably the easiest way to handle this all. Andrew - can you remove those patches again, and I'll create a new series for you? Linus