On Sun, Oct 30, 2022 at 11:51 AM Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote: > > We could keep the current placement of the TLB flush, to just before > we drop the page table lock. > > And we could do all the things we do in 'page_remove_rmap()' right now > *except* for the mapcount stuff. > > And only move the mapcount code to the page freeing stage. So I actually have a three-commit series to do the rmap simplification, but let me just post the end result of that series, because the end result is actually smaller than the individual commits (I did it as three incremental commits just to make it more obvious to me how to get to that end result). The three commits end up being mm: introduce simplified versions of 'page_remove_rmap()' mm: inline simpler case of page_remove_file_rmap() mm: re-unify the simplified page_zap_*_rmap() function and the end result of them is this attached patch. I'm *claiming* that the attached patch is semantically identical to what we do before it, just _hugely_ simplified. Basically, that new 'page_zap_pte_rmap()' does the same things that 'page_remove_rmap()' did, except it is limited to only last-level PTE entries (and that munlock_vma_page() has to be called separately). The simplification comes from 'compound' being false, from it always being about small pages, and from the atomic mapcount decrement having been moved outside the memcg lock, since it is independent of it. Anyway, this simplification patch basically means that the *next* step could be to just move that ipage_zap_pte_rmap()' after the TLB flush, and now it's trivial and no longer scary. I did *not* do that yet, because it still needs that "encoded_page[]" array - except now it doesn't encode the 'dirty' bit, now it would encode the 'do a page->_mapcount decrement' bit. I didn't do that part, because needed to do the rc3 release, plus I'd like to have somebody look at this introductory patch first. Linus
include/linux/rmap.h | 1 + mm/memory.c | 3 ++- mm/rmap.c | 24 ++++++++++++++++++++++++ 3 files changed, 27 insertions(+), 1 deletion(-) diff --git a/include/linux/rmap.h b/include/linux/rmap.h index bd3504d11b15..f62af001707c 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -196,6 +196,7 @@ void page_add_new_anon_rmap(struct page *, struct vm_area_struct *, unsigned long address); void page_add_file_rmap(struct page *, struct vm_area_struct *, bool compound); +void page_zap_pte_rmap(struct page *); void page_remove_rmap(struct page *, struct vm_area_struct *, bool compound); diff --git a/mm/memory.c b/mm/memory.c index f88c351aecd4..c893f5ffc5a8 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1452,8 +1452,9 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb, likely(!(vma->vm_flags & VM_SEQ_READ))) mark_page_accessed(page); } + page_zap_pte_rmap(page); + munlock_vma_page(page, vma, false); rss[mm_counter(page)]--; - page_remove_rmap(page, vma, false); if (unlikely(page_mapcount(page) < 0)) print_bad_pte(vma, addr, ptent, page); if (unlikely(__tlb_remove_page(tlb, page))) { diff --git a/mm/rmap.c b/mm/rmap.c index 2ec925e5fa6a..28b51a31ebb0 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1412,6 +1412,30 @@ static void page_remove_anon_compound_rmap(struct page *page) __mod_lruvec_page_state(page, NR_ANON_MAPPED, -nr); } +/** + * page_zap_pte_rmap - take down a pte mapping from a page + * @page: page to remove mapping from + * + * This is the simplified form of page_remove_rmap(), that only + * deals with last-level pages, so 'compound' is always false, + * and the caller does 'munlock_vma_page(page, vma, compound)' + * separately. + * + * This allows for a much simpler calling convention and code. + * + * The caller holds the pte lock. + */ +void page_zap_pte_rmap(struct page *page) +{ + if (!atomic_add_negative(-1, &page->_mapcount)) + return; + + lock_page_memcg(page); + __dec_lruvec_page_state(page, + PageAnon(page) ? NR_ANON_MAPPED : NR_FILE_MAPPED); + unlock_page_memcg(page); +} + /** * page_remove_rmap - take down pte mapping from a page * @page: page to remove mapping from