On Mon, Nov 28, 2022 at 08:56:54PM +0100, Jann Horn wrote:
> On Mon, Nov 28, 2022 at 8:54 PM Yang Shi <shy828301@xxxxxxxxx> wrote:
> >
> > On Mon, Nov 28, 2022 at 10:03 AM Jann Horn <jannh@xxxxxxxxxx> wrote:
> > >
> > > Since commit 70cbc3cc78a99 ("mm: gup: fix the fast GUP race against THP
> > > collapse"), the lockless_pages_from_mm() fastpath rechecks the pmd_t to
> > > ensure that the page table was not removed by khugepaged in between.
> > >
> > > However, lockless_pages_from_mm() still requires that the page table is not
> > > concurrently freed or reused to store non-PTE data. Otherwise, problems
> > > can occur because:
> > >
> > >  - deposited page tables can be freed when a THP page somewhere in the
> > >    mm is removed
> > >  - some architectures store non-PTE information inside deposited page
> > >    tables (see radix__pgtable_trans_huge_deposit())
> > >
> > > Additionally, lockless_pages_from_mm() is also somewhat brittle with
> > > regards to page tables being repeatedly moved back and forth, but
> > > that shouldn't be an issue in practice.
> > >
> > > Fix it by sending IPIs (if the architecture uses
> > > semi-RCU-style page table freeing) before freeing/reusing page tables.
> > >
> > > As noted in mm/gup.c, on configs that define CONFIG_HAVE_FAST_GUP,
> > > there are two possible cases:
> > >
> > >  1. CONFIG_MMU_GATHER_RCU_TABLE_FREE is set, causing
> > >     tlb_remove_table_sync_one() to send an IPI to synchronize with
> > >     lockless_pages_from_mm().
> > >  2. CONFIG_MMU_GATHER_RCU_TABLE_FREE is unset, indicating that all
> > >     TLB flushes are already guaranteed to send IPIs.
> > >     tlb_remove_table_sync_one() will do nothing, but we've already
> > >     run pmdp_collapse_flush(), which did a TLB flush, which must have
> > >     involved IPIs.
> >
> > I'm trying to catch up with the discussion after the holiday break.
> > I understand you switched from always allocating a new page table page
> > (we decided before) to sending IPIs to serialize against fast-GUP,
> > this is fine to me.
> >
> > So the code now looks like:
> >         pmdp_collapse_flush()
> >         sending IPI
> >
> > But the missing part is how we reached "TLB flushes are already
> > guaranteed to send IPIs" when CONFIG_MMU_GATHER_RCU_TABLE_FREE is
> > unset? ARM64 doesn't do it IIRC. Or did I miss something?
>
> From arch/arm64/Kconfig:
>
>         select MMU_GATHER_RCU_TABLE_FREE
>
> CONFIG_MMU_GATHER_RCU_TABLE_FREE is not a config option that the user
> can freely toggle; it is an option selected by the architecture.

True.

I think I understand what Yang is confused about, and I had the same
question (I asked in the old threads but haven't yet gotten a
confirmation), since I think it is also true that arm64 doesn't use IPIs
for TLB flushes (according to the arm64 version of __flush_tlb_range()).
So PPC doesn't seem to be the only arch whose TLB flushes don't use IPIs.

I mentioned PPC only because I saw this comment in mmu_gather.c:

 * Architectures that do not have this (PPC) need to delay the freeing by some
 * other means, this is that means.

So I think that comment is obsolete.

In short, IIUC there's just an implicit dependency: any
!MMU_GATHER_RCU_TABLE_FREE arch must use IPIs for TLB flushes (not vice
versa, hence arm64 can have RCU_TABLE_FREE), or something could be broken.

-- 
Peter Xu
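The two cases from the quoted patch, and the implicit dependency summarized at the end, can be written down as a tiny decision table. This is a hypothetical userspace model, not kernel code: `struct arch_model`, `enum ipi_source` and `collapse_ipi_source()` are illustrative names, and each architecture is reduced to two booleans. It only encodes what the thread states: with MMU_GATHER_RCU_TABLE_FREE selected, tlb_remove_table_sync_one() sends the IPI itself; without it, the collapse path relies on the TLB flush in pmdp_collapse_flush() having sent IPIs.

```c
#include <assert.h>  /* assert() for the usage checks */
#include <stdbool.h>

/* Hypothetical model of one architecture's properties; not kernel code. */
struct arch_model {
	bool rcu_table_free;     /* models CONFIG_MMU_GATHER_RCU_TABLE_FREE being selected */
	bool tlb_flush_uses_ipi; /* models TLB flushes interrupting other CPUs */
};

/* Which step of the collapse path (if any) interrupts a concurrent fast-GUP walker. */
enum ipi_source {
	IPI_FROM_SYNC_ONE,  /* tlb_remove_table_sync_one() sends the IPI */
	IPI_FROM_TLB_FLUSH, /* the flush in pmdp_collapse_flush() already did */
	IPI_NONE,           /* nothing synchronizes with the walker: the broken case */
};

static enum ipi_source collapse_ipi_source(const struct arch_model *a)
{
	/* Case 1: RCU table freeing is selected, so sync_one() sends an IPI itself. */
	if (a->rcu_table_free)
		return IPI_FROM_SYNC_ONE;
	/* Case 2: sync_one() is a no-op; only the earlier TLB flush could have sent IPIs. */
	if (a->tlb_flush_uses_ipi)
		return IPI_FROM_TLB_FLUSH;
	/* !RCU_TABLE_FREE without IPI-based flushes violates the implicit dependency. */
	return IPI_NONE;
}
```

In this model an arm64-like configuration (RCU table freeing selected, TLB flushes without IPIs) maps to IPI_FROM_SYNC_ONE, which is why arm64 is fine. Only the combination warned about above, neither property, yields IPI_NONE.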