On Thu, Sep 1, 2022 at 11:39 PM David Hildenbrand <david@xxxxxxxxxx> wrote: > > On 02.09.22 01:50, Yang Shi wrote: > > On Thu, Sep 1, 2022 at 4:26 PM Peter Xu <peterx@xxxxxxxxxx> wrote: > >> > >> Hi, Yang, > >> > >> On Thu, Sep 01, 2022 at 03:27:07PM -0700, Yang Shi wrote: > >>> Since general RCU GUP fast was introduced in commit 2667f50e8b81 ("mm: > >>> introduce a general RCU get_user_pages_fast()"), a TLB flush is no longer > >>> sufficient to handle concurrent GUP-fast in all cases, it only handles > >>> traditional IPI-based GUP-fast correctly. > >> > >> If TLB flush (or, IPI broadcasts) used to work to protect against gup-fast, > >> I'm kind of confused why it's not sufficient even if with RCU gup? Isn't > >> that'll keep working as long as interrupt disabled (which current fast-gup > >> will still do)? > > > > Actually the wording was copied from David's commit log for his > > PageAnonExclusive fix. My understanding is the IPI broadcast still > > works, but it may not be supported by all architectures and not > > preferred anymore. So we should avoid depending on IPI broadcast IIUC. > > Right. Not all architectures perform an IPI broadcast on TLB flush. > > IPI broadcasts will continue working until we use RCU instead of > disabling local interrupts in GUP-fast. > > > >>> CPU A CPU B > >>> THP collapse fast GUP > >>> gup_pmd_range() <-- see valid pmd > >>> gup_pte_range() <-- work on pte > >>> pmdp_collapse_flush() <-- clear pmd and flush > >>> __collapse_huge_page_isolate() > >>> check page pinned <-- before GUP bump refcount > >>> pin the page > >>> check PTE <-- no change > >>> __collapse_huge_page_copy() > >>> copy data to huge page > >>> ptep_clear() > >>> install huge pmd for the huge page > >>> return the stale page > >>> discard the stale page > >>> > >>> The race could be fixed by checking whether PMD is changed or not after > >>> taking the page pin in fast GUP, just like what it does for PTE. If the > >>> PMD is changed it means there may be parallel THP collapse, so GUP > >>> should back off. > >> > >> Could the race also be fixed by impl pmdp_collapse_flush() correctly for > >> the archs that are missing? Do you know which arch(s) is broken with it? > > > > Yes, and this was suggested by me in the first place, but per the > > suggestion from John and David, this is not the preferred way. I think > > it is because: > > > > Firstly, using IPI to serialize against fast GUP is not recommended > > anymore since fast GUP does check PTE then back off so we should avoid > > it. > > Secondly, if checking PMD then backing off could solve the problem, > > why do we still need broadcast IPI? It doesn't sound performant. > > I'd say, using an IPI is the old-styled way of doing things. Sure, using > an IPI broadcast will work (and IMHO it's a lot easier to > not-get-wrong). But it somewhat contradicts to the new way of doing things. > > >> > >> It's just not clear to me whether this patch is an optimization or a fix, > >> if it's a fix whether the IPI broadcast in ppc pmdp_collapse_flush() would > >> still be needed. > > > > It is a fix and the fix will make IPI broadcast not useful anymore. > > I'd wonder how "easy" adding the IPI broadcast would be -- IOW, if the > IPI fix has a real advantage. Not sure either, but I guess calling a dummy function via IPI broadcast should just work. Powepc does so. > > > -- > Thanks, > > David / dhildenb >