On Fri 27-09-19 12:31:41, John Hubbard wrote:
> On 9/27/19 5:33 AM, Michal Hocko wrote:
> > On Thu 26-09-19 20:26:46, John Hubbard wrote:
> > > On 9/26/19 3:20 AM, Kirill A. Shutemov wrote:
> > > > BTW, have you looked at other levels of page table hierarchy. Do we have
> > > > the same issue for PMD/PUD/... pages?
> > > >
> > >
> > > Along the lines of "what other memory barriers might be missing for
> > > get_user_pages_fast()", I'm also concerned that the synchronization between
> > > get_user_pages_fast() and freeing the page tables might be technically broken,
> > > due to missing memory barriers on the get_user_pages_fast() side. Details:
> > >
> > > gup_fast() disables interrupts, but I think it also needs some sort of
> > > memory barrier(s), in order to prevent reads of the page table (gup_pgd_range,
> > > etc) from speculatively happening before the interrupts are disabled.
> >
> > Could you be more specific about the race scenario please? I thought
> > that the unmap path will be serialized by the pte lock.
> >
>
> I don't see a pte lock anywhere here.
>
> This case is really pretty far out there, but without run-time memory barriers
> I don't think we can completely rule it out:
>
> CPU 0                                  CPU 1
> --------                               ---------------
> get_user_pages_fast()
>
>                                        do_unmap()
>                                          unmap_region()
>                                            free_pgtables()
> /*
>  * speculative reads, re-ordered
>  * by the CPU at run time, not
>  * compile time. The function calls
>  * are not really in this order, but
>  * the corresponding reads could be.
>  */
> gup_pgd_range()
>   gup_p4d_range()
>     gup_pud_range()
>       gup_pmd_range()
>         pmd = READ_ONCE(*pmdp)
>         /* This is stale */
>
>                                        tlb_finish_mmu()
>                                          tlb_flush_mmu()
>                                            tlb_flush_mmu_tlbonly()
>                                              tlb_flush()
>                                                flush_tlb_mm
>                                                  flush_tlb_mm_range
>                                                    flush_tlb_others
>                                                      native_flush_tlb_others
>                                                        smp_call_function_many: IPIs
>                                                        ...blocks until CPU1 reenables
>                                                        interrupts
>
> local_irq_disable()
> ...continue the page walk based
>    on stale/freed data...
Yes, but then we have:

	head = try_get_compound_head(page, 1);

which has an atomic operation providing a barrier in it, and then we have:

	if (unlikely(pte_val(pte) != pte_val(*ptep))) {
		put_page(head);
		goto pte_unmap;
	}

So we reload the PTE again and check that its value didn't change, which
should prevent the race you explain above. Or do I miss anything?

								Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR