Re: [PATCH v3 3/4] mm: don't expose non-hugetlb page to fast gup prematurely

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon 30-09-19 10:57:08, John Hubbard wrote:
> On 9/30/19 2:20 AM, Jan Kara wrote:
> > On Fri 27-09-19 12:31:41, John Hubbard wrote:
> > > On 9/27/19 5:33 AM, Michal Hocko wrote:
> > > > On Thu 26-09-19 20:26:46, John Hubbard wrote:
> > > > > On 9/26/19 3:20 AM, Kirill A. Shutemov wrote:
> ...
> > > CPU 0                              CPU 1
> > > --------                           ---------------
> > >                                     get_user_pages_fast()
> > > 
> > > do_unmap()
> > >   unmap_region()
> > >    free_pgtables()
> > >                                     /*
> > >                                      * speculative reads, re-ordered
> > >                                      * by the CPU at run time, not
> > >                                      * compile time. The function calls
> > >                                      * are not really in this order, but
> > >                                      * the corresponding reads could be.
> > >                                      */
> > >                                     gup_pgd_range()
> > >                                      gup_p4d_range()
> > >                                       gup_pud_range()
> > >                                        gup_pmd_range()
> > >                                         pmd = READ_ONCE(*pmdp)
> > >                                          /* This is stale */
> > > 
> > >    tlb_finish_mmu()
> > >      tlb_flush_mmu()
> > >       tlb_flush_mmu_tlbonly()
> > >         tlb_flush()
> > >          flush_tlb_mm
> > >           flush_tlb_mm_range
> > >            flush_tlb_others
> > >             native_flush_tlb_others
> > >              smp_call_function_many: IPIs
> > >               ...blocks until CPU1 reenables
> > >                  interrupts
> > > 
> > >                                          local_irq_disable()
> > >                                           ...continue the page walk based
> > >                                              on stale/freed data...
> > 
> > Yes, but then we have:
> > 
> >                  head = try_get_compound_head(page, 1);
> > 
> > which has an atomic operation providing barrier in it and then we have:
> > 
> >                  if (unlikely(pte_val(pte) != pte_val(*ptep))) {
> >                          put_page(head);
> >                          goto pte_unmap;
> >                  }
> > 
> > So we reload PTE again and check the value didn't change. Which should
> > prevent the race you explain above. Or do I miss anything?
> 
> 
> Well, a couple of questions:
> 
> 1. Is there *really* a memory barrier in try_get_compound_head()? Because
> I only see a READ_ONCE compile barrier, which would mean that run time
> reordering is still possible.

try_get_compound_head() has page_cache_add_speculative() which is
atomic_add_unless() which is guaranteed to provide ordering.

> 2. Your code above is after the "pmd = READ_ONCE(*pmdp)" line, so by then,
> it's already found the pte based on reading a stale pmd. So checking the
> pte seems like it's checking the wrong thing--it's too late, for this case,
> right?

Well, if PMD is getting freed, all PTEs in it should be cleared by that
time, shouldn't they? So although we read from stale PMD, either we already
see cleared PTE or the check pte_val(pte) != pte_val(*ptep) will fail and
so we never actually succeed in getting stale PTE entry (at least unless
the page table page that used to be PMD can get freed and reused
- which is not the case in the example you've shown above).

So I still don't see a problem. That being said I don't feel being expert
in this particular area. I just always thought GUP prevents races like this
by the scheme I describe so I'd like to see what I'm missing :).

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux