On 9/30/19 2:20 AM, Jan Kara wrote:
On Fri 27-09-19 12:31:41, John Hubbard wrote:
On 9/27/19 5:33 AM, Michal Hocko wrote:
On Thu 26-09-19 20:26:46, John Hubbard wrote:
On 9/26/19 3:20 AM, Kirill A. Shutemov wrote:
...
CPU 0                                    CPU 1
--------                                 ---------------
                                         get_user_pages_fast()

do_unmap()
  unmap_region()
    free_pgtables()

                                         /*
                                          * speculative reads, re-ordered
                                          * by the CPU at run time, not
                                          * compile time. The function calls
                                          * are not really in this order, but
                                          * the corresponding reads could be.
                                          */
                                         gup_pgd_range()
                                           gup_p4d_range()
                                             gup_pud_range()
                                               gup_pmd_range()
                                                 pmd = READ_ONCE(*pmdp)
                                                 /* This is stale */

tlb_finish_mmu()
  tlb_flush_mmu()
    tlb_flush_mmu_tlbonly()
      tlb_flush()
        flush_tlb_mm
          flush_tlb_mm_range
            flush_tlb_others
              native_flush_tlb_others
                smp_call_function_many: IPIs
                ...blocks until CPU1 reenables
                   interrupts

                                         local_irq_disable()
                                         ...continue the page walk based
                                            on stale/freed data...
Yes, but then we have:
head = try_get_compound_head(page, 1);
which has an atomic operation providing a barrier in it, and then we have:
if (unlikely(pte_val(pte) != pte_val(*ptep))) {
        put_page(head);
        goto pte_unmap;
}
So we reload the PTE again and check that its value didn't change, which should
prevent the race you explain above. Or am I missing something?
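
To spell out the pattern I mean, here is a purely illustrative userspace
sketch using C11 atomics, with made-up names (slot, obj, fast_lookup()),
not the kernel code: take the reference with an atomic read-modify-write,
then re-read the value and back off if it changed.

/*
 * Userspace sketch, not kernel code. "slot" plays the role of the pte
 * (a concurrent writer may retarget or clear it at any time), and the
 * refcount update plays the role of try_get_compound_head().
 */
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

struct obj {
        atomic_int refcount;
        int payload;
};

static _Atomic(struct obj *) slot;

static struct obj *fast_lookup(void)
{
        /* Speculative read, like pte = READ_ONCE(*ptep). */
        struct obj *o = atomic_load_explicit(&slot, memory_order_relaxed);

        if (!o)
                return NULL;

        /*
         * Take a reference with a seq_cst (full-barrier) RMW, so the
         * re-read below cannot be satisfied by a value read before the
         * reference was taken.
         */
        atomic_fetch_add_explicit(&o->refcount, 1, memory_order_seq_cst);

        /* Re-read and verify, like the pte_val(pte) != pte_val(*ptep) check. */
        if (atomic_load_explicit(&slot, memory_order_seq_cst) != o) {
                atomic_fetch_sub_explicit(&o->refcount, 1, memory_order_seq_cst);
                return NULL;    /* back off to the slow path */
        }
        return o;
}

int main(void)
{
        struct obj *o = calloc(1, sizeof(*o));

        if (!o)
                return 1;
        atomic_init(&o->refcount, 0);
        o->payload = 42;
        atomic_store(&slot, o);

        struct obj *got = fast_lookup();
        printf("lookup %s, refcount now %d\n",
               got ? "succeeded" : "failed",
               got ? atomic_load(&got->refcount) : 0);
        free(o);
        return 0;
}

Here the atomic operation is the page refcount update done inside
try_get_compound_head(), and the re-read is the pte_val() comparison
quoted above.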
Well, a couple of questions:
1. Is there *really* a memory barrier in try_get_compound_head()? Because all
I see is a READ_ONCE, which is only a compile-time barrier, so run-time
reordering by the CPU would still be possible (see the sketch below these
questions).
2. Your code above runs after the "pmd = READ_ONCE(*pmdp)" line, so by then
the walk has already found the pte based on reading a stale pmd. Checking the
pte therefore seems like checking the wrong thing; it's too late for this
case, right?
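
To make the distinction in point 1 concrete, here is a small, purely
illustrative userspace C11 sketch (not kernel code, all names are made up)
contrasting a volatile, READ_ONCE-style flag read with an acquire-ordered
one:

#include <stdatomic.h>
#include <stdio.h>

static int payload;                   /* stands in for page table data   */
static volatile int ready_plain;      /* READ_ONCE/WRITE_ONCE-style flag */
static atomic_int ready_ordered;      /* flag with real CPU ordering     */

static void writer(void)
{
        payload = 42;
        ready_plain = 1;
        /* Release store: orders the payload write before the flag. */
        atomic_store_explicit(&ready_ordered, 1, memory_order_release);
}

static int reader_compiler_barrier_only(void)
{
        /*
         * The volatile access keeps the compiler from eliding the load or
         * reordering it with other volatile accesses, but a weakly ordered
         * CPU can still speculate the payload load ahead of the branch, so
         * "payload" may effectively be read before "ready_plain".
         */
        if (ready_plain)
                return payload;
        return -1;
}

static int reader_cpu_ordered(void)
{
        /*
         * An acquire load (or a relaxed load followed by an acquire fence,
         * the C11 analogue of smp_rmb()) forbids the CPU from hoisting the
         * payload read above the flag read.
         */
        if (atomic_load_explicit(&ready_ordered, memory_order_acquire))
                return payload;
        return -1;
}

int main(void)
{
        writer();
        printf("%d %d\n", reader_compiler_barrier_only(), reader_cpu_ordered());
        return 0;
}

On x86 the plain version happens to work because loads are not reordered
with other loads there, but that is an x86-specific guarantee, not something
READ_ONCE provides.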
thanks,
--
John Hubbard
NVIDIA