Re: [PATCH v3 3/4] mm: don't expose non-hugetlb page to fast gup prematurely

John Hubbard <jhubbard@xxxxxxxxxx> · Tue, 1 Oct 2019 11:43:30 -0700

On 10/1/19 12:10 AM, Jan Kara wrote:
> On Mon 30-09-19 10:57:08, John Hubbard wrote:
>> On 9/30/19 2:20 AM, Jan Kara wrote:
>>> On Fri 27-09-19 12:31:41, John Hubbard wrote:
>>>> On 9/27/19 5:33 AM, Michal Hocko wrote:
>>>>> On Thu 26-09-19 20:26:46, John Hubbard wrote:
>>>>>> On 9/26/19 3:20 AM, Kirill A. Shutemov wrote:
>> ...
>> 2. Your code above is after the "pmd = READ_ONCE(*pmdp)" line, so by then,
>> it's already found the pte based on reading a stale pmd. So checking the
>> pte seems like it's checking the wrong thing--it's too late, for this case,
>> right?
> 
> Well, if PMD is getting freed, all PTEs in it should be cleared by that
> time, shouldn't they? So although we read from stale PMD, either we already
> see cleared PTE or the check pte_val(pte) != pte_val(*ptep) will fail and
> so we never actually succeed in getting stale PTE entry (at least unless
> the page table page that used to be PMD can get freed and reused
> - which is not the case in the example you've shown above).
>

Right, that's not what the example shows, but there is nothing here to prevent
the page table pages from being freed and re-used.

 
> So I still don't see a problem. That being said I don't feel being expert
> in this particular area. I just always thought GUP prevents races like this
> by the scheme I describe so I'd like to see what I'm missing :).
> 

I'm very much still "in training" here, so I hope I'm not wasting everyone's 
time. But I feel confident in stating at least this much:

There are two distinct lockless synchronization mechanisms here, each protecting
against a different issue, and it's important not to conflate them and think that
one protects against the other. I still see a hole in (2) below. The mechanisms 
are:

1) Protection against a page (not the page table itself) getting freed
while get_user_pages*() is trying to take a reference to it. This is avoided
by the try-get and the re-checking of pte values that you mention above.
It's an elegant little thing, too. :)

2) Protection against page tables getting freed while a get_user_pages_fast()
call is in progress. This relies on disabling interrupts in gup_fast(), while
firing interrupts in the freeing path (via tlb flushing, IPIs). And on memory
barriers (doh--missing!) to avoid leaking memory references outside of the
irq disabling. 

This one has a problem, because Documentation/memory-barriers.txt points out:

INTERRUPT DISABLING FUNCTIONS
-----------------------------

Functions that disable interrupts (ACQUIRE equivalent) and enable interrupts
(RELEASE equivalent) will act as compiler barriers only.  So if memory or I/O
barriers are required in such a situation, they must be provided from some
other means.


...and so I'm suggesting that we need something approximately like this:

diff --git a/mm/gup.c b/mm/gup.c
index 23a9f9c9d377..1678d50a2d8b 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2415,7 +2415,9 @@ int get_user_pages_fast(unsigned long start, int nr_pages,
        if (IS_ENABLED(CONFIG_HAVE_FAST_GUP) &&
            gup_fast_permitted(start, end)) {
                local_irq_disable();
+               smp_mb();
                gup_pgd_range(addr, end, gup_flags, pages, &nr);
+               smp_mb();
                local_irq_enable();
                ret = nr;


thanks,
-- 
John Hubbard
NVIDIA