Re: [PATCH 0/1] mm: restore full accuracy in COW page reuse

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 13.01.21 04:31, Linus Torvalds wrote:
> On Tue, Jan 12, 2021 at 6:16 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>>
>> The thing about the speculative page cache references is that they can
>> temporarily bump a refcount on a page which _used_ to be in the page
>> cache and has now been reallocated as some other kind of page.
> 
> Oh, and thinking about this made me think we might actually have a
> serious bug here, and it has nothing what-so-ever to do with COW, GUP,
> or even the page count itself.
> 
> It's unlikely enough that I think it's mostly theoretical, but tell me
> I'm wrong.
> 
> PLEASE tell me I'm wrong:
> 
> CPU1 does page_cache_get_speculative under RCU lock
> 
> CPU2 frees and re-uses the page
> 
>     CPU1                CPU2
>     ----                ----
> 
>     page = xas_load(&xas);
>     if (!page_cache_get_speculative(page))
>             goto repeat;
>     .. succeeds ..
> 
>                         remove page from XA
>                         release page
>                         reuse for something else
> 
>     .. and then re-check ..
>     if (unlikely(page != xas_reload(&xas))) {
>             put_page(page);
>             goto repeat;
>     }
> 
> ok, the above all looks fine. We got the speculative ref, but then we
> noticed that its' not valid any more, so we put it again. All good,
> right?
> 
> Wrong.
> 
> What if that "reuse for something else" was actually really quick, and
> both allocated and released it?
> 
> That still sounds good, right? Yes, now the "put_page()" will be the
> one that _actually_ releases the page, but we're still fine, right?
> 
> Very very wrong.
> 
> The "reuse for something else" on CPU2 might have gotten not an
> order-0 page, but a *high-order* page. So it allocated (and then
> immediately free'd) maybe an order-2 allocation with _four_ pages, and
> the re-use happened when we had coalesced the buddy pages.
> 
> But when we release the page on CPU1, we will release just _one_ page,
> and the other three pages will be lost forever.
> 
> IOW, we restored the page count perfectly fine, but we screwed up the
> page sizes and buddy information.
> 
> Ok, so the above is so unlikely from a timing standpoint that I don't
> think it ever happens, but I don't see why it couldn't happen in
> theory.
> 
> Please somebody tell me I'm missing some clever thing we do to make
> sure this can actually not happen..

Wasn't that tackled by latest (not merged AFAIKs) __free_pages() changes?

I'm only able to come up with the doc update, not with the oroginal
fix/change

https://lkml.kernel.org/r/20201027025523.3235-1-willy@xxxxxxxxxxxxx

-- 
Thanks,

David / dhildenb





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux