On 6/16/21 1:10 AM, Yang Shi wrote:
> On Tue, Jun 15, 2021 at 5:10 AM Jann Horn <jannh@xxxxxxxxxx> wrote:
>>
>> On Tue, Jun 15, 2021 at 8:37 AM John Hubbard <jhubbard@xxxxxxxxxx> wrote:
>> > On 6/14/21 6:20 PM, Jann Horn wrote:
>> > > try_grab_compound_head() is used to grab a reference to a page from
>> > > get_user_pages_fast(), which is only protected against concurrent
>> > > freeing of page tables (via local_irq_save()), but not against
>> > > concurrent TLB flushes, freeing of data pages, or splitting of compound
>> > > pages.
>> [...]
>> > Reviewed-by: John Hubbard <jhubbard@xxxxxxxxxx>
>>
>> Thanks!
>>
>> [...]
>> > > @@ -55,8 +72,23 @@ static inline struct page *try_get_compound_head(struct page *page, int refs)
>> > >  	if (WARN_ON_ONCE(page_ref_count(head) < 0))
>> > >  		return NULL;
>> > >  	if (unlikely(!page_cache_add_speculative(head, refs)))
>> > >  		return NULL;
>> > > +
>> > > +	/*
>> > > +	 * At this point we have a stable reference to the head page; but it
>> > > +	 * could be that between the compound_head() lookup and the refcount
>> > > +	 * increment, the compound page was split, in which case we'd end up
>> > > +	 * holding a reference on a page that has nothing to do with the page
>> > > +	 * we were given anymore.
>> > > +	 * So now that the head page is stable, recheck that the pages still
>> > > +	 * belong together.
>> > > +	 */
>> > > +	if (unlikely(compound_head(page) != head)) {
>> >
>> > I was just wondering about what all could happen here. Such as: page gets split,
>> > reallocated into a different-sized compound page, one that still has page pointing
>> > to head. I think that's OK, because we don't look at or change other huge page
>> > fields.
>> >
>> > But I thought I'd mention the idea in case anyone else has any clever ideas about
>> > how this simple check might be insufficient here. It seems fine to me, but I
>> > routinely lack enough imagination about concurrent operations. :)
>>
>> Hmmm... I think the scariest aspect here is probably the interaction
>> with concurrent allocation of a compound page on architectures with
>> store-store reordering (like ARM). *If* the page allocator handled
>> compound pages with lockless, non-atomic percpu freelists, I think it
>> might be possible that the zeroing of tail_page->compound_head in
>> put_page() could be reordered after the page has been freed,
>> reallocated and set to refcount 1 again?
>>
>> That shouldn't be possible at the moment, but it is still a bit scary.
>
> It might be possible after Mel's "mm/page_alloc: Allow high-order
> pages to be stored on the per-cpu lists" patch
> (https://patchwork.kernel.org/project/linux-mm/patch/20210611135753.GC30378@xxxxxxxxxxxxxxxxxxx/).

Those would be percpu indeed, but not "lockless, non-atomic", no? They
are protected by a local_lock.

>>
>>
>> I think the lockless page cache code also has to deal with somewhat
>> similar ordering concerns when it uses page_cache_get_speculative(),
>> e.g. in mapping_get_entry() - first it looks up a page pointer with
>> xas_load(), and any access to the page later on would be a _dependent
>> load_, but if the page then gets freed, reallocated, and inserted into
>> the page cache again before the refcount increment and the re-check
>> using xas_reload(), then there would be no data dependency from
>> xas_reload() to the following use of the page...
>>
>
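For readers less familiar with the "take a speculative reference, then recheck" pattern that both try_get_compound_head() and the xas_load()/xas_reload() sequence rely on, here is a minimal userspace sketch using C11 atomics. It is not the kernel code: struct obj, struct slot, get_unless_zero(), put_obj() and lookup_get() are hypothetical names, and the sketch assumes type-stable memory (objects are never handed back to the system allocator, much like struct page), which is what makes bumping the refcount of a possibly freed object safe.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page. Assumption: memory is type-stable,
 * so touching the refcount of an object that was just freed is harmless. */
struct obj {
	atomic_int refcount; /* 0 means the object is free */
};

/* Hypothetical stand-in for a lockless lookup slot (PTE, xarray entry). */
struct slot {
	_Atomic(struct obj *) ptr;
};

/* Take a reference only if the object is not free (like get_page_unless_zero()). */
static int get_unless_zero(struct obj *o)
{
	int old = atomic_load_explicit(&o->refcount, memory_order_relaxed);

	while (old != 0) {
		/* seq_cst on success: the recheck load in lookup_get()
		 * cannot be performed before this increment. */
		if (atomic_compare_exchange_weak_explicit(&o->refcount, &old,
				old + 1, memory_order_seq_cst,
				memory_order_relaxed))
			return 1;
		/* on failure, old was reloaded; loop and retry */
	}
	return 0;
}

/* Drop a reference. Freeing on the final put is left out of this sketch. */
static void put_obj(struct obj *o)
{
	int old = atomic_fetch_sub_explicit(&o->refcount, 1,
					    memory_order_release);
	assert(old > 0);
}

/* Speculative lookup: grab a reference, then recheck that the slot still
 * points at the same object; if not, we pinned the wrong object. */
struct obj *lookup_get(struct slot *s)
{
	struct obj *o = atomic_load_explicit(&s->ptr, memory_order_acquire);

	if (o == NULL || !get_unless_zero(o))
		return NULL;
	if (atomic_load_explicit(&s->ptr, memory_order_acquire) != o) {
		put_obj(o); /* lost a race: freed, reused or replaced */
		return NULL;
	}
	return o;
}
```

This only orders the reader side. As the thread points out, the recheck is only as good as the ordering of the writer's stores (for example, clearing tail_page->compound_head before the page can be reallocated), which this sketch does not model.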