On Tue, Jun 15, 2021 at 02:09:38PM +0200, Jann Horn wrote:
> On Tue, Jun 15, 2021 at 8:37 AM John Hubbard <jhubbard@xxxxxxxxxx> wrote:
> > On 6/14/21 6:20 PM, Jann Horn wrote:
> > > try_grab_compound_head() is used to grab a reference to a page from
> > > get_user_pages_fast(), which is only protected against concurrent
> > > freeing of page tables (via local_irq_save()), but not against
> > > concurrent TLB flushes, freeing of data pages, or splitting of compound
> > > pages.
> [...]
> > Reviewed-by: John Hubbard <jhubbard@xxxxxxxxxx>
>
> Thanks!
>
> [...]
> > > @@ -55,8 +72,23 @@ static inline struct page *try_get_compound_head(struct page *page, int refs)
> > >         if (WARN_ON_ONCE(page_ref_count(head) < 0))
> > >                 return NULL;
> > >         if (unlikely(!page_cache_add_speculative(head, refs)))
> > >                 return NULL;
> > > +
> > > +       /*
> > > +        * At this point we have a stable reference to the head page; but it
> > > +        * could be that between the compound_head() lookup and the refcount
> > > +        * increment, the compound page was split, in which case we'd end up
> > > +        * holding a reference on a page that has nothing to do with the page
> > > +        * we were given anymore.
> > > +        * So now that the head page is stable, recheck that the pages still
> > > +        * belong together.
> > > +        */
> > > +       if (unlikely(compound_head(page) != head)) {
> >
> > I was just wondering about what all could happen here. Such as: page gets split,
> > reallocated into a different-sized compound page, one that still has page pointing
> > to head. I think that's OK, because we don't look at or change other huge page
> > fields.
> >
> > But I thought I'd mention the idea in case anyone else has any clever ideas about
> > how this simple check might be insufficient here. It seems fine to me, but I
> > routinely lack enough imagination about concurrent operations. :)
>
> Hmmm...
> I think the scariest aspect here is probably the interaction
> with concurrent allocation of a compound page on architectures with
> store-store reordering (like ARM). *If* the page allocator handled
> compound pages with lockless, non-atomic percpu freelists, I think it
> might be possible that the zeroing of tail_page->compound_head in
> put_page() could be reordered after the page has been freed,
> reallocated and set to refcount 1 again?

Oh wow, yes, this all looks sketchy! Doing an RCU access to page->head
is a really challenging thing :\

On the simplified store side:

  page->head = my_compound
  *ptep = page

There must be some kind of release barrier between those two
operations or this is all broken.. That definitely deserves a comment.

Ideally we'd use smp_store_release to install the *pte :\

Assuming we cover the release barrier, I would think the algorithm
should be broadly:

  struct page *target_page = READ_ONCE(pte)
  struct page *target_folio = READ_ONCE(target_page->head)

  page_cache_add_speculative(target_folio, refs)

  if (target_folio != READ_ONCE(target_page->head) ||
      target_page != READ_ONCE(pte))
     goto abort

Which is what this patch does but I would like to see the
READ_ONCE's.

And there possibly should be two try_grab_compound_head()'s since we
don't need this overhead on the fully locked path, especially the
double atomic on page_ref_add()

> I think the lockless page cache code also has to deal with somewhat
> similar ordering concerns when it uses page_cache_get_speculative(),
> e.g. in mapping_get_entry() - first it looks up a page pointer with
> xas_load(), and any access to the page later on would be a _dependent
> load_, but if the page then gets freed, reallocated, and inserted into
> the page cache again before the refcount increment and the re-check
> using xas_reload(), then there would be no data dependency from
> xas_reload() to the following use of the page...

xas_store() should have the smp_store_release() inside it at least..
Even so it doesn't seem to do page->head, so this is not quite the same
thing

Jason
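
For anyone who wants to poke at the lookup / speculative-ref / recheck
sequence outside the kernel, here is a minimal userspace sketch using C11
atomics. It is only a model under stated assumptions: every name in it
(model_page, model_pte_t, model_install, model_add_speculative,
model_try_grab) is hypothetical, the struct is not the real struct page
layout, and acquire loads are used as a conservative stand-in for
READ_ONCE() plus dependency ordering. The release store in
model_install() plays the role smp_store_release() would play when
installing the pte.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; NOT the kernel layout. */
struct model_page {
	_Atomic(struct model_page *) head;	/* compound head (self for a head page) */
	atomic_int refcount;			/* 0 means the page is free */
};

/* Hypothetical stand-in for a PTE slot: an atomic pointer to a page. */
typedef _Atomic(struct model_page *) model_pte_t;

/* Store side: set page->head first, then publish the page with release
 * ordering so a reader that sees the pte also sees the head pointer. */
static void model_install(model_pte_t *ptep, struct model_page *page,
			  struct model_page *head)
{
	atomic_store_explicit(&page->head, head, memory_order_relaxed);
	atomic_store_explicit(ptep, page, memory_order_release);
}

/* Take 'refs' references unless the refcount is zero (page is free).
 * Models the role of page_cache_add_speculative(). */
static bool model_add_speculative(struct model_page *p, int refs)
{
	int old = atomic_load_explicit(&p->refcount, memory_order_relaxed);

	assert(refs > 0);
	do {
		if (old == 0)
			return false;
	} while (!atomic_compare_exchange_weak_explicit(&p->refcount, &old,
							old + refs,
							memory_order_acquire,
							memory_order_relaxed));
	return true;
}

/* Read side: look up, take the speculative reference, then recheck both
 * the head pointer and the pte; drop the references and abort if either
 * changed underneath us. */
static struct model_page *model_try_grab(model_pte_t *ptep, int refs)
{
	struct model_page *page, *head;

	page = atomic_load_explicit(ptep, memory_order_acquire);
	if (!page)
		return NULL;
	head = atomic_load_explicit(&page->head, memory_order_acquire);

	if (!model_add_speculative(head, refs))
		return NULL;

	if (head != atomic_load_explicit(&page->head, memory_order_acquire) ||
	    page != atomic_load_explicit(ptep, memory_order_acquire)) {
		atomic_fetch_sub_explicit(&head->refcount, refs,
					  memory_order_release);
		return NULL;
	}
	return head;
}
```

A single-threaded caller can only exercise the success path and the
"head already free" path; the recheck branch needs a concurrent writer
changing page->head or the pte between the two loads to trigger.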