Generic RCU fast GUP relies on page_cache_get_speculative() to obtain a
pin on a pte-mapped page. As Aneesh pointed out during review of my
compound pages refcounting rework, page_cache_get_speculative() would
fail on a pte-mapped tail page, since tail pages always have
page->_count == 0. That means we would never be able to successfully
obtain a pin on a pte-mapped tail page via generic RCU fast GUP.

But the problem is not exclusive to my patchset. In the current kernel
some drivers (sound, for instance) already map compound pages with PTEs.

Let's teach page_cache_get_speculative() about tail pages. We can
acquire a pin by speculatively taking a pin on the head page and then
rechecking that the compound page didn't disappear under us. Retry if
it did.

We don't care about THP tail page refcounting: THP *tail* pages
shouldn't be found where page_cache_get_speculative() is used, that is,
in the pagecache radix tree or in page tables.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
Reported-by: "Aneesh Kumar K.V" <aneesh.kumar@xxxxxxxxxxxxxxxxxx>
Cc: Steve Capper <steve.capper@xxxxxxxxxx>
Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx>
Cc: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
---
 include/linux/pagemap.h | 31 ++++++++++++++++++++++++++-----
 1 file changed, 26 insertions(+), 5 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 7c3790764795..573a2510da36 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -142,8 +142,10 @@ void release_pages(struct page **pages, int nr, bool cold);
  */
 static inline int page_cache_get_speculative(struct page *page)
 {
+	struct page *head_page;
 	VM_BUG_ON(in_interrupt());
-
+retry:
+	head_page = compound_head_fast(page);
 #ifdef CONFIG_TINY_RCU
 # ifdef CONFIG_PREEMPT_COUNT
 	VM_BUG_ON(!in_atomic());
@@ -157,11 +159,11 @@ static inline int page_cache_get_speculative(struct page *page)
 	 * disabling preempt, and hence no need for the "speculative get" that
 	 * SMP requires.
 	 */
-	VM_BUG_ON_PAGE(page_count(page) == 0, page);
-	atomic_inc(&page->_count);
+	VM_BUG_ON_PAGE(page_count(head_page) == 0, head_page);
+	atomic_inc(&head_page->_count);
 
 #else
-	if (unlikely(!get_page_unless_zero(page))) {
+	if (unlikely(!get_page_unless_zero(head_page))) {
 		/*
 		 * Either the page has been freed, or will be freed.
 		 * In either case, retry here and the caller should
@@ -170,7 +172,26 @@ static inline int page_cache_get_speculative(struct page *page)
 		return 0;
 	}
 #endif
-	VM_BUG_ON_PAGE(PageTail(page), page);
+	/* compound_head_fast() seen PageTail(page) == true */
+	if (unlikely(head_page != page)) {
+		/*
+		 * compound_head_fast() could fetch dangling page->first_page
+		 * pointer to an old compound page, so recheck that it's still
+		 * a tail page before returning.
+		 */
+		smp_mb__after_atomic();
+		if (unlikely(!PageTail(page))) {
+			put_page(head_page);
+			goto retry;
+		}
+		/*
+		 * Tail page refcounting is only required for THP pages.
+		 * If page_cache_get_speculative() got called on tail-THP pages
+		 * something went horribly wrong. We don't have THP in pagecache
+		 * and we don't map tail-THP to page tables.
+		 */
+		VM_BUG_ON_PAGE(compound_tail_refcounted(head_page), head_page);
+	}
 
 	return 1;
 }
-- 
2.1.4

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
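
For readers who want to experiment with the "pin the head, recheck, retry"
idea outside the kernel, here is a minimal userspace sketch using C11
atomics. It is only a model under stated assumptions, not the kernel
implementation: struct toy_page, toy_get_unless_zero(), toy_put() and
toy_get_speculative() are hypothetical names invented for illustration, a
non-NULL head pointer stands in for PageTail(), and the default
sequentially consistent atomics stand in for smp_mb__after_atomic().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; not the kernel layout. */
struct toy_page {
	atomic_int count;                 /* refcount; stays 0 on a tail page */
	_Atomic(struct toy_page *) head;  /* non-NULL while this is a tail page */
};

/* Take a reference unless the count is zero (models get_page_unless_zero()). */
static bool toy_get_unless_zero(struct toy_page *p)
{
	int old = atomic_load(&p->count);

	while (old != 0) {
		/* On failure, 'old' is reloaded and the zero check repeats. */
		if (atomic_compare_exchange_weak(&p->count, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference (models put_page(), without any freeing logic). */
static void toy_put(struct toy_page *p)
{
	int old = atomic_fetch_sub(&p->count, 1);

	assert(old > 0);
}

/*
 * Returns 1 with a reference held on the head page (or on the page itself
 * if it is not a tail), 0 if the target's refcount was already zero.
 */
static int toy_get_speculative(struct toy_page *page)
{
	for (;;) {
		struct toy_page *head = atomic_load(&page->head);
		struct toy_page *target = head ? head : page;

		if (!toy_get_unless_zero(target))
			return 0;
		if (target == page)
			return 1;
		/* The head pointer may have been stale: recheck "still a tail". */
		if (atomic_load(&page->head) != NULL)
			return 1;
		/* The compound page went away under us: drop the pin and retry. */
		toy_put(target);
	}
}
```

As in the patch, the reference always lands on the head page, so the
tail's own count stays at zero, and a caller that gets 0 back is expected
to redo its lookup.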