On Sun, 2010-01-31 at 21:27 +0100, Andrea Arcangeli wrote: > From: Andrea Arcangeli <aarcange@xxxxxxxxxx> > > Futex code is smarter than most other gup_fast O_DIRECT code and knows about > the compound internals. However now doing a put_page(head_page) will not > release the pin on the tail page taken by gup-fast, leading to all sort of > refcounting bugchecks. Getting a stable head_page is a little tricky. > > page_head = page is there because if this is not a tail page it's also the > page_head. Only in case this is a tail page, compound_head is called, otherwise > it's guaranteed unnecessary. And if it's a tail page compound_head has to run > atomically inside irq disabled section __get_user_pages_fast before returning. > Otherwise ->first_page won't be a stable pointer. > > Disableing irq before __get_user_page_fast and releasing irq after running > compound_head is needed because if __get_user_page_fast returns == 1, it means > the huge pmd is established and cannot go away from under us. > pmdp_splitting_flush_notify in __split_huge_page_splitting will have to wait > for local_irq_enable before the IPI delivery can return. This means > __split_huge_page_refcount can't be running from under us, and in turn when we > run compound_head(page) we're not reading a dangling pointer from > tailpage->first_page. Then after we get to stable head page, we are always safe > to call compound_lock and after taking the compound lock on head page we can > finally re-check if the page returned by gup-fast is still a tail page. in > which case we're set and we didn't need to split the hugepage in order to take > a futex on it. > > Signed-off-by: Andrea Arcangeli <aarcange@xxxxxxxxxx> > Acked-by: Mel Gorman <mel@xxxxxxxxx> > Acked-by: Rik van Riel <riel@xxxxxxxxxx> > --- > > diff --git a/kernel/futex.c b/kernel/futex.c > --- a/kernel/futex.c > +++ b/kernel/futex.c > @@ -218,7 +218,7 @@ get_futex_key(u32 __user *uaddr, int fsh > { > unsigned long address = (unsigned long)uaddr; > struct mm_struct *mm = current->mm; > - struct page *page; > + struct page *page, *page_head; > int err; > > /* > @@ -250,10 +250,36 @@ again: > if (err < 0) > return err; > > - page = compound_head(page); > - lock_page(page); > - if (!page->mapping) { > - unlock_page(page); > +#ifdef CONFIG_TRANSPARENT_HUGEPAGE > + page_head = page; > + if (unlikely(PageTail(page))) { > + put_page(page); > + /* serialize against __split_huge_page_splitting() */ > + local_irq_disable(); > + if (likely(__get_user_pages_fast(address, 1, 1, &page) == 1)) { > + page_head = compound_head(page); > + local_irq_enable(); > + } else { > + local_irq_enable(); > + goto again; > + } > + } > +#else > + page_head = compound_head(page); > +#endif > + > + lock_page(page_head); > + if (unlikely(page_head != page)) { > + compound_lock(page_head); > + if (unlikely(!PageTail(page))) { > + compound_unlock(page_head); > + unlock_page(page_head); > + put_page(page); > + goto again; > + } > + } > + if (!page_head->mapping) { > + unlock_page(page_head); > put_page(page); > goto again; > } OK, so I really don't like this, the futex code is pain enough without having all this open-coded gunk in. Is there really no sensible vm-helper you can use here? Also, that whole local_irq_disable(); __gup_fast(); dance is terribly x86 specific, and this is generic core kernel code. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxxx For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>