Hi, Sorry for the self-followup but looking further into this, there are more subtle troubles with this patch that I think are worth discussing (not just the PageHuge check to skip the __page_cache_release). > > - struct page *page_head = compound_trans_head(page); > > + struct page *page_head; > > + > > + /* > > + * If this is a hugetlbfs page it cannot be split under us. Simply > > + * increment refcount for the head page. > > + */ > > + if (PageHuge(page)) { > > + page_head = compound_head(page); > > + atomic_inc(&page_head->_count); > > + got = true; > > + goto out; > > + } PageHuge: int PageHuge(struct page *page) /* page for this example is a THP PageTail */ { compound_page_dtor *dtor; if (!PageCompound(page)) return 0; page = compound_head(page); dtor = get_compound_page_dtor(page); return dtor == free_huge_page; } compound_head looks like this: static inline struct page *compound_head(struct page *page) { if (unlikely(PageTail(page))) ========================== split_huge_page return page->first_page; return page; } This patch adds comments: /* * hugetlbfs pages cannot be split from under us. If this is a * hugetlbfs page, check refcount on head page and release the page if * the refcount becomes zero. */ While this comment is correct for hugetlbfs, the harder problem is when this is a THP tail page that can be splitted... and it looks like that wasn't taken into consideration. It looks stable on x86 but: 1) this is just a micro-optmization and as such it should be microoptimzied, and it isn't. It's repeating compound_head twice for hugetlbfs and it's running compound_head+compound_trans_head for THP when a single one is needed in both cases. 2) it doesn't remove compound_trans_head from the rest of the kernel. With this patch you just made compound_trans_head meaningless. And you added an implicit dependency on split_huge_page that cannot ever touch page->first_page, see the "========= split_huge_page" above. 3) if you go ahead with 2), we'll never be able to propagate page->private from head to tail in split_huge_page, a thing potentially required by tmpfs (Kirill?), or the kernel would crash in put_page with this patch applied (dereferencing a dangling pointer because tail_page->first_page is overwritten by page_head->private). 4) it doesn't verify that the THP page size is never bigger than the smallest hugetlbfs page size. Such a condition would lead to silent memory corruption because it assumes it deals with a valid hugetlbfs head without pinning it and without rechecking pagetail, before taking the hugetlbfs-only path. See for example PageSlab check, it does exactly this. PageSlab min order is 1, so smaller than the THP page order. None of the above seems to be fatal in practice on x86, but I doubt it is worth it for -stable, and for upstream now you need to decide if we drop all compound_trans_head. If we don't drop compound_trans_head to allow propagating page->private from head to tail in the future, it should be possible to fix this patch to use a single compound_trans_head, so at the same time we optimize away the ugly inefficiency I mentioned in 1). Then if we want to remove the flakey assumption in 4) we should move this together with the PageSlab check. And to move it along with PageSlab I think you only need to implement a PageHeadHuge check (that is required for 1) too). If you insist to skip get_page_unless_zero (not just the compound_lock) then I'd recommend at least to add a boot time check so that if HPAGE_PMD_SHIFT > HPAGE_SHIFT THP disengage itself at boot and cannot be ever enabled. So if some arch has multiple huge page sizes and decides to make the THP page size not the smallest one available in hardware, at least it will notice it's not a supported config in a graceful way. -- To unsubscribe from this list: send the line "unsubscribe stable" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html