On Mon, Oct 21, 2013 at 11:01:07AM -0600, Khalid Aziz wrote: > Hi Andrea, > > Sure, this can happen but wouldn't this case be caught by the last line > in PageHuge() - "return dtor == free_huge_page;". This conditional > should return false for THP PageTail, right? The problem is that get_compound_page_dtor(page) runs on a "page" pointer that could be a dangling link if we ever update the tail_page->private field during split_huge_page. get_compound_page_dtor has to dereference "page" somehow to get to the "dtor" value. > I see. PageHuge() changes the pointer it is passed to point to the head > page. Calling compound_head on the head page is superfluous even though > it doesn't do any harm and I can easily remove the call to > compound_head() in the patch, but was it really intentional in > PageHuge() to change the incoming pointer? That is a significant side > effect of a routine that on the surface looks like a simple check that > would return true or false. PageHuge in current implementation can be passed a tail or head hugetlbfs page and it looks all right for that kind of hugetlbfs usage. It wasn't supposed to be called by the VM core that has to deal with all kind of compound pages (not just hugetlbfs). PageHuge cannot be passed a THP page unless we decide to depend on first_page not to ever change across a split_huge_page. But if we decide to depend on that (like current upstream code after the hugetlbfs directio optimization was applied), then we should also convert all compound_trans_head to compound_head or we leave dirt into the source. Plus that would optimize away all other smp_rmb too. In older kernels page->first_page was in the same union with page->mapping and IMHO it looked too flakey to depend on alignments or variable ordering of fields that tends to change all the time within that very union. Now page->mapping is out of that union, so it looks much safer safer to drop compound_trans_head. Now that page->mapping is out of that union, the only concerns that remains if that if we convert all compound_trans_head to compound_head, then we'd never be allowed to ever enforce the invariant that page->private is zero whenever PG_private is clear and the page is not the tail of a compound page (all locations except split_huge_page are enforcing that, but anon pages cannot care less about page->private being zero and page->private is clobbered and then initialized to zero during allocation by the buddy allocator when the page is reallocated so it's fine in practice, but maybe somebody could already consider it a bug even that we leave dirt in page->private after split_huge_page). So as THP expands into tmpfs and pagecache, it may be more relevant to be consistent and be able to update page->private of the tail pages during split_huge_page. Either to clear or maybe to propagate it from head to tail pages. It seems clearing page->private whenever PG_private is not set, is purely a debug tweak and so we're effectively wasting CPU not just in the smp_rmb of compound_trans_head, but also in the set_page_private(page, 0) of all over the place whenever a PG_private is cleared and starting from prep_new_page. There are two ways to go for the long term: 1) drop all set_page_private everywhere and convert all compound_trans_head to compound_head (and cross fingers that we'd never need to propagate the head page->private to the tail pages during split_huge_page and hope some random update to struct page doesn't introduce silent corruption hardly reproducible) 2) keep using compound_trans_head and add set_page_private(tail_page, 0) to split_huge_page to enforce the invariant that all regular anon pages have page->private zero (debug tweak already enforced everywhere else) I'm fine either ways, I'm just saying current source is not ok as compound_trans_head isn't making sense anymore. > Calling PageTail(page) after having called PageHuge(page) is pointless > since "page" was already changed to point to the head page. So is the > error in PageHuge(page) changing the pointer, or is the error in usage > of PageHuge()? PageHuge is fine as is I think (I supposed it wants to succeed on tail pages too). In the short term I would just run compound_trans_head once like previous code did. Then introduce a PageHeadHuge that skips the compound_head and PageCompound check to optimize your current patch. Then later make another patch that converts all compound_trans_head to compound_head, or that adds set_page_private(tail_page, 0) to split_huge_page to enforce the PG_private vs page->private debug tweak. > > > > 2) it doesn't remove compound_trans_head from the rest of the kernel. > > > > With this patch you just made compound_trans_head meaningless. And > > you added an implicit dependency on split_huge_page that cannot > > ever touch page->first_page, see the "========= split_huge_page" above. > > I think PageHuge() already takes this into account by checking the dtor > pointer. So I think this is not a problem. I do no intend to make > compound_trans_head meaningless but I see the problem with calling it > after having called PageHuge(), as I pointed out in my comments above. I see you didn't intend to alter the THP path, it was just a side effect. But even if we convert all compound_trans_head to compound_head we're still left with a superflous compound_head in all cases (hugetlbfs/slab/THP etc..) in those paths, so we can surely optimize this further (considering this is a microoptimization it's worth optimizing it fully). > I/O performance regression on -stable makes this problem worth looking > at in my opinion. Performance regression is very significant based upon > tests from database folks. There is definitely room for improvement in > the patch and your experience with this code is immensely helpful in > doing that. compound_trans_head shows up as one of the big hitters in > perf top when running database performance tests. My only intention in > using PageHuge() was to determine definitively if the page I am working > with is hugetlbfs page or not which PageHuge() seems to do except I > missed that it has a significant side effect. This can be addressed by > implementing PageHeadHuge() which is almost identical to PageHuge() > without the side effect, as you suggested. yes. > It is reasonable to add get_page_unless_zero() to hugetlbfs only path. We should use the benchmark to find if it's a problem to run get_page_unless_zero there (by plugging the newly introduced PageHeadHuge together with the PageSlab check). And then we should also verify if the compound_trans_head if measurable by implementing it identical to compound_head, so we know if it's worth it or not to keep it as now it's mostly needed to cope with page->private. Thanks! Andrea -- To unsubscribe from this list: send the line "unsubscribe stable" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html