Re: [PATCH] thp: tail page refcounting fix #2

Andrea Arcangeli <aarcange@xxxxxxxxxx> · Fri, 26 Aug 2011 18:10:48 +0200

On Thu, Aug 25, 2011 at 11:24:36PM -0700, Michel Lespinasse wrote:
> I had never heard before of locked instructions being necessary when a
> straight assignment would do what we want, but after reading the erratas
> you listed, I'm not so sure anymore. Given that, I think the version with
> just one single atomic add is good enough.

spin_unlock sometime is adding the lock prefix too for that reason. So
I feel safer that way.

> (there are also 511 consecutive atomic_sub calls on the head page _count,
> which could just as well be coalesced into a signle one at the end of the
> tail page loop).

That should be safe. It's not like I'm a mood to microoptimize
__split_huge_page_refcount after you found I forgot the
get_page_unless_zero needed to keep the page->flags stable (they're
overwritten by the time the head page is freed, that is why we need it).

> I think your current __get_page_tail() is unsafe when it takes the
> compound lock on the head page, because there is no refcount held on it.
> If the THP page gets broken up before we get the compound lock, the head
> page could get freed. But it looks like you could fix that by doing
> get_page_unless_zero on the head, and you should end up with something
> very much like the put_page() function, which I find incredibly tricky
> but seems to be safe.

Correct, it's enough and we need it for the same reason it is in
put_page. Nothing new or no new fundamental problem with this
approach, just an implementation mistake. At least it could introduced
no regression compared to the previous code.

> I would suggest moving get_page_foll() and __get_page_tail_foll() to
> mm/internal.h so that people writing code outside of mm/ don't get confused
> about which get_page() version they must call.

Good idea. That is for MM internal usage only, only follow_page is
allowed to call it.

> In __get_page_tail(), you could add a VM_BUG_ON(page_mapcount(page) <= 0)
> to reflect the fact that get_page() callers are expected to have already
> gotten a reference on the page through a gup call.

So I could put it just before calling __get_page_tail_foll().

I don't see a way anybody could call get_page on a tail page without
having called gup on it first. So I think it's correct. Any
pfn-scanning code like your working set estimation code has to use
get_page_unless_zero and that will never succeed anymore for tail
pages.

> (not your fault, you just moved that code) The comment above
> reset_page_mapcount() and page_mapcount() mentions that _count starts from -1.
> This does not seem to be accurate anymore - as you see page_count() just
> returns the _count value without adding 1. I guess you could just remove
> ', like _count,' from the comment and that'd make it accurate :)

The comment talks about _mapcount not _count. page_mapcount still adds
1 to _mapcount and _mapcount really still starts from -1.

> The use of _mapcount to store tail page counts should probably be
> documented somewhere - probably in mm_types.h where _mapcount is
> defined, and/or before the page_mapcount accessor function. Or, there
> could be a tail_page_count() accessor function for that so that it's
> evident in all call sites that we're accessing a refcount and not a mapcount:
> 
> static inline int tail_page_count(struct page *page)
> {
> 	VM_BUG_ON(!PageTail(page));
> 	return page_mapcount(page);
> }
> 
> 
> (probably for another commit) I'm not too comfortable with having several
> arch-specific fast gup functions knowning details about how page counts
> are implemented. Linus's tree also adds such support in sparc arch
> (and it doesn't even seem to be correct as it increments the head count
> but not the tail count). This should probably be cleaned up sometime by
> moving such details into generic inline helper functions.
> 
> 
> Besides these comments, overall I like the change a lot & I'm especially
> happy to see get_page() work in all cases again :)

Glad to hear :).

Thanks a lot for pointing out the missing get_page_unless_zero(). I'll
post a #3 version soon with that bit fixed.

I'm undecided of tail_page_count is needed. The only benefit would be
to be able to grep for tail_page_count and see the few call sites, maybe
that makes it worth it. The VM_BUG_ON I doubt is necessary there
considering it's easy to review the callsites and they're so few. It'd
also need to go into internal.h I guess.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>