Re: page-flags behavior on compound pages: a worry

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 13 Aug 2015, Kirill A. Shutemov wrote:
> 
> All this situation is ugly. I'm thinking on more general solution for
> PageTail() vs. ->first_page race.
> 
> We would be able to avoid the race in first place if we encode PageTail()
> and position of head page within the same word in struct page. This way we
> update both thing in one shot without possibility of race.
> 
> Details get tricky.
> 
> I'm going to try tomorrow something like this: encode the position of head
> as offset from the tail page and store it as negative number in the union
> with ->mapping and ->s_mem. PageTail() can be implemented as check value
> of the field to be in range -1..-MAX_ORDER_NR_PAGES. 
> 
> I'm not sure at all if it's going to work, especially looking on
> ridiculously high CONFIG_FORCE_MAX_ZONEORDER some architectures allow.
> 
> We could also try to encode page order instead (again as negative number)
> and calculate head page position based on alignment...
> 
> Any other ideas are welcome.

Good luck, I've not given it any thought, but hope it works out:
my reasoning was the same when I put the PageAnon bit into
page->mapping instead of page->flags.

Something to beware of though: although exceedingly unlikely to be a
problem, page->mapping always contained a pointer to or into a relevant
structure, or else something that could not possibly be a kernel pointer,
when I was working on KSM swapping: see comment above get_ksm_page() in
mm/ksm.c.  It is best to keep page->mapping for pointers if possible
(and probably avoid having the PageAnon bit set unless really Anon).

I've only just read your mail, and I'm too slow a thinker to have
worked through your isolate_migratepages_block() race yet.  But, given
the timing, cannot resist sending you a code fragment I wrote earlier
today for our v3.11-based kernel: which still has compound_trans_order(),
which we had been using in a similar racy physical scan.

I'm not for a moment suggesting that this fragment is relevant to your
race; but it is something amusing to consider when you're thinking of
such races.  Credit to Greg Thelen for thinking of the prep_compound_page()
end of it, when I'd been focussed on the __split_huge_page_refcount() end.

	/*
	 * It is not safe to use compound_lock (inside compound_trans_order)
	 * until we have a reference on the page (okay, done above) and have
	 * then seen PageLRU on it (just below): because mm/huge_memory.c uses
	 * the non-atomic __SetPageUptodate on a freshly allocated THPage in
	 * several places, believing it to be invisible to the outside world,
	 * but liable to race and leave PG_compound_lock set when cleared here.
	 */
	nr_pages = 1;
	if (PageHead(page)) {
		/*
		 * smp_rmb() against the smp_wmb() in the first iteration of
		 * prep_compound_page(), so that the PageTail test ensures
		 * that compound_order(page) is now correctly readable.
		 */
		smp_rmb();
		if (PageTail(page + 1)) {
			nr_pages = 1 << compound_order(page);
			/*
			 * Then smp_rmb() against smp_wmb() in last iteration of
			 * __split_huge_page_refcount(), to ensure that has not
			 * yet written something else into page[1].lru.prev.
			 */
			smp_rmb();
			if (!PageTail(page + 1))
				nr_pages = 1;
		}
	}

Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>



[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]