On Tue, 2010-03-30 at 00:17 +0200, Andrea Arcangeli wrote:
> On Mon, Mar 29, 2010 at 08:49:44PM +0200, Peter Zijlstra wrote:
> > On Mon, 2010-03-29 at 20:37 +0200, Andrea Arcangeli wrote:
> > > From: Andrea Arcangeli <aarcange@xxxxxxxxxx>
> > >
> > > PG_buddy can be converted to page->_count == -1. So the PG_compound_lock can be
> > > added to page->flags without overflowing (because of the section bits
> > > increasing) with CONFIG_X86_PAE=y.
> >
> > This seems to break the assumption that all free pages have a zero page
> > count relied upon by things like page_cache_get_speculative().
> >
> > What if a page-cache pages gets freed and used as a head in the buddy
> > list while a concurrent lockless page-cache lookup tries to get a page
> > ref?
>
> I forgot about get_page_unless_zero, still the concept remains the
> same, we've just to move from _count to _mapcount or some other field
> in the page that we know will never to be some fixed value. Mapcount
> is the next candidate as it uses atomic ops and it starts from -1 but
> it should only be available on already allocated pages and to be
> guaranteed -1 when inside the buddy, so we can set mapcount -2 to
> signal the page is in the buddy. Or something like that, to me
> mapcount looks ideal but it's likely doubt in other means. The basic
> idea is that PG_buddy is a waste of ram

Don't forget that include/linux/memory_hotplug.h uses mapcount a bit for
marking bootmem.  So, just for clarity, we'd probably want to use -5 or
something.

/*
 * Types for free bootmem.
 * The normal smallest mapcount is -1. Here is smaller value than it.
 */
#define SECTION_INFO		(-1 - 1)
#define MIX_SECTION_INFO	(-1 - 2)
#define NODE_INFO		(-1 - 3)

Looks like SLUB also uses _mapcount for some fun purposes:

struct page {
	unsigned long flags;		/* Atomic flags, some possibly
					 * updated asynchronously */
	atomic_t _count;		/* Usage count, see below. */
	union {
		atomic_t _mapcount;	/* Count of ptes mapped in mms,
					 * to show when page is mapped
					 * & limit reverse map searches.
					 */
		struct {		/* SLUB */
			u16 inuse;
			u16 objects;
		};
	};

I guess those don't *really* become a problem in practice until we get a
really large page size that can hold >=64k objects.  But, at that point,
we're overflowing the types anyway (or really close to it).

-- Dave
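
For illustration only, here is a rough userspace sketch of the idea discussed above: marking a free page with a reserved _mapcount value that sits below the bootmem markers. Nothing here is kernel code. BUDDY_MAPCOUNT_SKETCH, union mapcount_sketch and the sketch_* helpers are hypothetical names, the value -5 is just the "-5 or something" suggestion, and a plain int32_t stands in for atomic_t.

```c
#include <assert.h>
#include <stdint.h>

/* Bootmem markers as quoted from include/linux/memory_hotplug.h. */
#define SECTION_INFO		(-1 - 1)
#define MIX_SECTION_INFO	(-1 - 2)
#define NODE_INFO		(-1 - 3)

/* Hypothetical sentinel: first value below the bootmem markers. */
#define BUDDY_MAPCOUNT_SKETCH	(-5)

/* Simplified stand-in for the _mapcount/SLUB union in struct page. */
union mapcount_sketch {
	int32_t mapcount;		/* stands in for atomic_t _mapcount */
	struct {			/* SLUB view of the same 32 bits */
		uint16_t inuse;
		uint16_t objects;
	} slub;
};

/* True if the value is one of the three bootmem marker values. */
int sketch_is_bootmem_marker(int32_t v)
{
	return v >= NODE_INFO && v <= SECTION_INFO;
}

/* True if the page carries the (hypothetical) buddy sentinel. */
int sketch_is_buddy(const union mapcount_sketch *p)
{
	return p->mapcount == BUDDY_MAPCOUNT_SKETCH;
}

/* Mark a page as free in the buddy; it must be unmapped (-1) first. */
void sketch_set_buddy(union mapcount_sketch *p)
{
	assert(p->mapcount == -1);
	p->mapcount = BUDDY_MAPCOUNT_SKETCH;
}

/* Return the page to the normal "unmapped" state on allocation. */
void sketch_clear_buddy(union mapcount_sketch *p)
{
	assert(sketch_is_buddy(p));
	p->mapcount = -1;
}
```

With this layout the sentinel only aliases a SLUB page whose two u16 fields are 0xfffb and 0xffff, which matches the point above that the overlap should not matter until the object counts get close to 64k.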