Re: [PATCH] mremap: add MREMAP_NOHOLE flag --resend

Vlastimil Babka <vbabka@xxxxxxx> · Thu, 26 Mar 2015 18:25:32 +0100

On 03/26/2015 04:24 AM, Daniel Micay wrote:
> It's all well and good to say that you shouldn't do that, but it's the
> basis of the design in jemalloc and other zone-based arena allocators.
>
> There's a chosen chunk size and chunks are naturally aligned. An
> allocation is either a span of chunks (chunk-aligned) or has metadata
> stored in the chunk header. This also means chunks can be assigned to
> arenas for a high level of concurrency. Thread caching is then only
> necessary for batching operations to amortize the cost of locking rather
> than to reduce contention. Per-CPU arenas can be implemented quite well
> by using sched_getcpu() to move threads around whenever it detects that
> another thread allocated from the arena.
>
> With >= 2M chunks, madvise purging works very well at the chunk level
> but there's also fine-grained purging within chunks and it completely
> breaks down from THP page faults.
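
For reference, fine-grained purging within a chunk can be sketched roughly as below. This is only an illustration, not jemalloc's actual code: purge_range() is a hypothetical helper, and it assumes a private anonymous mapping on Linux, where MADV_DONTNEED drops the pages and later reads see zero-filled memory.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: purge the whole pages that lie inside
 * [addr, addr + len). Partial pages at either end are left alone.
 * Returns the number of bytes purged, or -1 if madvise() fails.
 */
static long purge_range(void *addr, size_t len)
{
    uintptr_t page = (uintptr_t)sysconf(_SC_PAGESIZE);
    /* Round the start up and the end down to page boundaries. */
    uintptr_t start = ((uintptr_t)addr + page - 1) & ~(page - 1);
    uintptr_t end = ((uintptr_t)addr + len) & ~(page - 1);

    if (end <= start)
        return 0; /* no whole page inside the range */
    if (madvise((void *)start, end - start, MADV_DONTNEED) != 0)
        return -1;
    return (long)(end - start);
}
```

If the purged range was backed by a huge page, the kernel has to split that huge page first, which is the interaction with THP being discussed here.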

Are you sure it's due to page faults and not khugepaged combined with a
high value of max_ptes_none (such as the default 511), as reported here?

https://bugzilla.kernel.org/show_bug.cgi?id=93111
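
To check what a given system uses, the tunable can be read from sysfs. A minimal sketch, assuming the usual sysfs location for the khugepaged tunables; read_long_from_file() is a hypothetical helper name:

```c
#include <stdio.h>

/* Assumed sysfs location of the khugepaged tunable. */
#define MAX_PTES_NONE_PATH \
    "/sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none"

/*
 * Hypothetical helper: read a single integer from a file.
 * Returns -1 if the file cannot be opened or parsed.
 */
static long read_long_from_file(const char *path)
{
    FILE *f = fopen(path, "r");
    long value = -1;

    if (!f)
        return -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

Calling read_long_from_file(MAX_PTES_NONE_PATH) gives the current setting. With 4K base pages a 2M huge page covers 512 PTEs, so a value of 511 lets khugepaged collapse a range in which only a single page is still present.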

Once you have faulted in a THP, and then purged part of it and split it, 
I don't think page faults in the purged part can lead to a new THP 
collapse, only khugepaged can do that AFAIK.
And if you mmap areas smaller than 2M (i.e. your 256K chunks), that
should prevent THP page faults on the first fault within the chunk as well.
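
A rough sketch of mapping such a chunk, under some assumptions: chunk_alloc() is a hypothetical helper, natural alignment is obtained by over-allocating and trimming, and the MADV_NOHUGEPAGE call is an additional explicit opt-out that was not part of the suggestion above (it is treated as best effort, since it fails on kernels built without THP).

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map one naturally aligned chunk.
 * `size` must be a power of two and a multiple of the page size,
 * e.g. 256 * 1024. Returns NULL on failure.
 */
static void *chunk_alloc(size_t size)
{
    size_t span = size * 2; /* over-allocate so an aligned chunk fits */
    char *raw = mmap(NULL, span, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED)
        return NULL;

    /* First chunk-aligned address inside the mapping. */
    uintptr_t base = ((uintptr_t)raw + size - 1) & ~((uintptr_t)size - 1);
    size_t lead = base - (uintptr_t)raw;
    size_t trail = span - lead - size;

    /* Trim the unaligned head and the unused tail. */
    if (lead)
        munmap(raw, lead);
    if (trail)
        munmap((char *)base + size, trail);

#ifdef MADV_NOHUGEPAGE
    /* Best effort: explicitly opt this range out of THP. */
    (void)madvise((void *)base, size, MADV_NOHUGEPAGE);
#endif
    return (void *)base;
}
```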
