It's all well and good to say that you shouldn't do that, but it's the basis of the design in jemalloc and other zone-based arena allocators. There's a chosen chunk size and chunks are naturally aligned, so any pointer can be mapped back to its chunk with a single mask. An allocation is either a span of chunks (and therefore chunk-aligned) or has its metadata stored in the chunk header. This also means chunks can be assigned to arenas for a high level of concurrency; thread caching is then only needed to batch operations and amortize the cost of locking, rather than to reduce contention. Per-CPU arenas can be implemented quite well by using sched_getcpu() to move threads between arenas whenever the allocator detects that another thread has allocated from the same arena.

With chunks of >= 2M, madvise purging works very well at the chunk level, but there is also fine-grained purging within chunks, and that completely breaks down in the face of THP page faults. The allocator packs memory towards low addresses (address-ordered best-fit and first-fit can both be done in O(log n) time), so swings in memory usage tend to clear large spans of memory, which then fault back in as huge pages regardless of how the memory was originally mapped. Once MADV_FREE can be used rather than MADV_DONTNEED, this would only happen after memory pressure... but that's not very comforting.

I don't find it acceptable that programs can leak huge amounts of memory over time (up to ~30% in real programs) due to THP page faults. This is a very real problem impacting projects like Redis, MariaDB and Firefox, because they all use jemalloc:

https://shk.io/2015/03/22/transparent-huge-pages/
https://www.percona.com/blog/2014/07/23/why-tokudb-hates-transparent-hugepages/
http://dev.nuodb.com/techblog/linux-transparent-huge-pages-jemalloc-and-nuodb
https://bugzilla.mozilla.org/show_bug.cgi?id=770612

Bionic (Android's libc) switched over to jemalloc too. The only reason you don't hear about this with glibc is that it doesn't have aggressive, fine-grained purging or a low-fragmentation design in the first place.
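
To make the chunk-header scheme concrete, here's a minimal sketch of the pointer-to-chunk lookup that natural alignment buys you. The chunk size, struct layout and names are made up for illustration; this is not jemalloc's actual code:

    #include <stdint.h>

    /* Hypothetical 2M chunk size; chunks are naturally aligned, i.e. every
     * chunk starts at an address that is a multiple of CHUNK_SIZE. */
    #define CHUNK_SIZE ((uintptr_t)2 * 1024 * 1024)

    struct arena;

    struct chunk_header {
        struct arena *arena;   /* owning arena */
        /* ... run/page maps and other per-chunk metadata ... */
    };

    /* Any interior pointer maps to its chunk's metadata with one mask. A
     * pointer that is itself chunk-aligned is the start of a span of whole
     * chunks rather than a small allocation carved out of one. */
    static inline struct chunk_header *chunk_of(const void *ptr)
    {
        return (struct chunk_header *)((uintptr_t)ptr & ~(CHUNK_SIZE - 1));
    }

The free and small-object paths can then reach the owning arena and per-chunk metadata without any global lookup structure.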
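
A rough sketch of the sched_getcpu() scheme, again with made-up names (the arena layout, thread-id trick and table size are illustrative assumptions, not jemalloc's implementation): each thread sticks with its arena and only re-reads its CPU when it notices that another thread has allocated from that arena in the meantime.

    #define _GNU_SOURCE
    #include <sched.h>        /* sched_getcpu(), a glibc extension */
    #include <stdatomic.h>

    #define MAX_CPUS 256

    struct arena {
        _Atomic(void *) last_thread;  /* last thread to allocate here */
        /* ... lock, chunk lists, ... */
    };

    static struct arena arenas[MAX_CPUS];
    static _Thread_local struct arena *my_arena;

    static struct arena *choose_arena(void)
    {
        /* The address of a thread-local is a cheap unique per-thread id. */
        void *self = (void *)&my_arena;
        struct arena *a = my_arena;

        /* Rebind to the arena for the CPU we're currently on whenever some
         * other thread has allocated from our arena since we last did. */
        if (!a || atomic_load_explicit(&a->last_thread,
                                       memory_order_relaxed) != self) {
            int cpu = sched_getcpu();
            if (cpu < 0)
                cpu = 0;              /* sched_getcpu() can fail */
            a = my_arena = &arenas[cpu % MAX_CPUS];
        }
        atomic_store_explicit(&a->last_thread, self, memory_order_relaxed);
        return a;
    }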
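
And the purging side, assuming a Linux target. MADV_FREE isn't usable everywhere yet (hence "once MADV_FREE can be used" above), so this sketch falls back to MADV_DONTNEED at compile time; the function name and fallback strategy are illustrative assumptions, not jemalloc's code:

    #include <sys/mman.h>
    #include <stddef.h>

    /* Return a span of pages (here, whole chunks) to the kernel. With
     * MADV_DONTNEED the pages are dropped immediately and the next touch
     * takes a fault -- which is exactly where THP can fault a fresh 2M huge
     * page over a region the allocator had carefully purged. With MADV_FREE
     * the kernel only reclaims the pages under memory pressure, so that
     * refault is deferred until then. */
    static void purge_span(void *addr, size_t len)
    {
    #ifdef MADV_FREE
        if (madvise(addr, len, MADV_FREE) == 0)
            return;
        /* Older kernels reject unknown advice with EINVAL; fall through. */
    #endif
        madvise(addr, len, MADV_DONTNEED);
    }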