On 18/03/15 06:31 PM, Andrew Morton wrote:
> On Tue, 17 Mar 2015 14:09:39 -0700 Shaohua Li <shli@xxxxxx> wrote:
>
>> There was a similar patch posted before, but it doesn't get merged. I'd like
>> to try again if there are more discussions.
>> http://marc.info/?l=linux-mm&m=141230769431688&w=2
>>
>> mremap can be used to accelerate realloc. The problem is mremap will
>> punch a hole in original VMA, which makes specific memory allocator
>> unable to utilize it. Jemalloc is an example. It manages memory in 4M
>> chunks. mremap a range of the chunk will punch a hole, which other
>> mmap() syscall can fill into. The 4M chunk is then fragmented, jemalloc
>> can't handle it.
>
> Daniel's changelog had additional details regarding the userspace
> allocators' behaviour. It would be best to incorporate that into your
> changelog.
>
> Daniel also had microbenchmark testing results for glibc and jemalloc.
> Can you please do this?
>
> I'm not seeing any testing results for tcmalloc and I'm not seeing
> confirmation that this patch will be useful for tcmalloc. Has anyone
> tried it, or sought input from tcmalloc developers?

TCMalloc and jemalloc are currently equally slow in this benchmark, since
neither makes use of mremap; they're roughly 2-3x slower than glibc. I've
CC'ed the currently most active TCMalloc developer so they can give input on
whether this patch would let them use it.

#include <string.h>
#include <stdlib.h>

int main(void) {
    void *ptr = NULL;
    size_t old_size = 0;
    for (size_t size = 4 * 1024 * 1024; size < 1024 * 1024 * 1024; size *= 2) {
        ptr = realloc(ptr, size);
        if (!ptr)
            return 1;
        /* touch only the newly allocated tail of the buffer */
        memset((char *)ptr + old_size, 0xff, size - old_size);
        old_size = size;
    }
    free(ptr);
}

If an outer loop is wrapped around this, jemalloc's master branch will at
least be able to do in-place resizing for everything after the first run, but
that's much rarer in the real world where there are many users of the
allocator. The lack of mremap still ends up hurting a lot.

FWIW, jemalloc is now the default allocator on Android, so there is an
increasing number of Linux machines unable to take advantage of mremap.

It could be worked around by attempting to use an mmap hint to get the memory
back, but that can fail because it races with other threads, and losing the
race increases fragmentation over the long term.

It's especially problematic if a large range of virtual memory is reserved
and divided up between per-CPU arenas for concurrency, but only garbage
collectors tend to do that at the moment. It can still be dealt with by
intercepting internal uses of mmap and returning any memory from the reserved
range to the right place, but it shouldn't have to be that ugly.
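
To make the mmap-hint workaround concrete, here's a rough sketch of what an
allocator has to do today to repair the hole mremap leaves in one of its
chunks. This is only an illustration, not code from jemalloc or TCMalloc, and
reclaim_hole is a made-up name:

#include <stddef.h>
#include <sys/mman.h>

/*
 * Try to re-acquire the gap that mremap punched into a chunk. "hole" and
 * "len" describe the gap. Returns 0 on success, -1 if another thread's
 * mapping won the race for the address range.
 */
static int reclaim_hole(void *hole, size_t len)
{
    /*
     * Pass the old address purely as a hint. Without MAP_FIXED the kernel
     * is free to place the mapping elsewhere if something else already
     * claimed part of the range.
     */
    void *p = mmap(hole, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    if (p != hole) {
        /*
         * Lost the race: another thread's mmap now occupies part of the
         * range. Drop the stray mapping and leave the chunk fragmented.
         */
        munmap(p, len);
        return -1;
    }
    return 0;
}

MAP_FIXED can't be used here because it would silently unmap whatever another
thread placed in the range, so the hint can always lose the race.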