On 25/03/15 08:19 PM, David Rientjes wrote:
> On Wed, 25 Mar 2015, Daniel Micay wrote:
>
>>> I'm not sure I get your description right. The problem I know about
>>> is where "purging" means madvise(MADV_DONTNEED) and khugepaged later
>>> collapses a new hugepage that will repopulate the purged parts,
>>> increasing the memory usage. One can limit this via
>>> /sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none. That
>>> setting doesn't affect the page fault THP allocations, which however
>>> happen only in newly accessed hugepage-sized areas and not partially
>>> purged ones, though.
>>
>> Since jemalloc doesn't unmap memory but instead does recycling itself
>> in userspace, it ends up with large spans of free virtual memory and
>> gets *lots* of huge pages from the page fault heuristic. It keeps
>> track of active vs. dirty (not purged) vs. clean (purged / untouched)
>> ranges everywhere, and will purge dirty ranges as they build up.
>>
>> The THP allocation on page faults means it ends up with memory that's
>> supposed to be clean but really isn't.
>>
>> A worst case example with the (up until recently) default chunk size
>> of 4M is allocating a bunch of 2.1M allocations. Chunks are naturally
>> aligned, so each one can be represented as 2 huge pages. It increases
>> memory usage by nearly *50%*. The allocator thinks the tail is clean
>> memory, but it's not. When the allocations are freed, it will purge
>> the 2.1M at the head (once enough dirty memory builds up) but all of
>> the tail memory will be leaked until something else is allocated
>> there and then freed.
>>
>
> With tcmalloc, it's simple to always expand the heap by mmapping 2MB
> ranges for size classes <= 2MB, allocate its own metadata from an
> arena that is also expanded in 2MB ranges, and always do
> madvise(MADV_DONTNEED) for the longest span on the freelist when it
> does periodic memory freeing back to the kernel, and even better if
> the freed memory splits at most one hugepage. When memory is pulled
> from the freelist of memory that has already been returned to the
> kernel, you can return a span that will make it eligible to be
> collapsed into a hugepage based on your setting of max_ptes_none,
> trying to consolidate the memory as much as possible. If your malloc
> is implemented in a way that understands the benefit of hugepages, and
> how much memory you're willing to sacrifice (max_ptes_none) for it,
> then you should _never_ be increasing memory usage by 50%.

If khugepaged was the only source of huge pages, sure. The primary
source of huge pages is the heuristic handing out an entire 2M page on
the first page fault in a 2M range.
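
To make the worst case above concrete, here's a rough standalone sketch
(mine, not jemalloc or kernel code; map_aligned_chunk is just an
illustrative helper). It maps a naturally aligned 4M chunk and touches
~2.1M of it, assuming x86-64 with 4K pages, 2M huge pages and THP set
to "always":

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

#define HPAGE (2UL << 20)                    /* 2M huge page */
#define CHUNK (4UL << 20)                    /* old jemalloc default chunk size */
#define ALLOC ((2UL << 20) + (100UL << 10))  /* ~2.1M: crosses into the 2nd 2M region */

/* Map a CHUNK-sized, CHUNK-aligned region, the way an allocator with
 * naturally aligned chunks would: over-allocate and trim the excess. */
static void *map_aligned_chunk(void)
{
	char *raw = mmap(NULL, 2 * CHUNK, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (raw == MAP_FAILED)
		return NULL;

	char *aligned = (char *)(((unsigned long)raw + CHUNK - 1) & ~(CHUNK - 1));
	if (aligned != raw)
		munmap(raw, aligned - raw);                           /* unaligned head */
	munmap(aligned + CHUNK, raw + 2 * CHUNK - (aligned + CHUNK)); /* unused tail */
	return aligned;
}

int main(void)
{
	char *chunk = map_aligned_chunk();
	if (!chunk)
		return 1;

	/* Simulate one 2.1M allocation inside the 4M chunk: the first write
	 * past the 2M boundary is a fault in an untouched 2M-aligned range,
	 * so with THP "always" it can be backed by a full 2M huge page. */
	memset(chunk, 1, ALLOC);

	printf("chunk at %p, touched %lu of %lu bytes\n",
	       (void *)chunk, (unsigned long)ALLOC, (unsigned long)CHUNK);
	printf("check Rss/AnonHugePages for this range in /proc/%d/smaps\n",
	       (int)getpid());
	pause();	/* keep the process alive so the mapping can be inspected */
	return 0;
}

With the page fault heuristic in play, that single write crossing the
2M boundary is enough for the kernel to hand out a full huge page for
the second 2M region, so smaps will typically show roughly 4M resident
for a ~2.1M working set, without the allocator having purged anything
incorrectly.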
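For completeness, the purge primitive and the khugepaged knob mentioned
at the top of the thread, again as a rough illustrative sketch (the
sysfs path is as quoted; purge_range is just a made-up wrapper, not any
allocator's actual code):

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>

#define MAX_PTES_NONE_PATH \
	"/sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none"

/* "Purging" a dirty range: the pages are freed right away and read back
 * as zero-filled anonymous memory on the next fault. addr and len must
 * be page aligned. */
static int purge_range(void *addr, size_t len)
{
	return madvise(addr, len, MADV_DONTNEED);
}

int main(void)
{
	/* The knob from the quoted text: khugepaged may still collapse a 2M
	 * region containing up to this many empty (e.g. purged) 4K PTEs,
	 * faulting the purged memory back in as part of the new huge page. */
	long max_ptes_none = -1;
	FILE *f = fopen(MAX_PTES_NONE_PATH, "r");
	if (f) {
		if (fscanf(f, "%ld", &max_ptes_none) != 1)
			max_ptes_none = -1;
		fclose(f);
	}
	printf("max_ptes_none = %ld (out of 512 PTEs per 2M page)\n",
	       max_ptes_none);

	/* Dirty a 2M range, then purge it again. */
	size_t len = 2UL << 20;
	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return 1;
	p[0] = 1;
	if (purge_range(p, len))
		perror("madvise(MADV_DONTNEED)");
	return 0;
}

madvise(MADV_DONTNEED) drops the pages immediately, but if khugepaged
later collapses the surrounding 2M region (it is allowed to when the
number of empty PTEs is at most max_ptes_none), the purged memory comes
back as part of the new huge page.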