Hi Greg,
I've been looking at the tcmalloc issues, but those seemed to affect OSDs, and I do notice them under heavy read workloads (even after the patch and after increasing TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES to 134217728). This one, however, is affecting the mon process.
Looking at perf top, most of the CPU usage is in mutex lock/unlock:
5.02% libpthread-2.19.so [.] pthread_mutex_unlock
3.82% libsoftokn3.so [.] 0x000000000001e7cb
3.46% libpthread-2.19.so [.] pthread_mutex_lock
I could try jemalloc; are you aware of any prebuilt binaries? Can I mix binaries built against different malloc implementations in the same cluster?
On Thu, Jul 23, 2015 at 10:50 AM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
> On Thu, Jul 23, 2015 at 8:39 AM, Luis Periquito <periquito@xxxxxxxxx> wrote:
> The ceph-mon is already taking a lot of memory, and I ran a heap stats
> ------------------------------------------------
> MALLOC: 32391696 ( 30.9 MiB) Bytes in use by application
> MALLOC: + 27597135872 (26318.7 MiB) Bytes in page heap freelist
> MALLOC: + 16598552 ( 15.8 MiB) Bytes in central cache freelist
> MALLOC: + 14693536 ( 14.0 MiB) Bytes in transfer cache freelist
> MALLOC: + 17441592 ( 16.6 MiB) Bytes in thread cache freelists
> MALLOC: + 116387992 ( 111.0 MiB) Bytes in malloc metadata
> MALLOC: ------------
> MALLOC: = 27794649240 (26507.0 MiB) Actual memory used (physical + swap)
> MALLOC: + 26116096 ( 24.9 MiB) Bytes released to OS (aka unmapped)
> MALLOC: ------------
> MALLOC: = 27820765336 (26531.9 MiB) Virtual address space used
> MALLOC:
> MALLOC: 5683 Spans in use
> MALLOC: 21 Thread heaps in use
> MALLOC: 8192 Tcmalloc page size
> ------------------------------------------------
>
> after that I ran the heap release and it went back to normal.
> ------------------------------------------------
> MALLOC: 22919616 ( 21.9 MiB) Bytes in use by application
> MALLOC: + 4792320 ( 4.6 MiB) Bytes in page heap freelist
> MALLOC: + 18743448 ( 17.9 MiB) Bytes in central cache freelist
> MALLOC: + 20645776 ( 19.7 MiB) Bytes in transfer cache freelist
> MALLOC: + 18456088 ( 17.6 MiB) Bytes in thread cache freelists
> MALLOC: + 116387992 ( 111.0 MiB) Bytes in malloc metadata
> MALLOC: ------------
> MALLOC: = 201945240 ( 192.6 MiB) Actual memory used (physical + swap)
> MALLOC: + 27618820096 (26339.4 MiB) Bytes released to OS (aka unmapped)
> MALLOC: ------------
> MALLOC: = 27820765336 (26531.9 MiB) Virtual address space used
> MALLOC:
> MALLOC: 5639 Spans in use
> MALLOC: 29 Thread heaps in use
> MALLOC: 8192 Tcmalloc page size
> ------------------------------------------------
>
> So it just seems the monitor is not returning unused memory to the OS or
> reusing already allocated memory it deems free...
Yep. This is a bug (best we can tell) in some versions of tcmalloc
combined with certain distribution stacks, although I don't think
we've seen it reported on Trusty (nor on a tcmalloc distribution that
new) before. Alternatively some folks are seeing tcmalloc use up lots
of CPU in other scenarios involving memory return and it may manifest
like this, but I'm not sure. You could look through the mailing list
for information on it.
-Greg
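
As a rough cross-check of the first heap stats dump quoted above, the following Python sketch parses the component lines, confirms that they add up to the reported "Actual memory used" figure, and computes how much of that total sits in the page heap freelist. The regular expression and the function name parse_heap_stats are illustrative assumptions, not part of any Ceph or tcmalloc tooling; the numbers are copied from the quoted output.

```python
import re

# Lines copied from the first heap stats dump in this thread.
HEAP_STATS = """\
MALLOC: 32391696 ( 30.9 MiB) Bytes in use by application
MALLOC: + 27597135872 (26318.7 MiB) Bytes in page heap freelist
MALLOC: + 16598552 ( 15.8 MiB) Bytes in central cache freelist
MALLOC: + 14693536 ( 14.0 MiB) Bytes in transfer cache freelist
MALLOC: + 17441592 ( 16.6 MiB) Bytes in thread cache freelists
MALLOC: + 116387992 ( 111.0 MiB) Bytes in malloc metadata
MALLOC: = 27794649240 (26507.0 MiB) Actual memory used (physical + swap)
"""

# Hypothetical pattern: optional "+" or "=", a byte count, the MiB figure, a label.
LINE_RE = re.compile(r"MALLOC:\s*([+=]?)\s*(\d+)\s*\(\s*[\d.]+ MiB\)\s*(.+)")


def parse_heap_stats(text):
    """Return ({label: bytes} for the summed components, reported total in bytes)."""
    components = {}
    total = None
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        sign, value, label = match.groups()
        if sign == "=":
            total = int(value)
        else:
            components[label.strip()] = int(value)
    return components, total


components, total = parse_heap_stats(HEAP_STATS)
freelist = components["Bytes in page heap freelist"]

# The components should add up to the reported total.
print("components sum to total:", sum(components.values()) == total)
# Share of the "actual memory used" that tcmalloc is holding as free pages.
print(f"page heap freelist share: {freelist / total:.1%}")
```

Under these assumptions, nearly all of the memory reported as used is free pages held by tcmalloc rather than memory the monitor itself is using, which matches the drop seen after the heap release.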
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com