Re: Switching from tcmalloc


 



We did, but I don’t have the numbers at hand - I have lots of graphs, though. We were mainly trying to solve the CPU usage, since our nodes are converged QEMU+Ceph OSDs, so this made a real difference. We were also seeing performance capped by CPU when deleting snapshots or backfilling; all of that should be solved by this.

We graph latency, outstanding operations, you name it - I can share a few graphs with you tomorrow if I get permission from my boss :-) Having one node tcmalloc-free while the others run vanilla ceph-osd makes for a nice comparison under a real workload.

I guess I can share the final script once it’s finished - right now it uses taskset and then migratepages to move everything to the correct NUMA node, which is not that nice; the cgroup-based one will be completely different.
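For an idea of where the cgroup variant is heading, here is a rough, untested sketch using a cgroup-v1 cpuset (the osd.0 name, CPU range and pid are placeholders - adjust them to your topology):

    mkdir /sys/fs/cgroup/cpuset/osd.0
    echo 0-5 > /sys/fs/cgroup/cpuset/osd.0/cpuset.cpus           # CPUs of the OSD's NUMA node
    echo 0 > /sys/fs/cgroup/cpuset/osd.0/cpuset.mems             # allocate memory only from that node
    echo 1 > /sys/fs/cgroup/cpuset/osd.0/cpuset.memory_migrate   # migrate existing pages when tasks join
    echo $OSD_PID > /sys/fs/cgroup/cpuset/osd.0/cgroup.procs     # move the whole process, all threads

With memory_migrate enabled the kernel should move the pages for you, so no separate migratepages step should be needed.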

You can try migratepages yourself to see if it makes a difference - pin an OSD to a specific node (don’t forget to pin all of its threads) and then run “migratepages $pid old_node new_node”.
You can confirm the memory moving with “numastat -p $pid”. If it doesn’t seem to move, it is probably pagecache allocated on the wrong node; I’m not sure whether that can be migrated, but you can set /proc/sys/vm/zone_reclaim_mode to 1, which should drop it. I advise setting it back to 0 in the end, though, as cache is always faster than disks.
YMMV depending on which bottlenecks your system has.
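Putting the manual steps together, it boils down to something like this (node numbers, CPU list and pid are just examples - check your layout with “numactl --hardware” first):

    OSD_PID=12345                     # pid of the ceph-osd process
    taskset -a -c -p 0-5 $OSD_PID     # -a pins all threads to the CPUs of node 0
    migratepages $OSD_PID 1 0         # move its memory from node 1 to node 0
    numastat -p $OSD_PID              # most of the RSS should now show up under node 0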

Jan


> On 24 Jun 2015, at 19:36, Ben Hines <bhines@xxxxxxxxx> wrote:
> 
> Did you do before/after Ceph performance benchmarks? I don't care if my
> systems are using 80% CPU, if Ceph performance is better than when
> it's using 20% CPU.
> 
> Can you share any scripts you have to automate these things? (NUMA
> pinning, migratepages)
> 
> thanks,
> 
> -Ben
> 
> On Wed, Jun 24, 2015 at 10:25 AM, Jan Schermer <jan@xxxxxxxxxxx> wrote:
>> There were essentially three things we had to do to get such a drastic drop:
>> 
>> 1) recompile Ceph --without-tcmalloc
>> 2) pin the OSDs to a specific NUMA zone - we had been doing this for a long
>> time and it really helped
>> 3) migrate the OSD memory to the correct CPU with migratepages
>> - we will use cgroups for this in the future, which should make life easier and
>> is the only correct solution
>> 
>> The effect is similar to just restarting the OSD, but much better -
>> since we immediately see hundreds of connections on a freshly restarted OSD
>> (and in the benchmark the tcmalloc issue manifested with just two clients in
>> parallel), I’d say we never saw the raw (undegraded) performance with
>> tcmalloc, but it was never this good - consistently low latencies, much
>> smaller spikes when something happens, and much lower CPU usage (about 50%
>> savings, though we’re also backfilling a lot in the background). Workloads are
>> faster as well - for example, reweighting OSDs on that same node was much
>> (hundreds of percent) faster.
>> 
>> So far the effect has been drastic. I wonder why tcmalloc is even used when
>> people are having problems with it? The glibc malloc seems to work just fine
>> for us.
>> 
>> The only concerning thing is the virtual memory usage - we are over 400GB
>> VSS with a few OSDs. That doesn’t hurt anything, though.
>> 
>> Jan
>> 
>> 
>> On 24 Jun 2015, at 18:46, Robert LeBlanc <robert@xxxxxxxxxxxxx> wrote:
>> 
>> Did you see what the effect of just restarting the OSDs before using
>> tcmalloc? I've noticed that there is usually a good drop for us just by
>> restarting them. I don't think it is usually this drastic.
>> 
>> ----------------
>> Robert LeBlanc
>> GPG Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>> 
>> On Wed, Jun 24, 2015 at 2:08 AM, Jan Schermer wrote:
>> Can you guess when we did that?
>> Still on dumpling, btw...
>> 
>> http://www.zviratko.net/link/notcmalloc.png
>> 
>> Jan
>> 
>> 
>> 
>> 
>> 
>> 
>> 

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



