Re: Switching from tcmalloc

Did you do before/after Ceph performance benchmarks? I don't care if my
systems are using 80% CPU if Ceph performance is better than when
they're using 20% CPU.

Can you share any scripts you have to automate these things? (NUMA
pinning, migratepages)

thanks,

-Ben

On Wed, Jun 24, 2015 at 10:25 AM, Jan Schermer <jan@xxxxxxxxxxx> wrote:
> There were essentially three things we had to do for such a drastic
> drop (a rough sketch of the commands follows below):
>
> 1) recompile Ceph --without-tcmalloc
> 2) pin the OSDs to a specific NUMA zone - we had been doing this for a
> long time and it really helped
> 3) migrate the OSD memory to the correct NUMA node with migratepages
>  - we will use cgroups for this in the future; it should make life
> easier and is the only correct solution
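>
> A rough sketch of the commands (node numbers, CPU ranges and the OSD
> id are illustrative, not our exact scripts):
>
>   # 1) dumpling-era autotools build without tcmalloc
>   ./configure --without-tcmalloc && make
>
>   # 2) start an OSD with CPUs and memory pinned to NUMA node 0
>   numactl --cpunodebind=0 --membind=0 ceph-osd -i 0
>
>   # 3) move pages of an already-running OSD from node 1 to node 0
>   pid=$(pgrep -f 'ceph-osd -i 0')
>   migratepages "$pid" 1 0
>
>   # planned cgroups (v1 cpuset) variant: with memory_migrate set,
>   # pages follow automatically when cpuset.mems changes
>   mkdir /sys/fs/cgroup/cpuset/osd.0
>   echo 0-7 > /sys/fs/cgroup/cpuset/osd.0/cpuset.cpus
>   echo 0 > /sys/fs/cgroup/cpuset/osd.0/cpuset.mems
>   echo 1 > /sys/fs/cgroup/cpuset/osd.0/cpuset.memory_migrate
>   echo "$pid" > /sys/fs/cgroup/cpuset/osd.0/tasks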
>
> It is similar to the effect of just restarting the OSD, but much
> better. Since we immediately see hundreds of connections on a freshly
> restarted OSD (and in the benchmark the tcmalloc issue manifested with
> just two clients in parallel), I'd say we never saw the raw
> (undegraded) performance with tcmalloc - but it was never this good:
> consistently low latencies, much smaller spikes when something
> happens, and much lower CPU usage (about 50% savings, though we're
> also backfilling a lot in the background). Workloads are faster as
> well - for example, reweighting OSDs on that same node was faster by
> hundreds of percent.
>
> So far the effect has been drastic. I wonder why tcmalloc is used at
> all when people are having problems with it - the glibc malloc seems
> to work just fine for us.
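>
> A quick sanity check of what a build actually links against (this
> assumes a dynamically linked ceph-osd; no output means no tcmalloc):
>
>   ldd $(which ceph-osd) | grep tcmalloc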
>
> The only concerning thing is the virtual memory usage - we are over 400GB
> VSS with a few OSDs. That doesn’t hurt anything, though.
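>
> (Easy to eyeball per process with something like
> "ps -C ceph-osd -o pid,vsz,rss,comm" - vsz is reported in KiB.)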
>
> Jan
>
>
> On 24 Jun 2015, at 18:46, Robert LeBlanc <robert@xxxxxxxxxxxxx> wrote:
>
> Did you see what the effect of just restarting the OSDs was, while
> still using tcmalloc? I've noticed that there is usually a good drop
> for us just from restarting them, though I don't think it is usually
> this drastic.
>
> ----------------
> Robert LeBlanc
> GPG Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>
> On Wed, Jun 24, 2015 at 2:08 AM, Jan Schermer <jan@xxxxxxxxxxx> wrote:
> Can you guess when we did that?
> Still on dumpling, btw...
>
> http://www.zviratko.net/link/notcmalloc.png
>
> Jan
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



