Did you do before/after Ceph performance benchmarks? I don't care if my
systems are using 80% CPU if Ceph performance is better than when they were
using 20% CPU. Can you share any scripts you have to automate these things
(NUMA pinning, migratepages)?

thanks,
-Ben

On Wed, Jun 24, 2015 at 10:25 AM, Jan Schermer <jan@xxxxxxxxxxx> wrote:
> There were essentially three things we had to do for such a drastic drop:
>
> 1) recompile Ceph --without-tcmalloc
> 2) pin the OSDs to a specific NUMA zone - we had this in place for a long
> time and it really helped
> 3) migrate the OSD memory to the correct CPU with migratepages
> - we will use cgroups for this in the future; that should make life
> easier and is the only correct solution
>
> It is similar to the effect of just restarting the OSD, but much better.
> Since we immediately see hundreds of connections on a freshly restarted
> OSD (and in the benchmark the tcmalloc issue manifested with just two
> clients in parallel), I'd say we never saw the raw, undegraded performance
> with tcmalloc - but it was never this good: consistently low latencies,
> much smaller spikes when something happens, and much lower CPU usage
> (about 50% savings, though we're also backfilling a lot in the
> background). Workloads are faster as well - reweighting OSDs on that same
> node, for example, was faster by hundreds of percent.
>
> So far the effect has been drastic. I wonder why tcmalloc is even used
> when people are having problems with it? The glibc malloc seems to work
> just fine for us.
>
> The only concerning thing is the virtual memory usage - we are over 400GB
> VSS with a few OSDs. That doesn't hurt anything, though.
>
> Jan
>
> > On 24 Jun 2015, at 18:46, Robert LeBlanc <robert@xxxxxxxxxxxxx> wrote:
> >
> > Did you see what the effect of just restarting the OSDs was, before you
> > dropped tcmalloc? I've noticed that there is usually a good drop for us
> > just by restarting them, though I don't think it is usually this drastic.
> >
> > ----------------
> > Robert LeBlanc
> > GPG Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
> >
> > On Wed, Jun 24, 2015 at 2:08 AM, Jan Schermer wrote:
> > > Can you guess when we did that?
> > > Still on dumpling, btw...
> > >
> > > http://www.zviratko.net/link/notcmalloc.png
> > >
> > > Jan
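A before/after comparison of the kind Ben asks about can be scripted with
rados bench. A minimal sketch, assuming a throwaway pool named "bench"
(the pool name and the 60-second runtime are placeholders, not anything
from the thread):

    #!/bin/sh
    # Run an identical write + sequential-read pass before and after a
    # change and compare the latency/bandwidth summary lines.
    POOL=bench   # hypothetical scratch pool
    SECS=60

    echo "=== write ==="
    rados bench -p "$POOL" "$SECS" write --no-cleanup

    echo "=== sequential read ==="
    rados bench -p "$POOL" "$SECS" seq

    # Remove the objects left behind by --no-cleanup.
    rados -p "$POOL" cleanup

Watching ceph-osd CPU at the same time (e.g. with pidstat from sysstat)
speaks directly to Ben's 80% vs. 20% point: what matters is the latency
delivered per CPU consumed.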
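For step 1, the allocator was a configure-time choice in the autotools
builds of that era (Jan is still on dumpling). A rebuild sketch, assuming
a source checkout:

    # Build Ceph against glibc malloc instead of tcmalloc.
    ./autogen.sh
    ./configure --without-tcmalloc
    make -j"$(nproc)"

Later cmake-based releases expose the same choice as -DALLOCATOR=libc.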
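Steps 2 and 3 - and the automation Ben asks for - might look like the
sketch below: spread the running ceph-osd processes across NUMA nodes
round-robin, pin each one, then pull its already-allocated memory over.
This assumes the numactl package (which ships migratepages) and is an
illustration, not Jan's actual script:

    #!/bin/sh
    # Pin each ceph-osd to one NUMA node and migrate its memory there.
    NODES=$(ls -d /sys/devices/system/node/node[0-9]* | wc -l)
    i=0
    for pid in $(pidof ceph-osd); do
        node=$(( i % NODES ))
        cpus=$(cat "/sys/devices/system/node/node$node/cpulist")
        # -a pins every thread of the process, not just the main one.
        taskset -acp "$cpus" "$pid"
        # Move pages sitting on any other node over to the target node.
        migratepages "$pid" all "$node"
        i=$(( i + 1 ))
    done

Pinning first matters: once the threads are confined to the node, new
allocations land there, and migratepages only has to clean up the old
ones.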
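The cgroup approach Jan plans to switch to binds CPUs and memory up front,
so every thread the OSD ever spawns inherits the placement - which is why
he calls it the only correct solution. A sketch for cgroup v1 with the
cpuset controller (the group name "osd-node0" is made up):

    #!/bin/sh
    # Create a cpuset confined to NUMA node 0 and move an OSD into it.
    CG=/sys/fs/cgroup/cpuset/osd-node0
    mkdir -p "$CG"
    cat /sys/devices/system/node/node0/cpulist > "$CG/cpuset.cpus"
    echo 0 > "$CG/cpuset.mems"
    # With memory_migrate set, pages are migrated to cpuset.mems when a
    # task is attached to the group.
    echo 1 > "$CG/cpuset.memory_migrate"

    # Writing to cgroup.procs moves the whole process, threads included.
    OSD_PID=$(pidof ceph-osd | awk '{print $1}')   # first OSD, for the example
    echo "$OSD_PID" > "$CG/cgroup.procs"

Starting the OSD inside the group (or under numactl --cpunodebind=0
--membind=0) avoids the migration step entirely.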
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com