Jemalloc 4.0 seems to have some shiny new capabilities, at least.

Matt

--
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103
http://www.redhat.com/en/technologies/storage

tel. 734-761-4689
fax. 734-769-8938
cel. 734-216-5309

----- Original Message -----
> From: "Shinobu Kinjo" <skinjo@xxxxxxxxxx>
> To: "Alexandre DERUMIER" <aderumier@xxxxxxxxx>
> Cc: "Stephen L Blinick" <stephen.l.blinick@xxxxxxxxx>, "Somnath Roy" <Somnath.Roy@xxxxxxxxxxx>, "Mark Nelson" <mnelson@xxxxxxxxxx>, "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>
> Sent: Thursday, August 20, 2015 8:54:59 AM
> Subject: Re: Ceph Hackathon: More Memory Allocator Testing
>
> Thank you for that result.
> So it might make sense to look at the difference between the current jemalloc and jemalloc 4.0.
>
> Shinobu
>
> ----- Original Message -----
> From: "Alexandre DERUMIER" <aderumier@xxxxxxxxx>
> To: "Shinobu Kinjo" <skinjo@xxxxxxxxxx>
> Cc: "Stephen L Blinick" <stephen.l.blinick@xxxxxxxxx>, "Somnath Roy" <Somnath.Roy@xxxxxxxxxxx>, "Mark Nelson" <mnelson@xxxxxxxxxx>, "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>
> Sent: Thursday, August 20, 2015 5:17:46 PM
> Subject: Re: Ceph Hackathon: More Memory Allocator Testing
>
> Memory results for the OSD daemons under load:
>
> jemalloc always uses more memory than tcmalloc. jemalloc 4.0 seems to reduce
> memory usage, but it is still a little higher than tcmalloc.
>
> osd_op_threads=2 : tcmalloc 2.1
> ------------------------------------------
> root 38066 2.3 0.7 1223088 505144 ? Ssl 08:35 1:32 /usr/bin/ceph-osd --cluster=ceph -i 4 -f
> root 38165 2.4 0.7 1247828 525356 ? Ssl 08:35 1:34 /usr/bin/ceph-osd --cluster=ceph -i 5 -f
>
> osd_op_threads=32 : tcmalloc 2.1
> ------------------------------------------
> root 39002 102 0.7 1455928 488584 ? Ssl 09:41 0:30 /usr/bin/ceph-osd --cluster=ceph -i 4 -f
> root 39168 114 0.7 1483752 518368 ? Ssl 09:41 0:30 /usr/bin/ceph-osd --cluster=ceph -i 5 -f
>
> osd_op_threads=2 : jemalloc 3.5
> -----------------------------
> root 18402 72.0 1.1 1642000 769000 ? Ssl 09:43 0:17 /usr/bin/ceph-osd --cluster=ceph -i 0 -f
> root 18434 89.1 1.2 1677444 797508 ? Ssl 09:43 0:21 /usr/bin/ceph-osd --cluster=ceph -i 1 -f
>
> osd_op_threads=32 : jemalloc 3.5
> -----------------------------
> root 17204 3.7 1.2 2030616 816520 ? Ssl 08:35 2:31 /usr/bin/ceph-osd --cluster=ceph -i 0 -f
> root 17228 4.6 1.2 2064928 830060 ? Ssl 08:35 3:05 /usr/bin/ceph-osd --cluster=ceph -i 1 -f
>
> osd_op_threads=2 : jemalloc 4.0
> -----------------------------
> root 19967 113 1.1 1432520 737988 ? Ssl 10:04 0:31 /usr/bin/ceph-osd --cluster=ceph -i 1 -f
> root 19976 93.6 1.0 1409376 711192 ? Ssl 10:04 0:26 /usr/bin/ceph-osd --cluster=ceph -i 0 -f
>
> osd_op_threads=32 : jemalloc 4.0
> -----------------------------
> root 20484 128 1.1 1689176 778508 ? Ssl 10:06 0:26 /usr/bin/ceph-osd --cluster=ceph -i 0 -f
> root 20502 170 1.2 1720524 810668 ? Ssl 10:06 0:35 /usr/bin/ceph-osd --cluster=ceph -i 1 -f
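For anyone wanting to reproduce numbers like the above: they are plain ps output, so something along these lines should give the same VSZ/RSS columns (a rough sketch, assuming the daemons are started as /usr/bin/ceph-osd as shown):

    # VSZ is column 5 and RSS column 6 of "ps aux", both in KB
    ps aux | grep '[c]eph-osd'

    # or just PID, RSS and VSZ for every running OSD
    ps -o pid,rss,vsz,args -C ceph-osd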
>
> ----- Original Message -----
> From: "aderumier" <aderumier@xxxxxxxxx>
> To: "Shinobu Kinjo" <skinjo@xxxxxxxxxx>
> Cc: "Stephen L Blinick" <stephen.l.blinick@xxxxxxxxx>, "Somnath Roy" <Somnath.Roy@xxxxxxxxxxx>, "Mark Nelson" <mnelson@xxxxxxxxxx>, "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>
> Sent: Thursday, August 20, 2015 07:29:22
> Subject: Re: Ceph Hackathon: More Memory Allocator Testing
>
> Hi,
>
> jemalloc 4.0 was released two days ago:
>
> https://github.com/jemalloc/jemalloc/releases
>
> I'm curious to see the performance/memory usage improvements :)
>
> ----- Original Message -----
> From: "Shinobu Kinjo" <skinjo@xxxxxxxxxx>
> To: "Stephen L Blinick" <stephen.l.blinick@xxxxxxxxx>
> Cc: "aderumier" <aderumier@xxxxxxxxx>, "Somnath Roy" <Somnath.Roy@xxxxxxxxxxx>, "Mark Nelson" <mnelson@xxxxxxxxxx>, "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>
> Sent: Thursday, August 20, 2015 04:00:15
> Subject: Re: Ceph Hackathon: More Memory Allocator Testing
>
> How about making a sheet of the test patterns?
>
> Shinobu
>
> ----- Original Message -----
> From: "Stephen L Blinick" <stephen.l.blinick@xxxxxxxxx>
> To: "Alexandre DERUMIER" <aderumier@xxxxxxxxx>, "Somnath Roy" <Somnath.Roy@xxxxxxxxxxx>
> Cc: "Mark Nelson" <mnelson@xxxxxxxxxx>, "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>
> Sent: Thursday, August 20, 2015 10:09:36 AM
> Subject: RE: Ceph Hackathon: More Memory Allocator Testing
>
> Would it make more sense to try this comparison while changing the size of
> the worker thread pool, i.e. changing "osd_op_num_threads_per_shard" and
> "osd_op_num_shards" (the defaults are currently 2 and 5 respectively, for a
> total of 10 worker threads)?
>
> Thanks,
>
> Stephen
>
> -----Original Message-----
> From: ceph-devel-owner@xxxxxxxxxxxxxxx [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Alexandre DERUMIER
> Sent: Wednesday, August 19, 2015 11:47 AM
> To: Somnath Roy
> Cc: Mark Nelson; ceph-devel
> Subject: Re: Ceph Hackathon: More Memory Allocator Testing
>
> I just did a small test with jemalloc: change the osd_op_threads value and
> check the memory right after the daemon restarts.
>
> osd_op_threads = 2 (default)
>
> USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
> root 10246 6.0 0.3 1086656 245760 ? Ssl 20:36 0:01 /usr/bin/ceph-osd --cluster=ceph -i 0 -f
>
> osd_op_threads = 32
>
> USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
> root 10736 19.5 0.4 1474672 307412 ? Ssl 20:37 0:01 /usr/bin/ceph-osd --cluster=ceph -i 0 -f
>
> I'll try to compare with tcmalloc tomorrow and under load.
>
> ----- Original Message -----
> From: "Somnath Roy" <Somnath.Roy@xxxxxxxxxxx>
> To: "aderumier" <aderumier@xxxxxxxxx>
> Cc: "Mark Nelson" <mnelson@xxxxxxxxxx>, "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>
> Sent: Wednesday, August 19, 2015 19:29:56
> Subject: RE: Ceph Hackathon: More Memory Allocator Testing
>
> Yes, it should be 1 per OSD...
> There is no doubt that TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES is relative to
> the number of threads running..
> But I don't know if the number of threads is a factor for jemalloc..
>
> Thanks & Regards
> Somnath
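Side note on TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES: tcmalloc reads it from the environment at startup, so a quick per-daemon experiment only needs the variable exported before launching the OSD. A rough sketch; the 128MB value and the environment-file path below are examples, not what was used in the tests in this thread:

    # give tcmalloc a larger total thread-cache budget for one OSD
    export TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728   # 128MB, example value
    /usr/bin/ceph-osd --cluster=ceph -i 0 -f

    # on packaged installs the same variable can usually be set in the daemon
    # environment file, e.g. /etc/sysconfig/ceph or /etc/default/ceph (path varies by distro)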
> -----Original Message-----
> From: Alexandre DERUMIER [mailto:aderumier@xxxxxxxxx]
> Sent: Wednesday, August 19, 2015 9:55 AM
> To: Somnath Roy
> Cc: Mark Nelson; ceph-devel
> Subject: Re: Ceph Hackathon: More Memory Allocator Testing
>
> << I think that tcmalloc has a fixed size
> (TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES) and shares it between all processes.
>
> >> I think it is per tcmalloc instance loaded, so at least num_osds *
> >> num_tcmalloc_instance * TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES in a box.
>
> What is num_tcmalloc_instance? I think one OSD process uses a single defined
> TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES size, doesn't it?
>
> I'm asking because I hit exactly the same bug on the client side, with
> librbd + tcmalloc + qemu + iothreads.
> When I define too many iothreads, I hit the bug directly (100% reproducible),
> as if the thread cache size were divided by the number of threads.
>
> ----- Original Message -----
> From: "Somnath Roy" <Somnath.Roy@xxxxxxxxxxx>
> To: "aderumier" <aderumier@xxxxxxxxx>, "Mark Nelson" <mnelson@xxxxxxxxxx>
> Cc: "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>
> Sent: Wednesday, August 19, 2015 18:27:30
> Subject: RE: Ceph Hackathon: More Memory Allocator Testing
>
> << I think that tcmalloc has a fixed size
> (TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES) and shares it between all processes.
>
> I think it is per tcmalloc instance loaded, so at least num_osds *
> num_tcmalloc_instance * TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES in a box.
>
> Also, I think there is no point in increasing osd_op_threads as it is not in
> the IO path anymore.. Mark is using the default 5:2 for shards : threads per shard..
>
> But yes, it could be related to the number of threads the OSDs are using; we
> need to understand how jemalloc works.. Also, there may be some tuning to
> reduce memory usage (?).
>
> Thanks & Regards
> Somnath
>
> -----Original Message-----
> From: ceph-devel-owner@xxxxxxxxxxxxxxx [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Alexandre DERUMIER
> Sent: Wednesday, August 19, 2015 9:06 AM
> To: Mark Nelson
> Cc: ceph-devel
> Subject: Re: Ceph Hackathon: More Memory Allocator Testing
>
> I was listening to today's meeting, and it seems that the blocker for making
> jemalloc the default is that it uses more memory per OSD (around 300MB?),
> and some people could have boxes with 60 disks.
>
> I just wonder if the memory increase is related to the
> osd_op_num_shards/osd_op_threads values.
>
> It seems that at the hackathon the benchmark was done on a very big CPU box
> (36 cores / 72 threads), http://ceph.com/hackathon/2015-08-ceph-hammer-full-ssd.pptx,
> with osd_op_threads = 32.
>
> I think that tcmalloc has a fixed size
> (TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES) and shares it between all processes.
>
> Maybe jemalloc allocates memory per thread.
>
> (I think people with 60-disk boxes don't use SSDs, so there are few iops per
> OSD and they don't need a lot of threads per OSD.)
>
> ----- Original Message -----
> From: "aderumier" <aderumier@xxxxxxxxx>
> To: "Mark Nelson" <mnelson@xxxxxxxxxx>
> Cc: "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>
> Sent: Wednesday, August 19, 2015 16:01:28
> Subject: Re: Ceph Hackathon: More Memory Allocator Testing
>
> Thanks Mark,
>
> the results match exactly what I have seen with tcmalloc 2.1 vs 2.4 vs
> jemalloc.
>
> And indeed tcmalloc, even with a bigger cache, seems to degrade over time.
>
> What is funny is that I see exactly the same behaviour on the client (librbd)
> side, with qemu and multiple iothreads.
>
> Switching both server and client to jemalloc gives me the best performance
> on small reads currently.
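For completeness, the usual quick way to try a different allocator without rebuilding is to preload it. A rough sketch, assuming a Debian/Ubuntu-style jemalloc library path (adjust for your distribution); note that preloading jemalloc into a ceph-osd already linked against tcmalloc is only a stopgap for experiments, and building Ceph with jemalloc support is the cleaner route:

    # run one OSD with jemalloc preloaded (library path is an example)
    LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1 /usr/bin/ceph-osd --cluster=ceph -i 0 -f

    # the client side (qemu + librbd) can be switched the same way, e.g.
    # LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1 qemu-system-x86_64 ...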
>
> ----- Original Message -----
> From: "Mark Nelson" <mnelson@xxxxxxxxxx>
> To: "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>
> Sent: Wednesday, August 19, 2015 06:45:36
> Subject: Ceph Hackathon: More Memory Allocator Testing
>
> Hi Everyone,
>
> One of the goals at the Ceph Hackathon last week was to examine how to
> improve Ceph small IO performance. Jian Zhang presented findings showing a
> dramatic improvement in small random IO performance when Ceph is used with
> jemalloc. His results build upon Sandisk's original findings that the
> default thread cache values are a major bottleneck in TCMalloc 2.1. To
> further verify these results, we sat down at the Hackathon and configured
> the new performance test cluster that Intel generously donated to the Ceph
> community laboratory to run through a variety of tests with different memory
> allocator configurations. I've since written the results of those tests up
> in PDF form for folks who are interested.
>
> The results are located here:
>
> http://nhm.ceph.com/hackathon/Ceph_Hackathon_Memory_Allocator_Testing.pdf
>
> I want to be clear that many other folks have done the heavy lifting here.
> These results are simply a validation of the many tests that other folks
> have already done. Many thanks to Sandisk and others for figuring this out,
> as it's a pretty big deal!
>
> Side note: very little tuning other than swapping the memory allocator and a
> couple of quick and dirty Ceph tunables was done during these tests. It's
> quite possible that higher IOPS will be achieved as we really start digging
> into the cluster and learning what the bottlenecks are.
>
> Thanks,
> Mark