I was listening in on today's meeting, and it seems that the blocker to making jemalloc the default is that the OSD uses more memory with it (around 300MB per OSD?), and some people have boxes with 60 disks.

I just wonder whether the memory increase is related to the osd_op_num_shards/osd_op_threads values?

It seems that at the hackathon, the benchmark was done on very large boxes (36 cores/72 threads), http://ceph.com/hackathon/2015-08-ceph-hammer-full-ssd.pptx, with osd_op_threads = 32.

I think that tcmalloc has a fixed-size thread cache budget (TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES) and shares it among all threads in the process. Maybe jemalloc allocates memory per thread.

(I think people with 60-disk boxes don't use SSDs, so they have low IOPS per OSD and don't need a lot of threads per OSD.)

----- Original Message -----
From: "aderumier" <aderumier@xxxxxxxxx>
To: "Mark Nelson" <mnelson@xxxxxxxxxx>
Cc: "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>
Sent: Wednesday, 19 August 2015 16:01:28
Subject: Re: Ceph Hackathon: More Memory Allocator Testing

Thanks Mark,

The results match exactly what I have seen with tcmalloc 2.1 vs 2.4 vs jemalloc, and indeed tcmalloc, even with a bigger cache, seems to degrade over time.

What is funny is that I see exactly the same behaviour on the client (librbd) side, with qemu and multiple iothreads.

Switching both server and client to jemalloc currently gives me the best performance on small reads.

----- Original Message -----
From: "Mark Nelson" <mnelson@xxxxxxxxxx>
To: "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>
Sent: Wednesday, 19 August 2015 06:45:36
Subject: Ceph Hackathon: More Memory Allocator Testing

Hi Everyone,

One of the goals at the Ceph Hackathon last week was to examine how to improve Ceph small IO performance. Jian Zhang presented findings showing a dramatic improvement in small random IO performance when Ceph is used with jemalloc. His results build upon Sandisk's original findings that the default thread cache values are a major bottleneck in TCMalloc 2.1.
To further verify these results, we sat down at the Hackathon and configured the new performance test cluster that Intel generously donated to the Ceph community laboratory to run through a variety of tests with different memory allocator configurations. I've since written the results of those tests up in PDF form for folks who are interested.

The results are located here:

http://nhm.ceph.com/hackathon/Ceph_Hackathon_Memory_Allocator_Testing.pdf

I want to be clear that many other folks have done the heavy lifting here. These results are simply a validation of the many tests that other folks have already done. Many thanks to Sandisk and others for figuring this out, as it's a pretty big deal!

Side note: Very little tuning was done during these tests other than swapping the memory allocator and setting a couple of quick-and-dirty Ceph tunables. It's quite possible that higher IOPS will be achieved as we really start digging into the cluster and learning what the bottlenecks are.

Thanks,
Mark
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html