Re: Ceph Hackathon: More Memory Allocator Testing

Alexandre DERUMIER <aderumier@xxxxxxxxx> · Wed, 19 Aug 2015 18:05:51 +0200 (CEST)

I was listening to today's meeting,

and it seems that the blocker to making jemalloc the default

is that it uses more memory per OSD (around 300MB?),
and some people could have boxes with 60 disks.
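
To put that concern in numbers, here is a rough back-of-the-envelope sketch. It assumes one OSD per disk and takes the "around 300MB" figure at face value (treated as MiB); the function name is only for illustration.

```python
def extra_memory_gib(osd_count: int, extra_mb_per_osd: float) -> float:
    """Total extra memory for one host, in GiB (1 GiB = 1024 MiB).

    Assumes every OSD on the host pays the same extra cost.
    """
    return osd_count * extra_mb_per_osd / 1024


# 60 disks, one OSD each, roughly 300 MB extra per OSD
print(f"{extra_memory_gib(60, 300):.1f} GiB")  # prints "17.6 GiB"
```

So on a 60-disk box the difference would add up to well over 17 GiB of RAM, if the per-OSD figure holds there.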

I just wonder if the memory increase is related to the osd_op_num_shards/osd_op_threads values?

It seems that at the hackathon, the benchmark was done on very big CPU boxes (36 cores/72 threads),
http://ceph.com/hackathon/2015-08-ceph-hammer-full-ssd.pptx
with osd_op_threads = 32.
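
One way to check this would be to compare the resident memory of an OSD process under different osd_op_threads values. A minimal sketch, assuming a Linux host where /proc/<pid>/status is readable; the helper names are hypothetical and this is not a Ceph tool:

```python
import re
from pathlib import Path


def parse_vm_rss_kb(status_text: str) -> int:
    """Return the VmRSS value (in kB) from the text of /proc/<pid>/status."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("no VmRSS line found")
    return int(match.group(1))


def rss_mb(pid: int) -> float:
    """Resident memory of a running process in MB (Linux only)."""
    text = Path(f"/proc/{pid}/status").read_text()
    return parse_vm_rss_kb(text) / 1024
```

Calling rss_mb() with the pid of an OSD, once per allocator and thread setting under the same load, would show whether the gap grows with the thread count.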

I think that tcmalloc has a fixed size (TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES), and shares it between all threads in a process.

Maybe jemalloc allocates memory per thread.

(I think people with 60-disk boxes don't use SSDs, so low IOPS per OSD, and they don't need a lot of threads per OSD)
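
To make the guess concrete, here is a toy model of the two behaviours. It only restates the hypothesis above and is not how either allocator is actually implemented; both functions and all the numbers passed to them are made up for illustration.

```python
def fixed_budget_cache_bytes(threads: int, total_budget: int) -> int:
    """Hypothesis for tcmalloc: one fixed budget shared by all threads.

    The thread count is accepted but deliberately ignored.
    """
    return total_budget


def per_thread_cache_bytes(threads: int, per_thread: int) -> int:
    """Hypothesis for jemalloc: each thread gets its own cache."""
    return threads * per_thread


# Illustrative values only: the per-thread model grows with the thread
# count, while the fixed-budget model stays flat.
for threads in (2, 32):
    print(threads,
          fixed_budget_cache_bytes(threads, 100),
          per_thread_cache_bytes(threads, 10))
```

If the guess is right, the extra memory should shrink when osd_op_threads is lowered, which is easy to test on a real OSD.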

----- Original Message -----
From: "aderumier" <aderumier@xxxxxxxxx>
To: "Mark Nelson" <mnelson@xxxxxxxxxx>
Cc: "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>
Sent: Wednesday, 19 August 2015 16:01:28
Subject: Re: Ceph Hackathon: More Memory Allocator Testing

Thanks Mark,

The results match exactly what I have seen with tcmalloc 2.1 vs 2.4 vs jemalloc.

And indeed tcmalloc performance, even with a bigger cache, seems to decrease over time.

What is funny is that I see exactly the same behaviour on the client librbd side, with qemu and multiple iothreads.

Switching both server and client to jemalloc currently gives me the best performance on small reads.

----- Original Message -----
From: "Mark Nelson" <mnelson@xxxxxxxxxx>
To: "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>
Sent: Wednesday, 19 August 2015 06:45:36
Subject: Ceph Hackathon: More Memory Allocator Testing

Hi Everyone, 

One of the goals at the Ceph Hackathon last week was to examine how to 
improve Ceph Small IO performance. Jian Zhang presented findings 
showing a dramatic improvement in small random IO performance when Ceph 
is used with jemalloc. His results build upon Sandisk's original 
findings that the default thread cache values are a major bottleneck in 
TCMalloc 2.1. To further verify these results, we sat down at the 
Hackathon and configured the new performance test cluster that Intel 
generously donated to the Ceph community laboratory to run through a 
variety of tests with different memory allocator configurations. I've 
since written the results of those tests up in pdf form for folks who 
are interested. 

The results are located here: 

http://nhm.ceph.com/hackathon/Ceph_Hackathon_Memory_Allocator_Testing.pdf 

I want to be clear that many other folks have done the heavy lifting 
here. These results are simply a validation of the many tests that 
other folks have already done. Many thanks to Sandisk and others for 
figuring this out as it's a pretty big deal! 

Side note: Very little tuning was done during these tests, other than
swapping the memory allocator and setting a couple of quick and dirty
Ceph tunables. It's quite possible that higher IOPS will be achieved as we
really start digging into the cluster and learning what the bottlenecks are.

Thanks, 
Mark 
-- 
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in 
the body of a message to majordomo@xxxxxxxxxxxxxxx 
More majordomo info at http://vger.kernel.org/majordomo-info.html 
