Re: Ceph Hackathon: More Memory Allocator Testing

Mark Nelson <mnelson@xxxxxxxxxx> · Wed, 19 Aug 2015 07:17:29 -0500

On 08/19/2015 03:07 AM, Haomai Wang wrote:
On Wed, Aug 19, 2015 at 1:36 PM, Somnath Roy <Somnath.Roy@xxxxxxxxxxx> wrote:
Mark,
Thanks for verifying this. Nice report!
Since there is a big difference in memory consumption with jemalloc, I think recovery performance data, or client performance data during recovery, would be helpful.

The RSS memory usage in the report is per OSD, I guess (really?). It
can't be ignored, since it's really a great improvement in memory usage.

Do you mean with tcmalloc?  I think it's a tough decision.  For
jemalloc, 300MB more RSS per OSD does add up (about 18GB for 60
OSDs).  On the other hand, memory is such a small fraction of the
overall cost of systems like this that it might be worth switching
over anyway.  In the 4K write tests it's pretty clear that even with
a 128MB thread cache, TCMalloc is suffering, while jemalloc appears
to still have headroom left.  It's possible that raising the thread
cache even higher might help TCMalloc close the gap, though.  It's
also possible that jemalloc has worse memory behavior under recovery
scenarios, as we discussed at the hackathon (and as Somnath mentioned
above), so I think we probably need to run those tests.
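A quick sketch of the arithmetic above, in Python purely for illustration. The helper names are hypothetical; the only inputs taken from the thread are the ~300MB per-OSD RSS difference, the 60-OSD count, and the 128MB thread cache. The byte conversion assumes the thread cache limit is set through gperftools' TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES environment variable, which expects a value in bytes.

```python
MIB = 1024 ** 2  # bytes per MiB


def aggregate_rss_overhead_gb(per_osd_delta_mb: float, osd_count: int) -> float:
    """Extra RSS summed over all OSDs, in decimal GB (1 GB = 1000 MB),
    matching the rough estimate in the thread. Hypothetical helper."""
    return per_osd_delta_mb * osd_count / 1000


def thread_cache_bytes(mib: int) -> int:
    """Convert a thread cache size in MiB to bytes, the unit assumed for
    TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES. Hypothetical helper."""
    return mib * MIB


if __name__ == "__main__":
    print(aggregate_rss_overhead_gb(300, 60))  # 18.0
    print(thread_cache_bytes(128))             # 134217728
```

The decimal GB conversion is chosen only to reproduce the back-of-the-envelope "about 18GB" figure; using binary units would give a slightly smaller number.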

Thanks & Regards
Somnath

-----Original Message-----
From: ceph-devel-owner@xxxxxxxxxxxxxxx [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Mark Nelson
Sent: Tuesday, August 18, 2015 9:46 PM
To: ceph-devel
Subject: Ceph Hackathon: More Memory Allocator Testing

Hi Everyone,

One of the goals at the Ceph Hackathon last week was to examine how to improve Ceph small IO performance.  Jian Zhang presented findings showing a dramatic improvement in small random IO performance when Ceph is used with jemalloc.  His results build upon SanDisk's original findings that the default thread cache values are a major bottleneck in TCMalloc 2.1.  To further verify these results, we sat down at the Hackathon and configured the new performance test cluster that Intel generously donated to the Ceph community laboratory.  We then ran through a variety of tests with different memory allocator configurations.  I've since written up the results of those tests in PDF form for folks who are interested.

The results are located here:

http://nhm.ceph.com/hackathon/Ceph_Hackathon_Memory_Allocator_Testing.pdf

I want to be clear that other folks have done the heavy lifting here.  These results simply validate the many tests they have already done.  Many thanks to SanDisk and others for figuring this out, as it's a pretty big deal!

Side note:  Very little tuning was done during these tests beyond swapping the memory allocator and setting a couple of quick and dirty Ceph tunables.  It's quite possible that higher IOPS can be achieved as we really start digging into the cluster and learning where the bottlenecks are.

Thanks,
Mark
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at  http://vger.kernel.org/majordomo-info.html

