Re: Ceph Hackathon: More Memory Allocator Testing

Jemalloc 4.0 seems to have some shiny new capabilities, at least.

Matt

-- 
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-761-4689
fax.  734-769-8938
cel.  734-216-5309

----- Original Message -----
> From: "Shinobu Kinjo" <skinjo@xxxxxxxxxx>
> To: "Alexandre DERUMIER" <aderumier@xxxxxxxxx>
> Cc: "Stephen L Blinick" <stephen.l.blinick@xxxxxxxxx>, "Somnath Roy" <Somnath.Roy@xxxxxxxxxxx>, "Mark Nelson"
> <mnelson@xxxxxxxxxx>, "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>
> Sent: Thursday, August 20, 2015 8:54:59 AM
> Subject: Re: Ceph Hackathon: More Memory Allocator Testing
> 
> Thank you for that result.
> So it might make sense to know the difference between the current jemalloc and jemalloc 4.0.
> 
>  Shinobu
> 
> ----- Original Message -----
> From: "Alexandre DERUMIER" <aderumier@xxxxxxxxx>
> To: "Shinobu Kinjo" <skinjo@xxxxxxxxxx>
> Cc: "Stephen L Blinick" <stephen.l.blinick@xxxxxxxxx>, "Somnath Roy"
> <Somnath.Roy@xxxxxxxxxxx>, "Mark Nelson" <mnelson@xxxxxxxxxx>, "ceph-devel"
> <ceph-devel@xxxxxxxxxxxxxxx>
> Sent: Thursday, August 20, 2015 5:17:46 PM
> Subject: Re: Ceph Hackathon: More Memory Allocator Testing
> 
> Memory results of the OSD daemons under load:
> 
> jemalloc always uses more memory than tcmalloc;
> jemalloc 4.0 seems to reduce memory usage, but it is still a little higher than
> tcmalloc.
> 
> 
> 
> osd_op_threads=2 : tcmalloc 2.1
> ------------------------------------------
> root      38066  2.3  0.7 1223088 505144 ?      Ssl  08:35   1:32
> /usr/bin/ceph-osd --cluster=ceph -i 4 -f
> root      38165  2.4  0.7 1247828 525356 ?      Ssl  08:35   1:34
> /usr/bin/ceph-osd --cluster=ceph -i 5 -f
> 
> 
> osd_op_threads=32: tcmalloc 2.1
> ------------------------------------------
> 
> root      39002  102  0.7 1455928 488584 ?      Ssl  09:41   0:30
> /usr/bin/ceph-osd --cluster=ceph -i 4 -f
> root      39168  114  0.7 1483752 518368 ?      Ssl  09:41   0:30
> /usr/bin/ceph-osd --cluster=ceph -i 5 -f
> 
> 
> osd_op_threads=2 jemalloc 3.5
> -----------------------------
> root      18402 72.0  1.1 1642000 769000 ?      Ssl  09:43   0:17
> /usr/bin/ceph-osd --cluster=ceph -i 0 -f
> root      18434 89.1  1.2 1677444 797508 ?      Ssl  09:43   0:21
> /usr/bin/ceph-osd --cluster=ceph -i 1 -f
> 
> 
> osd_op_threads=32 jemalloc 3.5
> -----------------------------
> root      17204  3.7  1.2 2030616 816520 ?      Ssl  08:35   2:31
> /usr/bin/ceph-osd --cluster=ceph -i 0 -f
> root      17228  4.6  1.2 2064928 830060 ?      Ssl  08:35   3:05
> /usr/bin/ceph-osd --cluster=ceph -i 1 -f
> 
> 
> osd_op_threads=2 jemalloc 4.0
> -----------------------------
> root      19967  113  1.1 1432520 737988 ?      Ssl  10:04   0:31
> /usr/bin/ceph-osd --cluster=ceph -i 1 -f
> root      19976 93.6  1.0 1409376 711192 ?      Ssl  10:04   0:26
> /usr/bin/ceph-osd --cluster=ceph -i 0 -f
> 
> 
> osd_op_threads=32 jemalloc 4.0
> -----------------------------
> root      20484  128  1.1 1689176 778508 ?      Ssl  10:06   0:26
> /usr/bin/ceph-osd --cluster=ceph -i 0 -f
> root      20502  170  1.2 1720524 810668 ?      Ssl  10:06   0:35
> /usr/bin/ceph-osd --cluster=ceph -i 1 -f
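> 
> For reference, the per-OSD numbers above are plain ps output; a rough sketch of
> how they could be sampled during a run (not the exact commands used here, and
> the field list assumes standard procps ps):
> 
> # poll RSS/VSZ of every ceph-osd once a minute while the benchmark runs
> while true; do
>     date
>     ps -C ceph-osd -o pid,vsz,rss,args --sort=pid
>     sleep 60
> done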
> 
> 
> 
> ----- Original Message -----
> From: "aderumier" <aderumier@xxxxxxxxx>
> To: "Shinobu Kinjo" <skinjo@xxxxxxxxxx>
> Cc: "Stephen L Blinick" <stephen.l.blinick@xxxxxxxxx>, "Somnath Roy"
> <Somnath.Roy@xxxxxxxxxxx>, "Mark Nelson" <mnelson@xxxxxxxxxx>, "ceph-devel"
> <ceph-devel@xxxxxxxxxxxxxxx>
> Sent: Thursday, August 20, 2015 07:29:22
> Subject: Re: Ceph Hackathon: More Memory Allocator Testing
> 
> Hi,
> 
> jemalloc 4.0 was released two days ago:
> 
> https://github.com/jemalloc/jemalloc/releases
> 
> I'm curious to see the performance/memory usage improvements :)
> 
> 
> ----- Original Message -----
> From: "Shinobu Kinjo" <skinjo@xxxxxxxxxx>
> To: "Stephen L Blinick" <stephen.l.blinick@xxxxxxxxx>
> Cc: "aderumier" <aderumier@xxxxxxxxx>, "Somnath Roy"
> <Somnath.Roy@xxxxxxxxxxx>, "Mark Nelson" <mnelson@xxxxxxxxxx>, "ceph-devel"
> <ceph-devel@xxxxxxxxxxxxxxx>
> Sent: Thursday, August 20, 2015 04:00:15
> Subject: Re: Ceph Hackathon: More Memory Allocator Testing
> 
> How about making a sheet of the test patterns?
> 
> Shinobu
> 
> ----- Original Message -----
> From: "Stephen L Blinick" <stephen.l.blinick@xxxxxxxxx>
> To: "Alexandre DERUMIER" <aderumier@xxxxxxxxx>, "Somnath Roy"
> <Somnath.Roy@xxxxxxxxxxx>
> Cc: "Mark Nelson" <mnelson@xxxxxxxxxx>, "ceph-devel"
> <ceph-devel@xxxxxxxxxxxxxxx>
> Sent: Thursday, August 20, 2015 10:09:36 AM
> Subject: RE: Ceph Hackathon: More Memory Allocator Testing
> 
> Would it make more sense to try this comparison while changing the size of
> the worker thread pool? i.e. changing "osd_op_num_threads_per_shard" and
> "osd_op_num_shards" (default is currently 2 and 5 respectively, for a total
> of 10 worker threads).
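> 
> Something like the snippet below in ceph.conf would let you sweep the sharded
> worker pool size (the values are only illustrative, not recommendations, and
> the OSDs need a restart to pick them up):
> 
> [osd]
> # defaults are 5 shards x 2 threads = 10 workers; e.g. double the shard count for one run
> osd_op_num_shards = 10
> osd_op_num_threads_per_shard = 2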
> 
> Thanks,
> 
> Stephen
> 
> 
> -----Original Message-----
> From: ceph-devel-owner@xxxxxxxxxxxxxxx
> [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Alexandre DERUMIER
> Sent: Wednesday, August 19, 2015 11:47 AM
> To: Somnath Roy
> Cc: Mark Nelson; ceph-devel
> Subject: Re: Ceph Hackathon: More Memory Allocator Testing
> 
> I've just done a small test with jemalloc, changing the osd_op_threads value and
> checking the memory right after the daemon restarts.
> 
> osd_op_threads = 2 (default)
> 
> 
> USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
> root 10246 6.0 0.3 1086656 245760 ? Ssl 20:36 0:01 /usr/bin/ceph-osd
> --cluster=ceph -i 0 -f
> 
> osd_op_threads = 32
> 
> USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
> root 10736 19.5 0.4 1474672 307412 ? Ssl 20:37 0:01 /usr/bin/ceph-osd
> --cluster=ceph -i 0 -f
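> 
> In case it's useful, a minimal sketch of one way to run an OSD under jemalloc
> without rebuilding, via LD_PRELOAD (the library path is distro specific; this
> one is typical for Debian/Ubuntu, adjust as needed):
> 
> # start one OSD with jemalloc preloaded
> LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1 \
>     /usr/bin/ceph-osd --cluster=ceph -i 0 -f &
> 
> # confirm which allocator is actually mapped into the process
> pid=$(pgrep -f 'ceph-osd.*-i 0' | head -1)
> grep -E 'jemalloc|tcmalloc' /proc/$pid/maps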
> 
> 
> 
> I'll try to compare with tcmalloc tomorrow, and under load.
> 
> 
> 
> ----- Original Message -----
> From: "Somnath Roy" <Somnath.Roy@xxxxxxxxxxx>
> To: "aderumier" <aderumier@xxxxxxxxx>
> Cc: "Mark Nelson" <mnelson@xxxxxxxxxx>, "ceph-devel"
> <ceph-devel@xxxxxxxxxxxxxxx>
> Sent: Wednesday, August 19, 2015 19:29:56
> Subject: RE: Ceph Hackathon: More Memory Allocator Testing
> 
> Yes, it should be 1 per OSD...
> There is no doubt that TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES is relative to
> the number of threads running.
> But I don't know whether the number of threads is a factor for jemalloc.
> 
> Thanks & Regards
> Somnath
> 
> -----Original Message-----
> From: Alexandre DERUMIER [mailto:aderumier@xxxxxxxxx]
> Sent: Wednesday, August 19, 2015 9:55 AM
> To: Somnath Roy
> Cc: Mark Nelson; ceph-devel
> Subject: Re: Ceph Hackathon: More Memory Allocator Testing
> 
> << I think that tcmalloc has a fixed size
> (TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES) and shares it between all processes.
> 
> >> I think it is per tcmalloc instance loaded, so at least num_osds *
> >> num_tcmalloc_instance * TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES in a box.
> 
> What is num_tcmalloc_instance? I think one OSD process uses a defined
> TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES size?
> 
> I'm saying that because I see exactly the same bug on the client side, with
> librbd + tcmalloc + qemu + iothreads.
> When I define too many iothreads, I hit the bug directly (I can
> reproduce it 100%).
> It's as if the thread_cache size is divided by the number of threads?
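> 
> For what it's worth, the tcmalloc thread cache is bounded per process through an
> environment variable read at startup; a sketch (the 128MB value is just an
> example, and where you export it depends on how the daemons are started):
> 
> # 128 MB total thread cache for this process, shared by all of its threads
> export TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728
> /usr/bin/ceph-osd --cluster=ceph -i 0 -f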
> 
> 
> 
> 
> 
> 
> ----- Original Message -----
> From: "Somnath Roy" <Somnath.Roy@xxxxxxxxxxx>
> To: "aderumier" <aderumier@xxxxxxxxx>, "Mark Nelson" <mnelson@xxxxxxxxxx>
> Cc: "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>
> Sent: Wednesday, August 19, 2015 18:27:30
> Subject: RE: Ceph Hackathon: More Memory Allocator Testing
> 
> << I think that tcmalloc has a fixed size
> (TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES) and shares it between all processes.
> 
> I think it is per tcmalloc instance loaded, so at least num_osds *
> num_tcmalloc_instance * TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES in a box.
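> 
> A quick back-of-the-envelope example of that product (the numbers are purely
> illustrative):
> 
> # num_osds * num_tcmalloc_instance * TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES
> # e.g. 60 OSDs, 1 tcmalloc instance per OSD process, 128 MB cache each:
> echo $(( 60 * 1 * 128 )) "MB of thread cache across the box"   # -> 7680 MB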
> 
> Also, I think there is no point in increasing osd_op_threads, as it is not in
> the IO path anymore. Mark is using the default 5:2 for shards:threads per shard.
> 
> But yes, it could be related to the number of threads the OSDs are using; we
> need to understand how jemalloc works. Also, there may be some tuning to reduce
> memory usage (?).
> 
> Thanks & Regards
> Somnath
> 
> -----Original Message-----
> From: ceph-devel-owner@xxxxxxxxxxxxxxx
> [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Alexandre DERUMIER
> Sent: Wednesday, August 19, 2015 9:06 AM
> To: Mark Nelson
> Cc: ceph-devel
> Subject: Re: Ceph Hackathon: More Memory Allocator Testing
> 
> I was listening to today's meeting,
> 
> and it seems that the blocker to making jemalloc the default
> 
> is that it uses more memory per OSD (around 300MB?), and some people could
> have boxes with 60 disks.
> 
> 
> I just wonder if the memory increase is related to the
> osd_op_num_shards/osd_op_threads values?
> 
> It seems that at the hackathon the benchmark was done on a very big CPU box,
> 36 cores/72 threads, http://ceph.com/hackathon/2015-08-ceph-hammer-full-ssd.pptx
> with osd_op_threads = 32.
> 
> I think that tcmalloc has a fixed size
> (TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES) and shares it between all processes.
> 
> Maybe jemalloc allocates memory per thread.
> 
> 
> 
> (I think people with 60-disk boxes don't use SSDs, so there are few IOPS per
> OSD, and they don't need a lot of threads per OSD.)
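> 
> Putting rough numbers on that concern (again purely illustrative, using the
> ~300MB figure above):
> 
> # extra memory if every OSD on a 60-disk box uses ~300 MB more under jemalloc
> echo $(( 60 * 300 )) "MB extra, i.e. roughly 18 GB per node"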
> 
> 
> 
> ----- Original Message -----
> From: "aderumier" <aderumier@xxxxxxxxx>
> To: "Mark Nelson" <mnelson@xxxxxxxxxx>
> Cc: "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>
> Sent: Wednesday, August 19, 2015 16:01:28
> Subject: Re: Ceph Hackathon: More Memory Allocator Testing
> 
> Thanks Mark,
> 
> The results match exactly what I have seen with tcmalloc 2.1 vs 2.4 vs
> jemalloc.
> 
> And indeed tcmalloc, even with a bigger cache, seems to degrade over time.
> 
> 
> What is funny is that I see exactly the same behaviour on the client librbd
> side, with qemu and multiple iothreads.
> 
> 
> Switching both server and client to jemalloc currently gives me the best
> performance on small reads.
> 
> 
> 
> 
> 
> 
> ----- Original Message -----
> From: "Mark Nelson" <mnelson@xxxxxxxxxx>
> To: "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>
> Sent: Wednesday, August 19, 2015 06:45:36
> Subject: Ceph Hackathon: More Memory Allocator Testing
> 
> Hi Everyone,
> 
> One of the goals at the Ceph Hackathon last week was to examine how to
> improve Ceph Small IO performance. Jian Zhang presented findings showing a
> dramatic improvement in small random IO performance when Ceph is used with
> jemalloc. His results build upon Sandisk's original findings that the
> default thread cache values are a major bottleneck in TCMalloc 2.1. To
> further verify these results, we sat down at the Hackathon and configured
> the new performance test cluster that Intel generously donated to the Ceph
> community laboratory to run through a variety of tests with different memory
> allocator configurations. I've since written the results of those tests up
> in pdf form for folks who are interested.
> 
> The results are located here:
> 
> http://nhm.ceph.com/hackathon/Ceph_Hackathon_Memory_Allocator_Testing.pdf
> 
> I want to be clear that many other folks have done the heavy lifting here.
> These results are simply a validation of the many tests that other folks
> have already done. Many thanks to Sandisk and others for figuring this out
> as it's a pretty big deal!
> 
> Side note: Very little tuning was done during these tests beyond swapping the
> memory allocator and setting a couple of quick and dirty Ceph tunables. It's
> quite possible that higher IOPS will be achieved as we really start digging
> into the cluster and learning what the bottlenecks are.
> 
> Thanks,
> Mark


