>>There was a gentleman on this list before who identified a few
>>possible locking issues in the ceph osd daemon. Here is the
>>original thread:
>>http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/19284
>>He performed some really bad hacks (like just dropping mutexes) which
>>one shouldn't do... but it turned out that he was able to get 3 to 4
>>performance improvement.

Yes, I remember this post. The 4 bottlenecks were:

1. fdcache_lock
2. lfn_find in omap_* methods
3. DBObjectMap header
4. fdcache size, slow lookup

>>If you're willing to do a lock contention trace (using mutrace, or
>>something similar) I'd be really interested in the results of it. The
>>results should be especially useful if you're running it against
>>MemStore since it'll take away anything that would prevent these
>>bottlenecks from showing up (like disk access).

I'll build a new ceph test storage soon, so I think I can try to help.
But I'm not an expert in process tracing, so help or a how-to is welcome.

----- Original Message -----

From: "Milosz Tanski" <milosz@xxxxxxxxx>
To: "Alexandre DERUMIER" <aderumier@xxxxxxxxx>
Cc: "Andreas Joachim Peters" <Andreas.Joachim.Peters@xxxxxxx>, "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>
Sent: Friday, 20 June 2014 00:18:28
Subject: Re: CEPH IOPS Baseline Measurements with MemStore

Alexandre,

There was a gentleman on this list before who identified a few
possible locking issues in the ceph osd daemon. Here is the
original thread:
http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/19284
He performed some really bad hacks (like just dropping mutexes) which
one shouldn't do... but it turned out that he was able to get 3 to 4
performance improvement.

If you're willing to do a lock contention trace (using mutrace, or
something similar) I'd be really interested in the results of it.
The results should be especially useful if you're running it against
MemStore since it'll take away anything that would prevent these
bottlenecks from showing up (like disk access).

Best,
- Milosz

On Thu, Jun 19, 2014 at 7:08 AM, Alexandre DERUMIER <aderumier@xxxxxxxxx> wrote:
> >>>I am not sure if it is actually possible to disable completely all log messages. I did this for benchmarking at compile time changing the logging macro in common/dout.h ==> #define dout_impl(cct, sub, v) ....
>
> I think it can be done in ceph.conf:
> https://ceph.com/docs/master/rados/troubleshooting/log-and-debug/#subsystem-log-and-debug-settings
>
> I remember an old mail from Stefan Priebe from 2012, also reporting a performance decrease with logging:
>
> https://www.mail-archive.com/ceph-devel@xxxxxxxxxxxxxxx/msg09976.html
>
> with a CPU trace here:
> https://www.mail-archive.com/ceph-devel@xxxxxxxxxxxxxxx/msg09974/out.pdf
>
>
> The ceph.conf settings to disable them were:
>
> debug lockdep = 0/0
> debug context = 0/0
> debug crush = 0/0
> debug buffer = 0/0
> debug timer = 0/0
> debug journaler = 0/0
> debug osd = 0/0
> debug optracker = 0/0
> debug objclass = 0/0
> debug filestore = 0/0
> debug journal = 0/0
> debug ms = 0/0
> debug monc = 0/0
> debug tp = 0/0
> debug auth = 0/0
> debug finisher = 0/0
> debug heartbeatmap = 0/0
> debug perfcounter = 0/0
> debug asok = 0/0
> debug throttle = 0/0
>
>
>
> ----- Original Message -----
>
> From: "Andreas Joachim Peters" <Andreas.Joachim.Peters@xxxxxxx>
> To: "Alexandre DERUMIER" <aderumier@xxxxxxxxx>
> Cc: ceph-devel@xxxxxxxxxxxxxxx
> Sent: Thursday, 19 June 2014 11:29:27
> Subject: RE: CEPH IOPS Baseline Measurements with MemStore
>
> I am not sure if it is actually possible to completely disable all log messages. For benchmarking I did this at compile time by changing the logging macro in common/dout.h ==> #define dout_impl(cct, sub, v) ....
>
> I changed 'osd op threads' but that had no visible impact.
>
> Cheers Andreas.
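
The ceph.conf fragment quoted above is long enough that typing it by hand invites typos. A minimal sketch that generates it, assuming the subsystem names are exactly those listed in the quoted mail; the `[global]` section header and the function name `debug_off_fragment` are illustrative assumptions, since the mail does not say which section the lines were placed in. In `debug X = a/b`, the first number is the log-file level and the second the in-memory level.

```python
# Subsystem names copied verbatim from the ceph.conf fragment quoted in the mail.
SUBSYSTEMS = [
    "lockdep", "context", "crush", "buffer", "timer", "journaler", "osd",
    "optracker", "objclass", "filestore", "journal", "ms", "monc", "tp",
    "auth", "finisher", "heartbeatmap", "perfcounter", "asok", "throttle",
]


def debug_off_fragment(subsystems=SUBSYSTEMS, section="global"):
    """Return a ceph.conf fragment setting each subsystem to 0/0.

    The section name is an assumption; adjust it to wherever the
    settings should live in your configuration.
    """
    lines = [f"[{section}]"]
    lines += [f"debug {name} = 0/0" for name in subsystems]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the fragment so it can be pasted or redirected into a file.
    print(debug_off_fragment(), end="")
```

Keeping the list in one place also makes it easy to re-enable a single subsystem for a comparison run by dropping it from `SUBSYSTEMS`.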
>
> ________________________________________
> From: Alexandre DERUMIER [aderumier@xxxxxxxxx]
> Sent: 19 June 2014 11:21
> To: Andreas Joachim Peters
> Cc: ceph-devel@xxxxxxxxxxxxxxx
> Subject: Re: CEPH IOPS Baseline Measurements with MemStore
>
> Hi,
>
> Thanks for your benchmark!
>
> >>>If you have some ideas for parameters to tune or see some mistakes in this measurement - let me know!
>
> >>>1) Default Logging has an important impact on the IOPS & latency [0.1-0.2ms]
> How do you enable|disable stats? (ceph.conf)
>
>
> >>>2) OSD implementation without journaling does not scale linear with concurrent IOs - need several OSDs to scale IOPS - lock contention/threading model?
> It's quite possible. I have seen a lot of benchmarks with SSDs, and the OSD daemon was always the bottleneck: more OSDs, more scale.
>
> >>>3) a writing OSD never fills more than 4 cores
> >>>4) a reading OSD never fills more than 5 cores
>
> Maybe "osd op threads" could improve this?
> The default is 2 (I don't know whether, with hyperthreading, that means 4 cores instead of 2).
>
>
> ----- Original Message -----
>
> From: "Andreas Joachim Peters" <Andreas.Joachim.Peters@xxxxxxx>
> To: ceph-devel@xxxxxxxxxxxxxxx
> Sent: Thursday, 19 June 2014 11:05:18
> Subject: CEPH IOPS Baseline Measurements with MemStore
>
> Hi,
>
> I made some benchmarks/testing using the firefly branch and GCC 4.9. Hardware is 2 CPUs with 6-core Intel(R) Xeon(R) CPU E5-2630L 0 @ 2.00GHz with Hyperthreading and 256 GB of memory (kernel 2.6.32-431.17.1.el6.x86_64).
>
> In my tests I run two OSD configurations on a single box:
>
> [A] 4 OSDs running with MemStore
> [B] 1 OSD running with MemStore
>
> I use a pool with 'size=1' and read and read/write 1-byte objects all via localhost.
>
> The local RTT reported by ping is 15 microseconds, the RTT measured with ZMQ is 100 microseconds (10 kHz synchronous 1-byte messages).
> RTT measured with another file IO daemon (XRootD) we are using at CERN (31-byte messages) is 9.9 kHz.
>
> -------------------------------------------------------------------------
> 4 OSDs
> -------------------------------------------------------------------------
>
> {1} [A]
> *******
> I measure IOPS with 1-byte objects for separate write and read operations, with logging disabled for every subsystem:
>
> Type  : IOPS [kHz] : Latency [ms] : ConcurIO [#]
> ================================================
> Write : 01.7       : 0.50         : 1
> Write : 11.2       : 0.88         : 10
> Write : 11.8       : 1.69         : 10 x 2 [ 2 rados bench processes ]
> Write : 11.2       : 3.57         : 10 x 4 [ 4 rados bench processes ]
> Read  : 02.6       : 0.33         : 1
> Read  : 22.4       : 0.43         : 10
> Read  : 40.0       : 0.97         : 20 x 2 [ 2 rados bench processes ]
> Read  : 46.0       : 0.88         : 10 x 4 [ 4 rados bench processes ]
> Read  : 40.0       : 1.60         : 20 x 4 [ 4 rados bench processes ]
>
> {2} [A]
> *******
> I measure IOPS with the CEPH firefly branch as is (default logging):
>
> Type  : IOPS [kHz] : Latency [ms] : ConcurIO [#]
> ================================================
> Write : 01.2       : 0.78         : 1
> Write : 09.1       : 1.00         : 10
> Read  : 01.8       : 0.50         : 1
> Read  : 14.0       : 1.00         : 10
> Read  : 18.0       : 2.00         : 20 x 2 [ 2 rados bench processes ]
> Read  : 18.0       : 2.20         : 10 x 4 [ 4 rados bench processes ]
>
> -------------------------------------------------------------------------
> 1 OSD
> -------------------------------------------------------------------------
>
> {1} [B] (subsys logging disabled, 1 OSD)
> *******
> Type  : IOPS [kHz] : Latency [ms] : ConcurIO [#]
> ================================================
> Write : 02.0       : 0.46         : 1
> Write : 10.0       : 0.95         : 10
> Write : 11.1       : 1.74         : 20
> Write : 12.0       : 1.80         : 10 x 2 [ 2 rados bench processes ]
> Write : 10.8       : 3.60         : 10 x 4 [ 4 rados bench processes ]
> Read  : 03.6       : 0.27         : 1
> Read  : 16.9       : 0.50         : 10
> Read  : 28.0       : 0.70         : 10 x 2 [ 2 rados bench processes ]
> Read  : 29.6       : 1.37         : 20 x 2 [ 2 rados bench processes ]
> Read  : 27.2       : 1.50         : 10 x 4 [ 4 rados bench processes ]
>
> {2} [B] (default logging, 1 OSD)
> *******
> Type  : IOPS [kHz] : Latency [ms] : ConcurIO [#]
> ================================================
> Write : 01.4       : 0.68         : 1
> Write : 04.0       : 2.35         : 10
> Write : 04.0       : 4.69         : 10 x 2 [ 2 rados bench processes ]
>
> I also played with the OSD thread number (no change) and used an in-memory filesystem + journaling (filestore backend). Here the {1} [A] result is 1.4 kHz write for 1 IO in flight, and the peak write performance with many IOs in flight and several rados bench processes is 2.3 kHz!
>
>
> Some summarizing remarks:
>
> 1) Default logging has an important impact on the IOPS & latency [0.1-0.2ms]
> 2) OSD implementation without journaling does not scale linearly with concurrent IOs - need several OSDs to scale IOPS - lock contention/threading model?
> 3) a writing OSD never fills more than 4 cores
> 4) a reading OSD never fills more than 5 cores
> 5) running 'rados bench' on a remote machine gives similar or slightly worse results (up to -20%)
> 6) CEPH delivering 20k read IOPS uses 4 cores on the server side, while identical operations with higher payload (XRootD) use one core for 3x higher performance (60k IOPS)
> 7) I can scale the other IO daemon (XRootD) to use 10 cores and to deliver 300,000 IOPS on the same box.
>
> Looking ahead to SSDs and volatile-memory backend stores, I see some improvements to be made in the OSD/communication layer.
>
> If you have some ideas for parameters to tune or see some mistakes in this measurement - let me know!
>
> Cheers Andreas.
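
One way to sanity-check the tables above is Little's law: in a closed-loop benchmark, throughput is roughly the number of IOs in flight divided by the mean latency. A small sketch using four rows copied from the {1} [A] table. It assumes that "10 x 2" means 20 IOs in flight in total; the function name `expected_kiops` and the row selection are illustrative, not part of the original measurement.

```python
def expected_kiops(in_flight, latency_ms):
    """Little's law: throughput = concurrency / latency.

    IOs in flight divided by latency in milliseconds gives
    operations per millisecond, which is numerically kHz.
    """
    return in_flight / latency_ms


# (type, measured IOPS [kHz], latency [ms], total IOs in flight),
# copied from the {1} [A] table (4 OSDs, logging disabled).
ROWS = [
    ("Write", 11.2, 0.88, 10),
    ("Write", 11.8, 1.69, 20),  # 10 x 2 rados bench processes
    ("Read", 22.4, 0.43, 10),
    ("Read", 46.0, 0.88, 40),   # 10 x 4 rados bench processes
]


def report(rows=ROWS):
    """Print measured versus predicted throughput for each row."""
    for kind, measured, latency_ms, in_flight in rows:
        predicted = expected_kiops(in_flight, latency_ms)
        print(f"{kind:5s} in_flight={in_flight:2d} "
              f"measured={measured:5.1f} kHz predicted={predicted:5.1f} kHz")


if __name__ == "__main__":
    report()
```

For these rows the prediction lands within a few percent of the measured value, which suggests the latency and IOPS columns are mutually consistent: once the write rate saturates near 11-12 kHz, adding IOs in flight only raises latency.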
--
Milosz Tanski
CTO
16 East 34th Street, 15th floor
New York, NY 10016

p: 646-253-9055
e: milosz@xxxxxxxxx
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html