Hi Sage,

With latest master we do set 'osd_enable_op_tracker = false'. We tested up to 7 rbd clients in our test, but after two clients the IOPS are stable at ~23K; there is no performance gain with more clients.

-----Original Message-----
From: Sage Weil [mailto:sage@xxxxxxxxxxxx]
Sent: Wednesday, October 15, 2014 10:23 AM
To: Shu, Xinxin
Cc: Andreas Bluemle; Paul Von-Stamwitz; Stefan Priebe; Somnath Roy; ceph-devel@xxxxxxxxxxxxxxx; Zhang, Jian
Subject: RE: 10/7/2014 Weekly Ceph Performance Meeting: kernel boot params

On Wed, 15 Oct 2014, Shu, Xinxin wrote:
> Hi all, recently we tested 4K random write performance on our full SSD setup (12 x Intel DC3700), but peak performance is ~23K IOPS, which is much lower than the hardware capability. With a detailed latency breakdown we found that most of the latency comes from the osd queue. We have noticed the optimizations on the osd queue and tried latest master on our setup, but there is a performance regression. We also checked the qlock and pg lock with perf counters; the waiting count and latency are very small. The attached pdf shows the details. Any suggestions will be appreciated.

I would start by making sure 'osd enable op tracker = false' if it isn't already.

The other thing to keep in mind is that a lot of the work has enabled OSD performance to scale as the clients increase.  It looks like your test has a single client.  Can you try running 2, 4, 8 clients and see if the per-OSD throughput goes up?

Digging into the code with a tool like vtune would be extremely helpful, I think.  There is a lot of time spent in do_op (osd prepare and osd queue) that Fujitsu has called out but we haven't narrowed down where the time is being spent.

sage
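For reference, a minimal sketch of what that could look like -- the pool name, image name and fio options below are illustrative placeholders, not the exact configuration used in these tests:

    # ceph.conf on the OSD hosts (restart the osds afterwards)
    [osd]
        osd enable op tracker = false

    # one fio rbd job per client; start several of these in parallel against
    # separate images (here an image 'test1' in pool 'rbd'); requires fio
    # built with librbd support
    fio --name=client1 --ioengine=rbd --clientname=admin --pool=rbd \
        --rbdname=test1 --rw=randwrite --bs=4k --iodepth=32 \
        --runtime=300 --time_based

The interesting number is whether the per-OSD throughput keeps going up as you move from 1 to 2, 4, 8 such jobs.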
> 
> -----Original Message-----
> From: ceph-devel-owner@xxxxxxxxxxxxxxx [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Andreas Bluemle
> Sent: Tuesday, October 14, 2014 10:38 PM
> To: Sage Weil
> Cc: Paul Von-Stamwitz; Stefan Priebe; Somnath Roy; ceph-devel@xxxxxxxxxxxxxxx
> Subject: Re: 10/7/2014 Weekly Ceph Performance Meeting: kernel boot params
> 
> Hi Sage,
> 
> [embedded below]
> 
> On Tue, 14 Oct 2014 06:13:58 -0700 (PDT) Sage Weil <sage@xxxxxxxxxxxx> wrote:
> 
> > On Tue, 14 Oct 2014, Andreas Bluemle wrote:
> > > Hi,
> > >
> > > On Wed, 8 Oct 2014 16:55:38 -0700 Paul Von-Stamwitz <PVonStamwitz@xxxxxxxxxxxxxx> wrote:
> > >
> > > > > > Hi,
> > > > > >
> > > > > > as mentioned during today's meeting, here are the kernel boot parameters which I found to provide the basis for good performance results:
> > > > > >
> > > > > > processor.max_cstate=0
> > > > > > intel_idle.max_cstate=0
> > > > > >
> > > > > > I understand these to basically turn off any power saving modes of the CPU; the CPUs we are using are like
> > > > > > Intel(R) Xeon(R) CPU E5-2640 v2 @ 2.00GHz
> > > > > > Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
> > > > > >
> > > > > > At the BIOS level, we
> > > > > > - turn off Hyperthreading
> > > > > > - turn off Turbo mode (in order not to leave the specifications)
> > > > > > - turn on frequency floor override
> > > > > >
> > > > > > We also assert that /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor is set to "performance".
> > > > > >
> > > > > > Using the above we see a constant frequency at the maximum level allowed by the CPU (except Turbo mode).
> > > > >
> > > > > How much performance do we gain by this? Till now I thought it's just 1-3%, so I'm still running the ondemand governor plus power savings.
> > > >
> > > > As always, it depends. I saw noticeable increases in some throughput tests (though I can't recall the % gain). More important to me was that it made my fio results much more consistent. As we measure improvements, these settings remove some of the "system noise".
> > > >
> > > > Best,
> > > > Paul
> > > >
> > >
> > > There were two different aspects which showed improvement:
> > > - code was executed faster
> > > - thread switching delays were reduced significantly
> > >
> > > See the attached graphics. They show processing of a 4 kB write request: processing at the Pipe::Reader is roughly 200 us in both pictures, and something like 20 us at the OSD::Dispatcher. So there is not much of a benefit here.
> > >
> > > But the delay between the end of the Pipe::Reader and the start of the OSD::Dispatcher threads was reduced really significantly.
> >
> > This test had a single outstanding IO, right?  The question for me is whether this reflects the latencies we'd see under a realistic workload, where there are more IOs in flight and the CPUs aren't likely to be in low power states.  I'm not sure how low the load needs to be before those states kick in and these latencies start to appear...
> >
> > sage
> 
> Yes and no...
> 
> Yes: the test was a fio sequential write, 4k per write, with a single IO in flight.
> 
> No: this means that on a given object in the osd file store with the default size of 4 MByte, 1024 subsequent write requests will hit that object - and hence the corresponding ceph-osd daemon. So even though the system as a whole was not very busy, the ceph-osd daemon assigned to the file object under pressure was fairly busy.
> 
> The intention of the test was to eliminate additional latencies because of queues building up.
> 
> What the test shows is the contribution of the various processing steps within ceph-osd to the overall latency for an individual write request when CPU power state related effects have been eliminated.
> 
> --
> Andreas Bluemle                     mailto:Andreas.Bluemle@xxxxxxxxxxx
> ITXperts GmbH                       http://www.itxperts.de
> Balanstrasse 73, Geb. 08            Phone: (+49) 89 89044917
> D-81541 Muenchen (Germany)          Fax:   (+49) 89 89044910
> 
> Company details: http://www.itxperts.de/imprint.htm
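For anyone who wants to reproduce the power-management setup described above, a rough sketch of the host-side settings -- the grub file location and the shell loop are assumptions about a typical Linux host, not details taken from this thread:

    # kernel boot parameters: add to GRUB_CMDLINE_LINUX in /etc/default/grub,
    # regenerate the grub config and reboot
    #   processor.max_cstate=0 intel_idle.max_cstate=0

    # switch every core to the performance governor
    for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
        echo performance > "$g"
    done

    # verify: all cores should report a constant (non-turbo) maximum frequency
    grep "cpu MHz" /proc/cpuinfo

Together with the BIOS settings Andreas lists, this is what keeps the clocks pinned and avoids the deep C-state exit latencies discussed above.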