On Wed, 15 Oct 2014, Shu, Xinxin wrote:
> Hi all, recently we tested 4K random write performance on our full SSD
> setup (12 x Intel DC3700), but peak performance is ~23K IOPS, which is
> much lower than the hardware capability. With a detailed latency
> breakdown, we found that most of the latency comes from the osd queue.
> We have noticed the optimizations on the osd queue and tried latest
> master on our setup, but there is a performance regression. We also
> checked the qlock and pg lock with perf counters; the waiting count and
> latency are very small. The attached pdf shows the details. Any
> suggestion will be appreciated?

I would start by making sure 'osd enable op tracker = false' if it isn't
already.

The other thing to keep in mind is that a lot of the work has enabled OSD
performance to scale as the clients increase.  It looks like your test has
a single client.  Can you try running 2, 4, 8 clients and see if the
per-OSD throughput goes up?

Digging into the code with a tool like vtune would be extremely helpful, I
think.  There is a lot of time spent in do_op (osd prepare and osd queue)
that Fujitsu has called out, but we haven't narrowed down where the time
is being spent.

sage

>
> -----Original Message-----
> From: ceph-devel-owner@xxxxxxxxxxxxxxx [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Andreas Bluemle
> Sent: Tuesday, October 14, 2014 10:38 PM
> To: Sage Weil
> Cc: Paul Von-Stamwitz; Stefan Priebe; Somnath Roy; ceph-devel@xxxxxxxxxxxxxxx
> Subject: Re: 10/7/2014 Weekly Ceph Performance Meeting: kernel boot params
>
> Hi Sage,
>
> [embedded below]
>
> On Tue, 14 Oct 2014 06:13:58 -0700 (PDT) Sage Weil <sage@xxxxxxxxxxxx> wrote:
>
> > On Tue, 14 Oct 2014, Andreas Bluemle wrote:
> > > Hi,
> > >
> > > On Wed, 8 Oct 2014 16:55:38 -0700
> > > Paul Von-Stamwitz <PVonStamwitz@xxxxxxxxxxxxxx> wrote:
> > >
> > > > > > Hi,
> > > > > >
> > > > > > as mentioned during today's meeting, here are the kernel boot
> > > > > > parameters which I found to provide the basis for good
> > > > > > performance results:
> > > > > >
> > > > > > processor.max_cstate=0
> > > > > > intel_idle.max_cstate=0
> > > > > >
> > > > > > I understand these to basically turn off any power saving
> > > > > > modes of the CPU; the CPUs we are using are like
> > > > > > Intel(R) Xeon(R) CPU E5-2640 v2 @ 2.00GHz
> > > > > > Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
> > > > > >
> > > > > > At the BIOS level, we
> > > > > > - turn off Hyperthreading
> > > > > > - turn off Turbo mode (in order to not leave the specifications)
> > > > > > - turn on frequency floor override
> > > > > >
> > > > > > We also assert that
> > > > > > /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
> > > > > > is set to "performance"
> > > > > >
> > > > > > Using the above we see a constant frequency at the maximum
> > > > > > level allowed by the CPU (except Turbo mode).
> > > > >
> > > > > How much performance do we gain by this? Till now I thought it's
> > > > > just 1-3%, so I'm still running the ondemand governor plus power
> > > > > savings.
> > > >
> > > > As always, it depends. I saw noticeable increases in some
> > > > throughput tests (though I can't recall the % gain.) More
> > > > important to me was that it made my fio results much more
> > > > consistent. As we measure improvements, these settings remove some
> > > > of the "system noise".
> > > >
> > > > Best,
> > > > Paul
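
For anyone who wants to reproduce the settings discussed above, here is a
minimal sketch of how they can be applied and verified.  It assumes a
GRUB-based bootloader and the standard cpufreq/cpuidle sysfs interfaces;
exact paths and commands may differ per distribution.

  # 1. Kernel boot parameters, e.g. in /etc/default/grub (then regenerate
  #    the grub config and reboot):
  #    GRUB_CMDLINE_LINUX="... processor.max_cstate=0 intel_idle.max_cstate=0"

  # Verify they took effect after reboot:
  grep -o 'max_cstate=[0-9]*' /proc/cmdline

  # 2. Pin the cpufreq governor to "performance" on all cores:
  for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
      echo performance > "$g"
  done

  # 3. Check that the cores actually sit at a constant, maximum frequency:
  grep MHz /proc/cpuinfo | sort | uniq -c
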
> > >
> > > There were two different aspects which showed improvement:
> > > - code was executed faster
> > > - thread switching delays were reduced significantly
> > >
> > > See the attached graphics. They show processing of a 4 kB write
> > > request: processing at the Pipe::Reader is roughly 200 us in both
> > > pictures, and something like 20 us at the OSD::Dispatcher. So there
> > > is not much of a benefit here.
> > >
> > > But the delay between the end of the Pipe::Reader and the start of
> > > the OSD::Dispatcher threads reduced really significantly.
> >
> > This test had a single outstanding IO, right?  The question for me is
> > if this reflects latencies we'd see under a realistic workload, where
> > there are more IOs in flight and the CPUs aren't likely to be in low
> > power states.  I'm not sure how low the load needs to be before those
> > states kick in and these latencies start to appear...
> >
> > sage
>
> Yes and no...
>
> Yes: the test was a fio sequential write, 4k per write, with a single IO in flight.
>
> No: this means that on a given object in the osd file store with the default size of 4 MByte, 1024 subsequent write requests will hit that object - and hence the corresponding ceph-osd daemon. So even though the system as a whole was not very busy, the ceph-osd daemon assigned to the file object under pressure was fairly busy.
>
> The intention of the test was to eliminate additional latencies because of queues building up.
>
> What the test shows is the contribution of the various processing steps within ceph-osd to the overall latency for an individual write request when CPU power state related effects have been eliminated.
>
> --
> Andreas Bluemle                     mailto:Andreas.Bluemle@xxxxxxxxxxx
> ITXperts GmbH                       http://www.itxperts.de
> Balanstrasse 73, Geb. 08            Phone: (+49) 89 89044917
> D-81541 Muenchen (Germany)          Fax:   (+49) 89 89044910
>
> Company details: http://www.itxperts.de/imprint.htm
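
For reference, a minimal fio invocation in the spirit of the test Andreas
describes (4k sequential writes with a single IO in flight).  The thread
does not say what was written to, so the target below is an assumption:
/dev/rbd0 is only a placeholder for a mapped RBD device.

  # 4k sequential writes at queue depth 1 with direct I/O, so each request's
  # latency is visible end to end; with a default 4 MB object size, 1024
  # consecutive writes land on the same object and hence the same ceph-osd.
  fio --name=seq4k-qd1 --filename=/dev/rbd0 \
      --ioengine=libaio --direct=1 \
      --rw=write --bs=4k --iodepth=1 --numjobs=1 --size=4g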