RE: 10/7/2014 Weekly Ceph Performance Meeting: kernel boot params


 



On Wed, 15 Oct 2014, Shu, Xinxin wrote:
> Hi all, recently we tested 4K random write performance on our full-SSD 
> setup (12 x Intel DC3700), but peak performance is ~23K IOPS, which is 
> much lower than the hardware capability. With a detailed latency 
> breakdown, we found that most of the latency comes from the osd queue. 
> We have noticed the optimizations on the osd queue and tried the latest 
> master on our setup, but there is a performance regression. We also 
> checked the qlock and pg lock with perf counters; the waiting count and 
> latency are very small. The attached pdf shows the details. Any 
> suggestion will be appreciated.

I would start by making sure 'osd enable op tracker = false' is set, if it 
isn't already.
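
For reference, a minimal sketch of how that could look in ceph.conf (the 
[osd] section placement is an assumption; the OSDs may need a restart for 
the change to take effect):

   [osd]
       osd enable op tracker = false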

The other thing to keep in mind is that a lot of the work has enabled 
OSD performance to scale as the number of clients increases.  It looks like 
your test has a single client.  Can you try running 2, 4, 8 clients 
and see if the per-OSD throughput goes up?
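
As an illustration only, one way to do that is to run several fio processes 
in parallel, each against its own RBD image; the pool, image, and client 
names below are placeholders:

   # hypothetical: start N of these in parallel, each with a different rbdname
   fio --name=randwrite-4k --ioengine=rbd --clientname=admin --pool=rbd \
       --rbdname=test1 --rw=randwrite --bs=4k --iodepth=32 \
       --runtime=60 --time_based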

Digging into the code with a tool like vtune would be extremely helpful, I 
think.  There is a lot of time spent in do_op (osd prepare and osd queue), 
which Fujitsu has called out, but we haven't narrowed down where the time is 
being spent.
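
If vtune is not at hand, a rough sketch of the same idea with perf (the PID 
of a single ceph-osd process is assumed):

   # sample one OSD for 30 seconds with call graphs, then inspect hot paths
   perf record -g -p <ceph-osd pid> -- sleep 30
   perf report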

sage


> 
> -----Original Message-----
> From: ceph-devel-owner@xxxxxxxxxxxxxxx [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Andreas Bluemle
> Sent: Tuesday, October 14, 2014 10:38 PM
> To: Sage Weil
> Cc: Paul Von-Stamwitz; Stefan Priebe; Somnath Roy; ceph-devel@xxxxxxxxxxxxxxx
> Subject: Re: 10/7/2014 Weekly Ceph Performance Meeting: kernel boot params
> 
> Hi Sage,
> 
> [embedded below]
> 
> On Tue, 14 Oct 2014 06:13:58 -0700 (PDT) Sage Weil <sage@xxxxxxxxxxxx> wrote:
> 
> > On Tue, 14 Oct 2014, Andreas Bluemle wrote:
> > > Hi,
> > > 
> > > 
> > > On Wed, 8 Oct 2014 16:55:38 -0700
> > > Paul Von-Stamwitz <PVonStamwitz@xxxxxxxxxxxxxx> wrote:
> > > 
> > > >  
> > > > > > Hi,
> > > > > >
> > > > > > as mentioned during today's meeting, here are the kernel boot 
> > > > > > parameters which I found to provide the basis for good 
> > > > > > performance results:
> > > > > >
> > > > > >    processor.max_cstate=0
> > > > > >    intel_idle.max_cstate=0
> > > > > >
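> > > > > > (For reference, one common way to apply these, assuming a GRUB 2
> > > > > > based setup; the exact file and regeneration command vary by
> > > > > > distribution:)
> > > > > >
> > > > > >    # /etc/default/grub: append to the existing kernel command line
> > > > > >    GRUB_CMDLINE_LINUX="... processor.max_cstate=0 intel_idle.max_cstate=0"
> > > > > >    # then regenerate the config, e.g. update-grub or
> > > > > >    # grub2-mkconfig -o /boot/grub2/grub.cfg, and reboot
> > > > > >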
> > > > > > I understand these to basically turn off any power saving 
> > > > > > modes of the CPU; the CPUs we are using are, e.g.:
> > > > > >   Intel(R) Xeon(R) CPU E5-2640 v2 @ 2.00GHz
> > > > > >   Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
> > > > > >
> > > > > > At the BIOS level, we
> > > > > >   - turn off Hyperthreading
> > > > > >   - turn off Turbo mode (in order not to leave the specifications)
> > > > > >   - turn on frequency floor override
> > > > > >
> > > > > > We also assert that
> > > > > >   /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
> > > > > >   is set to "performance"
> > > > > >
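> > > > > > (A quick way to check and set this, assuming the sysfs cpufreq
> > > > > > interface is available; setting it needs root:)
> > > > > >
> > > > > >    cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
> > > > > >    for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
> > > > > >        echo performance > "$g"   # set each core's governor
> > > > > >    done
> > > > > >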
> > > > > > Using the above, we see a constant frequency at the maximum level 
> > > > > > allowed by the CPU (except Turbo mode).
> > > > > 
> > > > > How much performance do we gain by this? Until now I thought it's 
> > > > > just 1-3%, so I'm still running the ondemand governor plus power 
> > > > > savings.
> > > > 
> > > > As always, it depends. I saw noticeable increases in some 
> > > > throughput tests (though I can't recall the % gain.) More 
> > > > important to me was that it made my fio results much more 
> > > > consistent. As we measure improvements, these settings remove some 
> > > > of the "system noise".
> > > > 
> > > > Best,
> > > > Paul
> > > > 
> > > 
> > > There were two different aspects which showed improvement:
> > >  - code was executed faster
> > >  - thread switching delays were reduced significantly
> > > 
> > > See the attached graphics. They show processing of a 4 kB write
> > > request: processing at the Pipe::Reader is roughly 200 us in both 
> > > pictures, and something like 20 us at the OSD::Dispatcher. So there is 
> > > not much of a benefit here.
> > > 
> > > But the delay between the end of the Pipe::Reader and the start of 
> > > the OSD::Dispatcher threads reduced really significantly.
> > 
> > This test had a single outstanding IO, right?  The question for me is 
> > whether this reflects latencies we'd see under a realistic workload, where 
> > there are more IOs in flight and the CPUs aren't likely to be in low 
> > power states. I'm not sure how low the load needs to be before those 
> > states kick in and these latencies start to appear...
> > 
> > sage
> 
> Yes and no...
> 
> Yes: the test was a fio sequential write, 4k per write, with a single IO in flight.
> 
> No: this means that on a given object in the osd file store with the default size of 4 MByte, 1024 subsequent write requests will hit that object - and hence the corresponding ceph-osd daemon. So even though the system as a whole was not very busy, the ceph-osd daemon assigned to the file object under pressure was fairly busy.
> 
> The intention of the test was to eliminate additional latencies because of queues building up.
> 
> What the test shows is the contribution of the various processing steps within ceph-osd to the overall latency for an individual write request when CPU power state related effects have been eliminated.
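> 
> (For context, a rough sketch of the kind of fio job described above: 4k 
> sequential writes with a single IO in flight. Whether the original test used 
> the rbd engine isn't stated; it is assumed here purely for illustration, and 
> the pool, image, and client names are placeholders.)
> 
>    [seq-write-4k-qd1]
>    ioengine=rbd
>    clientname=admin
>    pool=rbd
>    rbdname=test
>    rw=write
>    bs=4k
>    iodepth=1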
> 
> 
> > 
> > 
> 
> 
> 
> -- 
> Andreas Bluemle                     mailto:Andreas.Bluemle@xxxxxxxxxxx
> ITXperts GmbH                       http://www.itxperts.de
> Balanstrasse 73, Geb. 08            Phone: (+49) 89 89044917
> D-81541 Muenchen (Germany)          Fax:   (+49) 89 89044910
> 
> Company details: http://www.itxperts.de/imprint.htm
> 



