RE: 10/7/2014 Weekly Ceph Performance Meeting: kernel boot params

Sage,
I think they are using 7 VMs (and thus 7 librbd clients) for the test.
Xinxin,
You are running 2 OSDs per SSD, which is not recommended; I'm not sure whether or not that has an impact. Along with disabling the op tracker as Sage suggested, you may want to tweak the number of OSD op shards and the number of filestore threads to see if performance improves.
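
For reference, the settings I have in mind look something like this in ceph.conf (the values are illustrative starting points rather than tuned recommendations, and the shard options only exist on builds with the sharded op queue):

    [osd]
        osd enable op tracker = false
        osd op num shards = 10               ; default is 5, illustrative
        osd op num threads per shard = 2     ; default, illustrative
        filestore op threads = 4             ; default is 2, illustrative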
BTW, each librados client is now ~20% slower (even with rbd_cache = false), and with 7 clients that degradation adds up and could be significant. One quick check you can do to factor out the librbd degradation is to use the firefly librbd/librados combination.

Thanks & Regards
Somnath

-----Original Message-----
From: Sage Weil [mailto:sage@xxxxxxxxxxxx]
Sent: Tuesday, October 14, 2014 7:23 PM
To: Shu, Xinxin
Cc: Andreas Bluemle; Paul Von-Stamwitz; Stefan Priebe; Somnath Roy; ceph-devel@xxxxxxxxxxxxxxx; Zhang, Jian
Subject: RE: 10/7/2014 Weekly Ceph Performance Meeting: kernel boot params

On Wed, 15 Oct 2014, Shu, Xinxin wrote:
> Hi all, recently we tested 4K random write performance on our full
> SSD setup (12 x Intel DC3700), but peak performance is ~23K IOPS,
> which is much lower than the hardware capability. With a detailed
> latency breakdown, we found that most of the latency comes from the
> osd queue. We have noticed the optimizations on the osd queue and
> tried the latest master on our setup, but there is a performance
> regression. We also checked the qlock and the pg lock with perf
> counters; the waiting count and latency are very small. The attached
> pdf shows the details; any suggestions will be appreciated.

I would start by making sure 'osd enable op tracker = false' is set, if it isn't already.
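
If you want to flip that without restarting the OSDs, something along these lines should work (syntax from memory, so double-check it against your release):

   ceph tell 'osd.*' injectargs '--osd_enable_op_tracker=false'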

The other thing to keep in mind is that a lot of the recent work has enabled OSD performance to scale as the number of clients increases.  It looks like your test has a single client.  Can you try running 2, 4, 8 clients and see if the per-OSD throughput goes up?
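
For example, something along these lines with fio's rbd engine, where the pool and image names are placeholders and each client gets its own image so they don't serialize on the same objects:

   for i in 1 2 3 4; do
       fio --name=client$i --ioengine=rbd --clientname=admin \
           --pool=rbd --rbdname=img$i --rw=randwrite --bs=4k \
           --iodepth=32 --runtime=60 --time_based &
   done
   wait

Repeat with 2, 4, and 8 instances and compare the aggregate IOPS per OSD.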

Digging into the code with a tool like VTune would be extremely helpful, I think.  There is a lot of time spent in do_op (osd prepare and osd queue) that Fujitsu has called out, but we haven't narrowed down where the time is being spent.
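
If VTune isn't at hand, perf can give a rough version of the same hotspot picture (a sketch; attach it to one ceph-osd pid):

   perf record -g -p $(pidof ceph-osd | awk '{print $1}') -- sleep 30
   perf report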

sage


>
> -----Original Message-----
> From: ceph-devel-owner@xxxxxxxxxxxxxxx
> [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Andreas Bluemle
> Sent: Tuesday, October 14, 2014 10:38 PM
> To: Sage Weil
> Cc: Paul Von-Stamwitz; Stefan Priebe; Somnath Roy;
> ceph-devel@xxxxxxxxxxxxxxx
> Subject: Re: 10/7/2014 Weekly Ceph Performance Meeting: kernel boot
> params
>
> Hi Sage,
>
> [embedded below]
>
> On Tue, 14 Oct 2014 06:13:58 -0700 (PDT) Sage Weil <sage@xxxxxxxxxxxx> wrote:
>
> > On Tue, 14 Oct 2014, Andreas Bluemle wrote:
> > > Hi,
> > >
> > >
> > > On Wed, 8 Oct 2014 16:55:38 -0700
> > > Paul Von-Stamwitz <PVonStamwitz@xxxxxxxxxxxxxx> wrote:
> > >
> > > >
> > > > > > Hi,
> > > > > >
> > > > > > as mentioned during today's meeting, here are the kernel
> > > > > > boot parameters which I found to provide the basis for good
> > > > > > performance results:
> > > > > >
> > > > > >    processor.max_cstate=0
> > > > > >    intel_idle.max_cstate=0
> > > > > >
> > > > > > I understand these to basically turn off any power saving
> > > > > > modes of the CPU; the CPUs we are using are like
> > > > > >   Intel(R) Xeon(R) CPU E5-2640 v2 @ 2.00GHz
> > > > > >   Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
> > > > > >
> > > > > > At the BIOS level, we
> > > > > >   - turn off Hyperthreading
> > > > > >   - turn off Turbo mode (in order not to leave the
> > > > > > specifications)
> > > > > >   - turn on frequency floor override
> > > > > >
> > > > > > We also assert that
> > > > > >   /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
> > > > > >   is set to "performance"
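> > > > > >
> > > > > > (for reference, one way to assert this at runtime; a sketch,
> > > > > > a distro tool like cpupower would do the same:)
> > > > > >
> > > > > >   for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
> > > > > >       echo performance > $g
> > > > > >   done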
> > > > > >
> > > > > > Using the above, we see a constant frequency at the maximum
> > > > > > level allowed by the CPU (except Turbo mode).
> > > > >
> > > > > How much performance do we gain by this? Until now I thought
> > > > > it was just 1-3%, so I'm still running the ondemand governor
> > > > > plus power savings.
> > > >
> > > > As always, it depends. I saw noticeable increases in some
> > > > throughput tests (though I can't recall the % gain). More
> > > > important to me was that it made my fio results much more
> > > > consistent. As we measure improvements, these settings remove
> > > > some of the "system noise".
> > > >
> > > > Best,
> > > > Paul
> > > >
> > >
> > > There were two different aspects which showed improvement:
> > >  - code was executed faster
> > >  - thread switching delays were reduced significantly
> > >
> > > See the attached graphics. They show the processing of a 4 kB
> > > write request: processing at the Pipe::Reader is roughly 200 us in
> > > both pictures, and something like 20 us at the OSD::Dispatcher. So
> > > there is not much of a benefit here.
> > >
> > > But the delay between the end of the Pipe::Reader and the start of
> > > the OSD::Dispatcher threads was reduced very significantly.
> >
> > This test had a single outstanding IO, right?  The question for me
> > is whether this reflects the latencies we'd see under a realistic
> > workload, where there are more IOs in flight and the CPUs aren't
> > likely to be in low power states.  I'm not sure how low the load
> > needs to be before those states kick in and these latencies start to
> > appear...
> >
> > sage
>
> Yes and no...
>
> Yes: the test was a fio sequential write, 4k per write, with a single IO in flight.
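>
> (roughly the kind of fio invocation meant here; the target device and
> size are placeholders:)
>
>   fio --name=seq4k --rw=write --bs=4k --iodepth=1 --ioengine=libaio \
>       --direct=1 --size=4g --filename=/dev/rbd0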
>
> No: this means that on a given object in the osd file store, with the
> default object size of 4 MByte, 1024 subsequent write requests (4 MB /
> 4 kB) will hit that object - and hence the corresponding ceph-osd
> daemon. So even though the system as a whole was not very busy, the
> ceph-osd daemon assigned to the file object under pressure was fairly
> busy.
>
> The intention of the test was to eliminate additional latencies because of queues building up.
>
> What the test shows is the contribution of the various processing
> steps within ceph-osd to the overall latency of an individual write
> request once CPU power state related effects have been eliminated.
>
>
>
>
>
> --
> Andreas Bluemle                     mailto:Andreas.Bluemle@xxxxxxxxxxx
> ITXperts GmbH                       http://www.itxperts.de
> Balanstrasse 73, Geb. 08            Phone: (+49) 89 89044917
> D-81541 Muenchen (Germany)          Fax:   (+49) 89 89044910
>
> Company details: http://www.itxperts.de/imprint.htm
>
