Re: KVM problems when rebalance occurs

Robert LeBlanc <robert@xxxxxxxxxxxxx> · Thu, 7 Jan 2016 08:36:08 -0700

With these osd_backfill_scan_[min|max] settings, we didn't have any problems raising osd_max_backfills above 1.
Robert LeBlanc
On Jan 7, 2016 8:30 AM, "nick" <nick@xxxxxxx> wrote:
Heya,

thank you for your answers. We will try 16/32 as the values for
osd_backfill_scan_[min|max]. I have also set the debug logging config. Here is
an excerpt of our new ceph.conf:

"""

[osd]
osd max backfills = 1
osd backfill scan max = 32
osd backfill scan min = 16
osd recovery max active = 1
osd recovery op priority = 1
osd op threads = 8

[global]
debug optracker = 0/0
debug asok = 0/0
debug hadoop = 0/0
debug mds migrator = 0/0
debug objclass = 0/0
debug paxos = 0/0
debug context = 0/0
debug objecter = 0/0
debug mds balancer = 0/0
debug finisher = 0/0
debug auth = 0/0
debug buffer = 0/0
debug lockdep = 0/0
debug mds log = 0/0
debug heartbeatmap = 0/0
debug journaler = 0/0
debug mon = 0/0
debug client = 0/0
debug mds = 0/0
debug throttle = 0/0
debug journal = 0/0
debug crush = 0/0
debug objectcacher = 0/0
debug filer = 0/0
debug perfcounter = 0/0
debug filestore = 0/0
debug rgw = 0/0
debug monc = 0/0
debug rbd = 0/0
debug tp = 0/0
debug osd = 0/0
debug ms = 0/0
debug mds locker = 0/0
debug timer = 0/0
debug mds log expire = 0/0
debug rados = 0/0
debug striper = 0/0
debug rbd replay = 0/0
debug none = 0/0
debug keyvaluestore = 0/0
debug compressor = 0/0
debug crypto = 0/0
debug xio = 0/0
debug civetweb = 0/0
debug newstore = 0/0

"""

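Each debug value in the [global] block above has the form
<log level>/<in-memory level>, so 0/0 switches off both the file log and the
in-memory log for that subsystem. A long block like this could also be
generated rather than typed by hand. The following is only a minimal sketch:
the helper name debug_lines is made up for illustration, and the subsystem
list is a small subset of the entries in the excerpt.

```python
# A few subsystem names copied from the excerpt above; extend as needed.
SUBSYSTEMS = ["osd", "ms", "filestore", "journal", "monc"]


def debug_lines(subsystems, log_level=0, memory_level=0):
    """Render 'debug <subsystem> = <log>/<memory>' lines for ceph.conf."""
    return [f"debug {name} = {log_level}/{memory_level}" for name in subsystems]


print("[global]")
print("\n".join(debug_lines(SUBSYSTEMS)))
```
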
I already ran a benchmark with fio on our staging setup using the new config,
but the results did not really differ from before.

We can hardly reproduce the 'stalling' problems on the staging cluster, so I
will have to wait and test this in production.

Does anyone know if 'osd max backfills' > 1 could have an impact as well? The
default seems to be 10...
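
One way to check which value a running OSD actually uses, and to change it
without a restart, is sketched below. The code only builds the command lines
and does not run them. The helper names are made up for illustration and
osd.0 is a placeholder id. 'ceph daemon' talks to the admin socket, so that
command has to be run on the host of the OSD in question.

```python
import shlex


def show_option_cmd(osd_id, option):
    """argv that reads one option from a running OSD via its admin socket."""
    return ["ceph", "daemon", f"osd.{osd_id}", "config", "get", option]


def inject_option_cmd(option, value):
    """argv that changes one option on all running OSDs without a restart."""
    flag = "--" + option.replace("_", "-")
    return ["ceph", "tell", "osd.*", "injectargs", f"{flag} {value}"]


# osd.0 is a placeholder; substitute an OSD id that lives on the local host.
print(shlex.join(show_option_cmd(0, "osd_max_backfills")))
print(shlex.join(inject_option_cmd("osd_max_backfills", 1)))
```

Since the excerpt above already sets 'osd max backfills = 1', the default
would only matter on OSDs that did not pick up that setting.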

Cheers

Nick

On Wednesday, January 06, 2016 09:17:43 PM Josef Johansson wrote:

> Hi,
>
> Also make sure that you optimize the debug log config. There's a lot on the
> ML on how to set them all to low values (0/0).
>
> Not sure how it is in infernalis, but it made a big difference in previous
> versions.
>
> Regards,
> Josef

>

> On 6 Jan 2016 18:16, "Robert LeBlanc" <robert@xxxxxxxxxxxxx> wrote:

> > There has been a lot of "discussion" about osd_backfill_scan_[min,max]
> > lately. My experience with hammer has been the opposite of what
> > people have said before. Increasing those values has reduced the load
> > of recovery for us and has prevented a lot of the disruption that
> > backfilling used to cause in our cluster. It does increase the time
> > recovery takes (adding a new node to the cluster took about 3-4
> > hours before and now takes about 24 hours).
> >
> > We are currently using these values and they seem to work well for us:
> > osd_max_backfills = 1
> > osd_backfill_scan_min = 16
> > osd_recovery_max_active = 1
> > osd_backfill_scan_max = 32
> >
> > I would be interested in your results if you try these values.

> > ----------------
> > Robert LeBlanc
> > PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1

> >

> > On Wed, Jan 6, 2016 at 7:13 AM, nick <nick@xxxxxxx> wrote:

> > > Heya,
> > > we are using a ceph cluster (6 nodes, each with 10x 4TB HDD + 2x SSD for
> > > the journal) in combination with KVM virtualization. All our virtual
> > > machine hard disks are stored on the ceph cluster. The ceph cluster was
> > > recently updated to the 'infernalis' release.
> > >
> > > We are experiencing problems during cluster maintenance. A normal workflow
> > > for us looks like this:
> > >
> > > - set the noout flag for the cluster
> > > - stop all OSDs on one node
> > > - update the node
> > > - reboot the node
> > > - start all OSDs
> > > - wait for the backfilling to finish
> > > - unset the noout flag
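
The workflow listed above could be scripted roughly as follows. This is only
a sketch under a few assumptions: it runs on a host with a working ceph CLI
and admin keyring, the node-specific steps (stop OSDs, update, reboot, start
OSDs) are passed in as a hypothetical callback, and "backfilling finished" is
approximated by the PG summary from 'ceph pg stat' no longer mentioning any
backfill or recovery state. All function names are made up for illustration.

```python
import subprocess
import time


def ceph(*args):
    """Run a ceph CLI command and return its stdout as text."""
    result = subprocess.run(
        ["ceph", *args], check=True, capture_output=True, text=True
    )
    return result.stdout


def backfill_in_progress(pg_stat_output):
    """True if the PG summary still mentions a backfill or recovery state."""
    return any(word in pg_stat_output for word in ("backfill", "recover"))


def maintain_node(do_node_maintenance, poll_seconds=30):
    """Wrap site-specific node maintenance in the noout workflow."""
    ceph("osd", "set", "noout")
    # Hypothetical callback: stop the OSDs, update, reboot, start the OSDs.
    do_node_maintenance()
    # Poll the PG summary until no PG reports backfill or recovery.
    while backfill_in_progress(ceph("pg", "stat")):
        time.sleep(poll_seconds)
    ceph("osd", "unset", "noout")
```

While noout is set, 'ceph health' stays at HEALTH_WARN, which is why the
sketch looks at PG states instead. Right after the OSDs start, PGs may still
be peering, so a real script would probably also wait until all PGs are
active+clean before unsetting the flag.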

> > >

> > > After we start all OSDs on the node again, the cluster backfills and tries
> > > to get all the OSDs in sync. At the beginning of this process we
> > > experience 'stalls' in our running virtual machines. On some, the load
> > > rises to a very high value. On others, a running webserver responds only
> > > with 5xx HTTP codes. It takes around 5-6 minutes until all is ok again.
> > > After those 5-6 minutes the cluster is still backfilling, but the virtual
> > > machines behave normally again.
> > >
> > > I already set the following parameters in ceph.conf on the nodes to get a
> > > better ratio of rebalance traffic to user traffic:
> > >
> > > """
> > > [osd]
> > > osd max backfills = 1
> > > osd backfill scan max = 8
> > > osd backfill scan min = 4
> > > osd recovery max active = 1
> > > osd recovery op priority = 1
> > > osd op threads = 8
> > > """

> > >

> > > It helped a bit, but we are still experiencing the problems described
> > > above. It feels as if some virtual hard disks are locked for a short time.
> > > Our ceph nodes use bonded 10G network interfaces for the 'OSD network',
> > > so I do not think that the network is a bottleneck.
> > >
> > > After reading this blog post:
> > > http://dachary.org/?p=2182
> > > I wonder if there is really a 'read lock' during the object push.
> > >
> > > Does anyone know more about this, or do others have the same problems and
> > > were able to fix them?
> > >
> > > Best Regards
> > > Nick
> > >
> > > --
> > > Sebastian Nickel
> > > Nine Internet Solutions AG, Albisriederstr. 243a, CH-8047 Zuerich
> > > Tel +41 44 637 40 00 | Support +41 44 637 40 40 | www.nine.ch
> > > _______________________________________________
> > > ceph-users mailing list
> > > ceph-users@xxxxxxxxxxxxxx
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

--
Sebastian Nickel
Nine Internet Solutions AG, Albisriederstr. 243a, CH-8047 Zuerich
Tel +41 44 637 40 00 | Support +41 44 637 40 40 | www.nine.ch