Hi Paul,
Many thanks for the replies. I actually did (1) and it worked perfectly; I was also able to reproduce the crash on a test monitor, too.
I have updated the bug with all of this info, so hopefully no one hits this again.
Many thanks.
Warren
From: Paul Emmerich <paul.emmerich@xxxxxxxx>
Sent: 20 March 2018 17:21
To: Jeffs, Warren (STFC,RAL,ISIS) <warren.jeffs@xxxxxxxxxx>
Cc: ceph-users@xxxxxxxx
Subject: Re: [ceph-users] Crush Bucket move crashes mons
I made the changes directly to the crush map, i.e. either
(1) deleting all the weight_set blocks and then moving the bucket via the CLI,
or
(2) moving the buckets in the crush map and adding a new entry to the weight set.
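Roughly speaking, option (1) is the usual getcrushmap/crushtool round trip; the bucket and room names below are just the ones from this thread, so adjust them to your own map:

    # grab and decompile the current crush map
    ceph osd getcrushmap -o crush.bin
    crushtool -d crush.bin -o crush.txt

    # in crush.txt, delete the choose_args blocks that hold the weight_set
    # entries, then recompile and inject the map
    crushtool -c crush.txt -o crush.new
    ceph osd setcrushmap -i crush.new

    # with the weight sets gone, the move works from the CLI
    ceph osd crush move rack04 room=R80-Upper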
Paul
2018-03-16 21:00 GMT+01:00 <warren.jeffs@xxxxxxxxxx>:
Hi Paul,
Many thanks for the super quick replies and analysis on this.
Is it a case of removing the weights from the new hosts and their OSDs, then moving them, and afterwards reweighting them correctly?
I already have a bug open; I will get this email chain added to it.
Warren
________________________________
From: Paul Emmerich [paul.emmerich@xxxxxxxx]
Sent: 16 March 2018 16:48
To: Jeffs, Warren (STFC,RAL,ISIS)
Cc: ceph-users@xxxxxxxx
Subject: Re: [ceph-users] Crush Bucket move crashes mons
Hi,
looks like it fails to adjust the number of weight set entries when moving the item. The good news is that this is 100% reproducible with your crush map, so you should open a bug at http://tracker.ceph.com/ to get this fixed.
Deleting the weight set fixes the problem. Moving the item manually with manual adjustment of the weight set also works in my quick test.
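For reference, the weight sets live in a choose_args section of the decompiled map, which looks roughly like the snippet below (bucket ids and weights here are only illustrative):

    # choose_args
    choose_args 0 {
      {
        bucket_id -2
        weight_set [
          [ 3.000 3.000 ]
        ]
      }
    }

Each weight_set row has to carry one weight per item in the corresponding bucket, which is what moving an item by hand has to keep consistent.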
Paul
2018-03-16 16:03 GMT+01:00 <warren.jeffs@xxxxxxxxxx>:
Hi Paul,
Many thanks for the reply.
The command is: crush move rack04 room=R80-Upper
Crush map is here: https://pastebin.com/CX7GKtBy
I’ve done some more testing, and the following all work:
• Moving machines between the racks under the default root.
• Renaming racks/hosts under the default root
• Renaming the default root
• Creating a new root
• Adding rack05 and rack04 + hosts nina408 and nina508 into the new root
But when trying to move anything into the default root it fails.
I have tried moving the following into default root:
• Nina408 – with its OSDs in and without
• Nina508 – with its OSDs in and without
• Rack04
• Rack05
• Rack03 – which I created with nothing in it, just to try the move.
Since my first email I have got the cluster to HEALTH_OK by reweighting the drives, so cluster-wise everything appears to be functioning fine.
I have not tried manually editing the crush map and re-importing it, for fear that it takes the cluster down, as this is currently in production. With the CLI I can at least cancel the command and the monitor comes back up fine.
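For what it's worth, an edited map can at least be sanity-checked offline with crushtool before being injected; the rule number and replica count below are only examples:

    # recompile the edited text map and test it offline before setcrushmap
    crushtool -c crush.txt -o crush.new
    crushtool -i crush.new --test --show-statistics --rule 0 --num-rep 3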
Many thanks.
Warren
From: Paul Emmerich [mailto:paul.emmerich@xxxxxxxx]
Sent: 16 March 2018 13:54
To: Jeffs, Warren (STFC,RAL,ISIS) <warren.jeffs@xxxxxxxxxx>
Cc: ceph-users@xxxxxxxx
Subject: Re: [ceph-users] Crush Bucket move crashes mons
Hi,
the error looks like there might be something wrong with the device classes (which are managed via separate trees with magic names behind the scenes).
Can you post your crush map and the command that you are trying to run?
Paul
2018-03-15 16:27 GMT+01:00 <warren.jeffs@xxxxxxxxxx>:
Hi All,
Having some interesting challenges.
I am trying to move 2 new nodes + 2 new racks into my default root; I added them to the cluster outside of root=default.
They are all in and up, and seem happy. The new nodes have all 12 OSDs in them and they are all 'up'.
So when I go to move them into the correct room bucket under the default root, the move fails.
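For context, the sequence was roughly of this shape (names are from our map; this is the general pattern rather than an exact transcript):

    # racks and hosts created/placed outside of root=default
    ceph osd crush add-bucket rack04 rack
    ceph osd crush move nina408 rack=rack04

    # the step that fails (and crashes the mon)
    ceph osd crush move rack04 room=R80-Upper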
This is the error log at the time:
https://pastebin.com/mHfkEp3X
I can create another host in the crush map and move it in and out of rack buckets – all while staying outside of the default root. Trying to move an empty rack bucket into the default root fails too.
The whole cluster is on 12.2.4. I do have 2 backfillfull OSDs, which is the reason for needing these disks in the cluster ASAP.
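As a side note, the relevant thresholds can be checked with ceph osd dump, and on 12.2.x the backfillfull ratio can be nudged temporarily while the new disks come in; the value below is only an example:

    # show the current full / backfillfull / nearfull ratios
    ceph osd dump | grep ratio
    # temporarily raise the backfillfull threshold (example value only)
    ceph osd set-backfillfull-ratio 0.92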
Any thoughts?
Cheers
Warren Jeffs
ISIS Infrastructure Services
STFC Rutherford Appleton Laboratory
e-mail: warren.jeffs@xxxxxxxxxx
--
Paul Emmerich
croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com