Re: double rebalance when removing osd

Steve Taylor <steve.taylor@xxxxxxxxxxxxxxxx> · Mon, 11 Jan 2016 15:26:27 +0000

Rafael,

Yes, the cluster still rebalances twice when removing a failed osd. An osd that is marked out for any reason but still exists in the crush map has its placement groups remapped to other osds until it comes back in, at which point those pgs are remapped back. When an osd is removed from the crush map, its pgs are mapped to new osds permanently. The mappings may be completely different in these two cases, which is why you get double rebalancing even when the two operations happen back to back, without the osd coming back in between them.

In the case of a failed osd, I usually don't worry about it and just follow the documented steps. I'm marking the osd out and then removing it from the crush map immediately, so the first rebalance has done almost nothing by the time the second overrides it, which matches what you were told by support. If this is a problem for you, or if you're removing an osd that's still functional to some degree, then reweighting to 0, waiting for the single rebalance, and then following the removal steps is probably your best bet.
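
For reference, here is a rough sketch of that reweight-first sequence as a shell function. It assumes "reweighting to 0" means setting the osd's CRUSH weight with 'ceph osd crush reweight', and that HEALTH_OK is a good enough signal that the rebalance has finished (it isn't if the cluster has unrelated warnings). The function names and the DRY_RUN switch are made up for illustration and are not part of Ceph. By default it only prints the commands, so treat it as a checklist rather than a tested procedure.

```shell
#!/bin/sh
# Sketch only: drain an osd via its CRUSH weight before removal so that
# data moves once. DRY_RUN, run(), wait_for_health_ok() and
# remove_osd_single_rebalance() are illustrative names, not Ceph features.
# With DRY_RUN=1 (the default) commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Poll cluster health until it reports HEALTH_OK.
# Assumption: no unrelated warnings are keeping the cluster in HEALTH_WARN.
wait_for_health_ok() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "ceph health    # repeat until HEALTH_OK"
        return 0
    fi
    until ceph health | grep -q '^HEALTH_OK'; do
        sleep 30
    done
}

remove_osd_single_rebalance() {
    osd_id="$1"
    # Setting the CRUSH weight to 0 triggers the one and only rebalance.
    run ceph osd crush reweight "osd.$osd_id" 0
    wait_for_health_ok
    # From here on, the usual removal steps; no further data should move.
    run ceph osd out "$osd_id"
    # Stop the ceph-osd daemon for this id here (init-system specific).
    run ceph osd crush remove "osd.$osd_id"
    run ceph auth del "osd.$osd_id"
    run ceph osd rm "$osd_id"
}
```

Sourcing the file and calling 'remove_osd_single_rebalance 12' prints the six steps for osd.12. Setting DRY_RUN=0 would actually run them, so read the output first.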

Steve Taylor | Senior Software Engineer | StorageCraft Technology Corporation
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 | Fax: 801.545.4705

-----Original Message-----
From: Andy Allan [mailto:gravitystorm@xxxxxxxxx] 
Sent: Monday, January 11, 2016 4:09 AM
To: Rafael Lopez <rafael.lopez@xxxxxxxxxx>
Cc: Steve Taylor <steve.taylor@xxxxxxxxxxxxxxxx>; ceph-users@xxxxxxxxxxxxxx
Subject: Re:  double rebalance when removing osd

On 11 January 2016 at 02:10, Rafael Lopez <rafael.lopez@xxxxxxxxxx> wrote:

> @Steve, even when you remove due to failing, have you noticed that the cluster rebalances twice using the documented steps? You may not if you don't wait for the initial recovery after 'ceph osd out'. If you do 'ceph osd out' and immediately 'ceph osd crush remove', RH support has told me that this effectively 'cancels' the original move triggered from 'ceph osd out' and starts permanently remapping... which still doesn't really explain why we have to do the ceph osd out in the first place..

This topic was last discussed in December - the documentation for removing an OSD from the cluster is not helpful. Unfortunately it doesn't look like anyone is going to fix the documentation.

http://comments.gmane.org/gmane.comp.file-systems.ceph.user/25627

Basically, when you want to remove an OSD, there's an alternative sequence of commands that avoids the double-rebalance.

The better approach is to reweight the OSD to zero first, then wait for the (one and only) rebalance, then mark out and remove. Here are more details from the previous thread:

http://permalink.gmane.org/gmane.comp.file-systems.ceph.user/25629

Thanks,
Andy