As you have noted, 'ceph osd reweight 0' is the same as a 'ceph osd out', but it is not the same as removing the OSD from the CRUSH map (or setting its CRUSH weight to 0). That explains the double rebalance you saw when marking an OSD out (or reweighting it to 0) and then removing it later. To avoid this, I use a CRUSH reweight as the initial step to move PGs off an OSD when draining nodes; the OSD can then be purged with no further PG movement.

Double movement:

> ceph osd out $i               # rebalancing
> ceph osd purge $i             # more rebalancing

Single movement:

> ceph osd crush reweight $i 0  # rebalancing
> ceph osd purge $i             # no rebalancing

The reason this happens (as I understand it) is that the reweight value is only taken into account late in the CRUSH calculation: an OSD with a reweight of 0 can still be picked for a PG's set, at which point the reweight kicks in and forces the calculation to be retried, giving a different result for the PG set than if the OSD were absent or had a CRUSH weight of 0. (A rough scripted version of the single-movement drain is sketched at the end of this mail.)

Cheers,
Tom

> -----Original Message-----
> From: Brent Kennedy <bkennedy@xxxxxxxxxx>
> Sent: 02 June 2020 04:44
> To: 'ceph-users' <ceph-users@xxxxxxx>
> Subject: OSD upgrades
>
> We are rebuilding servers, and before Luminous our process was:
>
> 1. Reweight the OSD to 0
> 2. Wait for the rebalance to complete
> 3. Out the OSD
> 4. ceph osd crush remove osd.#
> 5. ceph auth del osd.#
> 6. ceph osd rm #
>
> It seems the Luminous documentation says that you should instead:
>
> 1. Out the OSD
> 2. Wait for the cluster rebalance to finish
> 3. Stop the OSD
> 4. ceph osd purge #
>
> Is reweighting to 0 no longer suggested?
>
> Side note: I tried our existing process and, even after the reweight, the entire
> cluster started rebalancing again after step 4 (the crush remove) of the old
> process. I should also note that, having reweighted to 0, when I tried to run
> "ceph osd out #" it said the OSD was already marked out.
>
> I assume the docs are correct, but I just want to make sure, since reweighting
> had previously been recommended.
>
> Regards,
> -Brent
>
> Existing clusters:
>
> Test: Nautilus 14.2.2 with 3 OSD servers, 1 mon/mgr, 1 gateway, 2 iSCSI
> gateways (all virtual on NVMe)
> US Production (HDD): Nautilus 14.2.2 with 11 OSD servers, 3 mons, 4 gateways,
> 2 iSCSI gateways
> UK Production (HDD): Nautilus 14.2.2 with 12 OSD servers, 3 mons, 4 gateways
> US Production (SSD): Nautilus 14.2.2 with 6 OSD servers, 3 mons, 3 gateways,
> 2 iSCSI gateways
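PS: for anyone scripting the single-movement drain, here is a rough sketch (untested, so treat it as a starting point rather than a finished tool; it assumes Luminous or later for 'ceph osd safe-to-destroy', and the script name and argument handling are just illustrative):

#!/usr/bin/env bash
# drain-osds.sh (hypothetical name) -- single-movement drain and purge.
# Usage: ./drain-osds.sh <osd-id> [<osd-id> ...]
set -euo pipefail

# Zero the CRUSH weight of every OSD first; this triggers the one and only
# rebalance, and the drains then proceed together.
for id in "$@"; do
    ceph osd crush reweight "osd.${id}" 0
done

for id in "$@"; do
    # Wait until Ceph reports the OSD holds no data and is safe to remove.
    until ceph osd safe-to-destroy "osd.${id}"; do
        sleep 60
    done

    # The daemon must be down before purge will proceed; stop it on the
    # OSD's host (e.g. systemctl stop ceph-osd@<id>), then confirm here.
    read -rp "osd.${id} is drained; stop its daemon, then press Enter to purge "

    # Remove the OSD from the CRUSH map, auth database and OSD map.
    # No further PG movement happens at this point.
    ceph osd purge "${id}" --yes-i-really-mean-it
done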