Re: For suggestions and best practices on expanding Ceph cluster and removing old nodes

"huxiaoyu@xxxxxxxxxxxx" <huxiaoyu@xxxxxxxxxxxx> · Tue, 25 Apr 2023 21:42:07 +0200

Thanks a lot for the valuable input, Wesley, Josh, and Anthony.

It seems the best practice is to upgrade first, then expand, and remove the old nodes afterwards.

best regards,

Samuel

huxiaoyu@xxxxxxxxxxxx

From: Wesley Dillingham
Date: 2023-04-25 19:55
To: huxiaoyu@xxxxxxxxxxxx
CC: ceph-users
Subject: Re:  For suggestions and best practices on expanding Ceph cluster and removing old nodes
Get onto Nautilus first (and perhaps even go on to Pacific) before expanding, primarily because, starting in Nautilus, recovery of degraded data is prioritized over recovery of remapped data. As you phase out old hardware and phase in new hardware, a very large amount of backfill will be happening. Without that prioritization, if you get into a degraded state in the middle of this backfill, it will take much longer for the degraded data to become clean again.

Additionally, you will want to follow the best practice of upgrading your cluster in order. In short: monitors, then managers, then OSDs, then MDS and RGW, then other clients. More details here: https://docs.ceph.com/en/latest/releases/nautilus/#upgrading-from-mimic-or-luminous
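As a rough illustration of that ordering, here is a minimal Python sketch that looks at the JSON printed by `ceph versions` and reports which daemon type still needs upgrading. The helper names (`release_number`, `next_daemon_type_to_upgrade`, `is_mixed`) are hypothetical, the sample data is made up, and the JSON shape (daemon type mapped to version string mapped to count, plus an "overall" entry) is an assumption you should check against your own cluster's output.

```python
import json

# Upgrade order from the message above: monitors, managers, OSDs, then MDS and RGW.
UPGRADE_ORDER = ["mon", "mgr", "osd", "mds", "rgw"]


def release_number(version_string):
    """Extract e.g. '14.2.22' from 'ceph version 14.2.22 (...) nautilus (stable)'.

    Assumes the third whitespace-separated token is the release number.
    """
    parts = version_string.split()
    return parts[2] if len(parts) >= 3 else version_string


def next_daemon_type_to_upgrade(versions, target):
    """Return the first daemon type in UPGRADE_ORDER that still has daemons
    not running `target`, or None when every listed type is on `target`."""
    for daemon_type in UPGRADE_ORDER:
        running = versions.get(daemon_type, {})
        if any(release_number(v) != target for v in running):
            return daemon_type
    return None


def is_mixed(versions):
    """True when more than one distinct version string is running overall."""
    return len(versions.get("overall", {})) > 1


# Illustrative data only: counts and the "(placeholder)" build ids are invented.
SAMPLE = json.loads("""
{
  "mon": {"ceph version 14.2.22 (placeholder) nautilus (stable)": 3},
  "mgr": {"ceph version 14.2.22 (placeholder) nautilus (stable)": 2},
  "osd": {"ceph version 12.2.12 (placeholder) luminous (stable)": 40,
          "ceph version 14.2.22 (placeholder) nautilus (stable)": 20},
  "rgw": {"ceph version 12.2.12 (placeholder) luminous (stable)": 2},
  "overall": {"ceph version 12.2.12 (placeholder) luminous (stable)": 42,
              "ceph version 14.2.22 (placeholder) nautilus (stable)": 25}
}
""")

print(next_daemon_type_to_upgrade(SAMPLE, "14.2.22"))
print(is_mixed(SAMPLE))
```

In practice you would feed it the real output, for example by saving `ceph versions` to a file and loading it with `json.load`. The release notes linked above remain the authoritative procedure.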

You don't want to run a cluster with mixed software versions for any longer than a well-coordinated upgrade takes.

Respectfully,

Wes Dillingham
wes@xxxxxxxxxxxxxxxxx

On Tue, Apr 25, 2023 at 12:31 PM huxiaoyu@xxxxxxxxxxxx <huxiaoyu@xxxxxxxxxxxx> wrote:
Dear Ceph folks,

I would like to ask for your advice on the following topic. We have a 6-node Ceph cluster (for RGW usage only) running Luminous 12.2.12, and we will now add 10 new nodes. Our plan is to phase out the 6 old nodes and run the RGW Ceph cluster on the 10 new nodes on Nautilus.

I can think of two ways to achieve the above goal. The first method would be:

  1) Upgrade the current 6-node cluster from Luminous 12.2.12 to Nautilus 14.2.22;
  2) Expand the cluster with the 10 new nodes, and then rebalance;
  3) After the rebalance completes, remove the 6 old nodes from the cluster.

The second method would skip upgrading the 6 old nodes from Luminous to Nautilus, since those nodes will be phased out anyway. But then we would have to deal with a hybrid cluster, with 6 nodes on Luminous 12.2.12 and 10 nodes on Nautilus; after rebalancing, we could remove the 6 old nodes from the cluster.
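Both methods wait for the rebalance to finish before the old nodes are removed. As a hedged sketch of what "finished" could mean in practice, the Python below treats the rebalance as complete only when every placement group is active+clean (optionally still scrubbing). It assumes the per-state PG counts come from the `pgmap.pgs_by_state` list in `ceph status --format json`, as a list of `{"state_name": ..., "count": ...}` entries; verify that shape on your release. The function names and sample numbers are hypothetical.

```python
# Flags that may accompany a healthy PG; anything else (degraded, remapped,
# backfilling, undersized, ...) means data is still moving or at risk.
HEALTHY_FLAGS = {"active", "clean", "scrubbing", "deep"}


def is_clean(state_name):
    """True for states such as 'active+clean' or 'active+clean+scrubbing+deep'."""
    flags = set(state_name.split("+"))
    return {"active", "clean"} <= flags and flags <= HEALTHY_FLAGS


def count_with_flag(pgs_by_state, flag):
    """Total number of PGs whose state contains `flag` (e.g. 'degraded')."""
    return sum(
        entry["count"]
        for entry in pgs_by_state
        if flag in entry["state_name"].split("+")
    )


def rebalance_complete(pgs_by_state):
    """True when there is at least one PG and all PGs are clean."""
    total = sum(entry["count"] for entry in pgs_by_state)
    clean = sum(e["count"] for e in pgs_by_state if is_clean(e["state_name"]))
    return total > 0 and clean == total


# Illustrative snapshots only; the counts are invented.
DURING_BACKFILL = [
    {"state_name": "active+clean", "count": 900},
    {"state_name": "active+remapped+backfilling", "count": 20},
    {"state_name": "active+undersized+degraded+remapped+backfill_wait", "count": 4},
]
AFTER_BACKFILL = [
    {"state_name": "active+clean", "count": 1000},
    {"state_name": "active+clean+scrubbing+deep", "count": 24},
]

print(rebalance_complete(DURING_BACKFILL), count_with_flag(DURING_BACKFILL, "degraded"))
print(rebalance_complete(AFTER_BACKFILL))
```

This is only a gate for the removal step, not a removal procedure; checking each old OSD individually before taking it out is still advisable.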

Any suggestions, advice, or best practice would be highly appreciated.

best regards,

Samuel 

huxiaoyu@xxxxxxxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx