What if: Upgrade procedure mistake by restarting OSD before MON?

Mark Kirkwood <markkirkwood@xxxxxxxxxxxxxxxx> · Wed, 1 Dec 2021 12:41:55 +1300

Hi,

I am planning a Luminous to Nautilus upgrade. The instructions state 
(very terse version):

- install Nautilus ceph packages

- restart MONs

- restart MGRs

- restart OSDs
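
The ordering above can be checked mechanically before each restart round. What follows is a minimal sketch, not an official tool: it assumes the JSON printed by `ceph versions` (daemon type, then version string, then daemon count) has already been parsed into a dict, and that the release name is the second-to-last word of each version string. The names `UPGRADE_ORDER`, `release_counts` and `next_to_restart` are hypothetical, and the sample data is made up for illustration.

```python
import json

# Restart order taken from the terse steps above.
UPGRADE_ORDER = ["mon", "mgr", "osd"]


def release_counts(versions, daemon_type):
    """Map release name -> number of daemons, for one daemon type."""
    counts = {}
    for version_string, n in versions.get(daemon_type, {}).items():
        # Assumed format: "ceph version 14.2.22 (<hash>) nautilus (stable)",
        # so the release name is the second-to-last word.
        release = version_string.split()[-2]
        counts[release] = counts.get(release, 0) + n
    return counts


def next_to_restart(versions, target="nautilus"):
    """Return the first daemon type with daemons not yet on `target`, else None."""
    for daemon_type in UPGRADE_ORDER:
        counts = release_counts(versions, daemon_type)
        if any(release != target for release in counts):
            return daemon_type
    return None


# Illustrative data only: two of three MONs restarted so far.
sample = json.loads("""
{
  "mon": {"ceph version 14.2.22 (abc) nautilus (stable)": 2,
          "ceph version 12.2.13 (def) luminous (stable)": 1},
  "mgr": {"ceph version 12.2.13 (def) luminous (stable)": 3},
  "osd": {"ceph version 12.2.13 (def) luminous (stable)": 12}
}
""")

print(next_to_restart(sample))  # prints "mon"
```

On a live cluster the dict would presumably come from something like `json.loads(subprocess.check_output(["ceph", "versions"]))`; that call is left out so the sketch runs without a cluster.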

We have OSDs running on our MON hosts (essentially all our Ceph hosts
are the same chassis). So, if everything goes to plan, we simply install
the Nautilus packages, restart the MONs on the hosts that have them, and
then go back and restart the OSDs. So where is the problem?

I'm wondering about the situation where you are part way through 
restarting your MONs (or MGRs) and one of the hosts reboots (or perhaps 
a single OSD on one of the MON hosts crashes and is restarted). I.e. you
now have one (or more) OSDs running Nautilus before you've finished
restarting all the MONs. I've tested this briefly and it looks like the 
OSD rejoins the cluster, but space utilization for it goes crazy thereafter.
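
The out-of-order state described here can at least be detected from version counts. The sketch below only flags the condition and says nothing about how to remedy it. It assumes a dict parsed from the JSON output of `ceph versions`, with the release name as the second-to-last word of each version string. The function name `osds_ahead_of_mons` and the sample data are hypothetical.

```python
def osds_ahead_of_mons(versions, target="nautilus"):
    """Return True if some OSD runs `target` while not every MON does."""

    def releases(daemon_type):
        # Assumed format: "ceph version 14.2.22 (<hash>) nautilus (stable)".
        return {v.split()[-2] for v in versions.get(daemon_type, {})}

    mons_done = releases("mon") == {target}
    return target in releases("osd") and not mons_done


# Illustrative data only: one OSD came back on Nautilus while one MON
# is still on Luminous.
sample = {
    "mon": {
        "ceph version 14.2.22 (abc) nautilus (stable)": 2,
        "ceph version 12.2.13 (def) luminous (stable)": 1,
    },
    "osd": {
        "ceph version 14.2.22 (abc) nautilus (stable)": 1,
        "ceph version 12.2.13 (def) luminous (stable)": 11,
    },
}

print(osds_ahead_of_mons(sample))  # prints "True"
```

Running such a check after any unexpected host reboot or OSD crash during the MON/MGR phase would show whether an OSD has jumped ahead of the MONs.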

So my question is: if this happens, what is the recommended remedial 
action? Is destroying the impacted OSDs the only option?

regards

Mark

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx