Hello Team,
I have a 5-node cluster running Kraken 11.2.0 with an EC 4+1 pool.
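(For context, the EC pool was created with a standard 4+1 profile, roughly like the sketch below; the profile and pool names and the failure domain are only my illustration, not copied from the cluster:)
# ceph osd erasure-code-profile set ec41 k=4 m=1 ruleset-failure-domain=host
# ceph osd pool create ecpool 2048 2048 erasure ec41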
My plan is to upgrade all 5 nodes to 12.2.2 Luminous without any downtime. I tried the procedure below on the first node.
I commented out the following directive in ceph.conf:
enable experimental unrecoverable data corrupting features = bluestore rocksdb
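For reference, the change in ceph.conf was just commenting that line out, roughly as below (I am assuming it lives under [global]):

[global]
# bluestore is stable in Luminous, so this experimental flag should no longer be needed
#enable experimental unrecoverable data corrupting features = bluestore rocksdb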
Then I started and enabled ceph-mgr, and rebooted the node.
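The ceph-mgr part was the usual manual setup, roughly the commands below (the daemon name matches the hostname PL8-CN1, and the keyring path assumes a cluster named "ceph", so treat this as a sketch):

# mkdir -p /var/lib/ceph/mgr/ceph-PL8-CN1
# ceph auth get-or-create mgr.PL8-CN1 mon 'allow profile mgr' osd 'allow *' mds 'allow *' -o /var/lib/ceph/mgr/ceph-PL8-CN1/keyring
# chown -R ceph:ceph /var/lib/ceph/mgr/ceph-PL8-CN1
# systemctl enable ceph-mgr@PL8-CN1
# systemctl start ceph-mgr@PL8-CN1

After the reboot, this is what ceph -s shows: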
## ceph -s
    cluster b2f1b9b9-eecc-4c17-8b92-cfa60b31c121
     health HEALTH_WARN
            2048 pgs degraded
            2048 pgs stuck degraded
            2048 pgs stuck unclean
            2048 pgs stuck undersized
            2048 pgs undersized
            recovery 1091151/1592070 objects degraded (68.537%)
            24/120 in osds are down
     monmap e2: 5 mons at {PL8-CN1=10.50.11.41:6789/0,PL8-CN2=10.50.11.42:6789/0,PL8-CN3=10.50.11.43:6789/0,PL8-CN4=10.50.11.44:6789/0,PL8-CN5=10.50.11.45:6789/0}
            election epoch 18, quorum 0,1,2,3,4 PL8-CN1,PL8-CN2,PL8-CN3,PL8-CN4,PL8-CN5
        mgr active: PL8-CN1
     osdmap e243: 120 osds: 96 up, 120 in; 2048 remapped pgs
            flags sortbitwise,require_jewel_osds,require_kraken_osds
      pgmap v1099: 2048 pgs, 1 pools, 84304 MB data, 310 kobjects
            105 GB used, 436 TB / 436 TB avail
            1091151/1592070 objects degraded (68.537%)
                2048 active+undersized+degraded
  client io 107 MB/s wr, 0 op/s rd, 860 op/s wr
After the reboot, all 24 OSDs on the first node are marked down, even though all 24 ceph-osd processes are still running:
#ps -ef | grep -c ceph-osd
24
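If more diagnostics would help, I can collect something like the following from one of the down OSDs (osd.0 below is only a placeholder id, not necessarily one of the affected OSDs):

# systemctl status ceph-osd@0
# journalctl -u ceph-osd@0 --since "10 minutes ago"
# ceph daemon osd.0 status        (admin socket query, run locally on the node)
# ceph osd tree | grep down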
If I run the same procedure on all 5 nodes in parallel and reboot them together, the cluster comes back up without any issues. But doing it in parallel means taking downtime, which management will not accept at the moment. Please help and share your views.
I have read the upgrade section of https://ceph.com/releases/v12-2-0-luminous-released/, but it did not help in this case.
So my question is: what is the best way to upgrade the cluster node by node, without any downtime?
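For context, the node-at-a-time sequence I have in mind is roughly the one below; this is just my reading of the usual rolling-upgrade practice, so please correct me if something extra is needed between Kraken and Luminous:

# ceph osd set noout
  (upgrade packages on one node, adjust ceph.conf, reboot, wait for its OSDs to rejoin)
# ceph osd unset noout
  (confirm the cluster is healthy again, then move on to the next node)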
Thanks