Why are you rebooting the node? You should only need to restart the Ceph services. All of your MONs need to be running Luminous before the cluster will accept any Luminous OSDs. So update the packages on each server and restart the MONs first. Once all of the MONs have been restarted and you have a Luminous quorum of MONs, you can start restarting OSDs and/or servers.
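The gate described above (no OSD restarts until every MON is on Luminous) could be sketched as a small shell check. This is only an illustration: all_mons_luminous is a made-up helper name, and it assumes you feed it one version line per MON in the form printed by `ceph-mon --version`, i.e. starting with "ceph version 12." for Luminous.

```shell
#!/bin/sh
# Hypothetical helper, not part of Ceph.
# Reads one version line per MON on stdin and succeeds only if every
# line reports a 12.x (Luminous) release and at least one line was seen.
all_mons_luminous() {
    seen=0
    while IFS= read -r line; do
        [ -n "$line" ] || continue            # skip blank lines
        seen=$((seen + 1))
        case "$line" in
            "ceph version 12."*) ;;           # Luminous, keep checking
            *) echo "not yet Luminous: $line" >&2
               return 1 ;;
        esac
    done
    [ "$seen" -gt 0 ]
}

# Possible usage (assumes ssh access to the MON hosts):
#   for h in PL8-CN1 PL8-CN2 PL8-CN3 PL8-CN4 PL8-CN5; do
#       ssh "$h" ceph-mon --version
#   done | all_mons_luminous && echo "OK to restart OSDs"
```

Note that `ceph-mon --version` reports the installed binary, so a MON that was upgraded but not yet restarted would still pass this check; confirm with `ceph -s` that the restarted MONs are back in quorum as well.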
If you want to, you can start your MGR daemons before doing the OSDs as well, but that step isn't required for the OSDs to come back up. To get out of your current situation, update the packages on your remaining MONs and restart the MON service on each of them so that all of your MONs are running Luminous. After that, your 24 down OSDs should come back up.
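As a rough sketch of the ordering above, the function below only prints a plan and executes nothing. print_upgrade_plan is a hypothetical name; the systemd targets (ceph-mon.target, ceph-mgr.target, ceph-osd.target) are the ones shipped with the Ceph packages, the package upgrade command is left as a placeholder because it depends on your distribution, and doing the OSD restarts one host at a time is my assumption for avoiding downtime with EC 4+1.

```shell
#!/bin/sh
# Hypothetical helper: prints the upgrade steps in order for the given
# hosts. It does not run any of them.
print_upgrade_plan() {
    # Phase 1: every MON gets new packages and a service restart.
    for host in "$@"; do
        echo "$host: upgrade ceph packages (distro-specific command)"
        echo "$host: systemctl restart ceph-mon.target"
    done
    echo "check: ceph -s shows all MONs in quorum"

    # Phase 2 (optional): MGR daemons.
    for host in "$@"; do
        echo "$host: systemctl restart ceph-mgr.target (optional)"
    done

    # Phase 3: OSDs, one host at a time (assumption, see lead-in).
    for host in "$@"; do
        echo "$host: systemctl restart ceph-osd.target"
        echo "check: ceph -s shows OSDs on $host up before continuing"
    done
}

# Example:
#   print_upgrade_plan PL8-CN1 PL8-CN2 PL8-CN3 PL8-CN4 PL8-CN5
```

In the situation described here, the first node already has the new packages, so phase 1 would in practice apply to the four remaining MONs.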
On Fri, Dec 8, 2017 at 10:51 AM nokia ceph <nokiacephusers@xxxxxxxxx> wrote:
Hello Team,

I have a 5-node cluster running Kraken 11.2.0 with EC 4+1.

My plan is to upgrade all 5 nodes to 12.2.2 Luminous without any downtime. I tried the procedure below on the first node.

I commented out the following directive in ceph.conf:

enable experimental unrecoverable data corrupting features = bluestore rocksdb

Then I started and enabled ceph-mgr and rebooted the node.

## ceph -s
    cluster b2f1b9b9-eecc-4c17-8b92-cfa60b31c121
     health HEALTH_WARN
            2048 pgs degraded
            2048 pgs stuck degraded
            2048 pgs stuck unclean
            2048 pgs stuck undersized
            2048 pgs undersized
            recovery 1091151/1592070 objects degraded (68.537%)
            24/120 in osds are down
     monmap e2: 5 mons at {PL8-CN1=10.50.11.41:6789/0,PL8-CN2=10.50.11.42:6789/0,PL8-CN3=10.50.11.43:6789/0,PL8-CN4=10.50.11.44:6789/0,PL8-CN5=10.50.11.45:6789/0}
            election epoch 18, quorum 0,1,2,3,4 PL8-CN1,PL8-CN2,PL8-CN3,PL8-CN4,PL8-CN5
        mgr active: PL8-CN1
     osdmap e243: 120 osds: 96 up, 120 in; 2048 remapped pgs
            flags sortbitwise,require_jewel_osds,require_kraken_osds
      pgmap v1099: 2048 pgs, 1 pools, 84304 MB data, 310 kobjects
            105 GB used, 436 TB / 436 TB avail
            1091151/1592070 objects degraded (68.537%)
                2048 active+undersized+degraded
  client io 107 MB/s wr, 0 op/s rd, 860 op/s wr

After the reboot, all 24 OSDs on the first node show as down, even though I can see that the 24 OSD processes are running.

#ps -ef | grep -c ceph-osd
24

If I run this procedure on all 5 nodes in parallel and then reboot, the cluster comes back up successfully without any issues. But parallel execution requires downtime, which our management will not accept at the moment. Please help and share your views.

I read the upgrade section of https://ceph.com/releases/v12-2-0-luminous-released/, but it didn't help me.

My question is: what is the best method to upgrade the machines without any downtime?

Thanks

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com