Re: upgrade from kraken 11.2.0 to 12.2.2 bluestore EC

David Turner <drakonstein@xxxxxxxxx> · Fri, 08 Dec 2017 15:58:04 +0000

Why are you rebooting the node?  You should only need to restart the Ceph services.  All of your MONs need to be running Luminous before the cluster will accept any Luminous OSDs.  So update the packages on each server and restart the MONs first.  Once all of the MONs have been restarted and you have a Luminous quorum, you can start restarting OSDs and/or servers.
If you want to, you can start your MGR daemons before doing the OSDs as well, but that step isn't required for the OSDs to come back up.  To get out of your current situation, update the packages on your remaining MONs and restart the MON service on each of them so that all of your MONs are running Luminous.  After that, your 24 down OSDs should come back up.
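
The order described above (MONs first, optionally MGRs, then OSDs) could be sketched roughly as the shell helpers below. This is only an illustration, not a tested upgrade script. The function names and the DRY_RUN switch are made up for this example, the package update step is left out because it depends on the distribution, and setting noout around the OSD restart is a common precaution that the message above does not mention. By default the helpers only print the commands they would run.

```shell
# Sketch only: prints commands unless DRY_RUN=0 is set explicitly.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Step 1: on each MON host, after the Luminous packages are installed,
# restart only the monitor service (no reboot needed).
upgrade_mon() {
    run systemctl restart ceph-mon.target
}

# Step 2 (optional): start the manager daemon before touching the OSDs.
start_mgr() {
    run systemctl start ceph-mgr.target
}

# Check which release each daemon reports; run this once the MONs are on
# Luminous and confirm every mon is listed as 12.2.x before step 3.
check_versions() {
    run ceph versions
}

# Step 3: one host at a time, restart the OSDs on that host.
# noout is an added precaution (assumption, not from the thread) so the
# cluster does not start rebalancing while the OSDs are briefly down.
upgrade_osds() {
    run ceph osd set noout
    run systemctl restart ceph-osd.target
    run ceph osd unset noout
}
```

Run upgrade_mon on every MON host first, then check_versions, and only then upgrade_osds host by host, waiting for the PGs to return to active+clean between hosts.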

On Fri, Dec 8, 2017 at 10:51 AM nokia ceph <nokiacephusers@xxxxxxxxx> wrote:
Hello Team,
I have a 5-node cluster running Kraken 11.2.0 with EC 4+1.

My plan is to upgrade all 5 nodes to Luminous 12.2.2 without any downtime. I tried the procedure below on the first node.

I commented out the following directive in ceph.conf:
enable experimental unrecoverable data corrupting features = bluestore rocksdb

Then I started and enabled ceph-mgr and rebooted the node.

## ceph -s
    cluster b2f1b9b9-eecc-4c17-8b92-cfa60b31c121
     health HEALTH_WARN
            2048 pgs degraded
            2048 pgs stuck degraded
            2048 pgs stuck unclean
            2048 pgs stuck undersized
            2048 pgs undersized
            recovery 1091151/1592070 objects degraded (68.537%)
            24/120 in osds are down
     monmap e2: 5 mons at {PL8-CN1=10.50.11.41:6789/0,PL8-CN2=10.50.11.42:6789/0,PL8-CN3=10.50.11.43:6789/0,PL8-CN4=10.50.11.44:6789/0,PL8-CN5=10.50.11.45:6789/0}
            election epoch 18, quorum 0,1,2,3,4 PL8-CN1,PL8-CN2,PL8-CN3,PL8-CN4,PL8-CN5
        mgr active: PL8-CN1
     osdmap e243: 120 osds: 96 up, 120 in; 2048 remapped pgs
            flags sortbitwise,require_jewel_osds,require_kraken_osds
      pgmap v1099: 2048 pgs, 1 pools, 84304 MB data, 310 kobjects
            105 GB used, 436 TB / 436 TB avail
            1091151/1592070 objects degraded (68.537%)
                2048 active+undersized+degraded
  client io 107 MB/s wr, 0 op/s rd, 860 op/s wr

After the reboot, all 24 OSDs on the first node show as down, even though the 24 ceph-osd processes are running.

#ps -ef | grep -c  ceph-osd
24
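
A running ceph-osd process is not the same as an OSD that the cluster has marked up, so the process count and the osdmap can disagree, as they do here. As a rough illustration, the number of down OSDs can be read from the osdmap line of the status output above. The function name count_down_osds is made up for this sketch, and the pattern assumes the Kraken-style "osdmap eN: X osds: Y up, Z in" line shown in this message; other releases may format that line differently.

```shell
# Read `ceph -s` output on stdin and print (total OSDs - up OSDs),
# taken from a line such as: "osdmap e243: 120 osds: 96 up, 120 in".
count_down_osds() {
    sed -n 's/.*osdmap e[0-9]*: \([0-9]*\) osds: \([0-9]*\) up.*/\1 \2/p' |
    while read -r total up; do
        echo $((total - up))
    done
}
```

It would be used as `ceph -s | count_down_osds`.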

If I run this procedure on all 5 nodes in parallel and then reboot, the cluster comes back up without any issues. But running it in parallel requires downtime, which our management will not accept at the moment. Please help and share your views.

I read the upgrade section of https://ceph.com/releases/v12-2-0-luminous-released/ but it has not helped so far.

So my question is: what is the best way to upgrade the machines without any downtime?

Thanks

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
