On Fri, Sep 15, 2017 at 1:48 PM, David <dclistslinux@xxxxxxxxx> wrote:
> Happy to report I got everything up to Luminous, used your tip to keep the
> OSDs running, David, thanks again for that.
>
> I'd say this is a potential gotcha for people collocating MONs. It appears
> that if you're running selinux, even in permissive mode, upgrading the
> ceph-selinux packages forces a restart on all the OSDs.

It is the ceph-osd and/or ceph-mon package that got upgraded and restarted
the service. This is the *correct* behavior; if it did not restart after the
upgrade, that would be a bug.

> You're left with a
> load of OSDs down that you can't start as you don't have a Luminous mon
> quorum yet.

Do you have only one monitor? It is recommended to have at least 3 mons (or a
larger odd number, >3) for HA. The release notes also mention upgrading the
mons one by one. As long as you have that redundancy, mon/osd collocation
shouldn't matter much.

>
>
> On 15 Sep 2017 4:54 p.m., "David" <dclistslinux@xxxxxxxxx> wrote:
>
> Hi David
>
> I like your thinking! Thanks for the suggestion. I've got a maintenance
> window later to finish the update so will give it a try.
>
>
> On Thu, Sep 14, 2017 at 6:24 PM, David Turner <drakonstein@xxxxxxxxx> wrote:
>>
>> This isn't a great solution, but something you could try. If you stop all
>> of the daemons via systemd and start them all in a screen as a manually
>> running daemon in the foreground of each screen... I don't think that yum
>> updating the packages can stop or start the daemons. You could copy and
>> paste the running command (viewable in ps) to know exactly what to run in
>> the screens to start the daemons like this.
>>
>> On Wed, Sep 13, 2017 at 6:53 PM David <dclistslinux@xxxxxxxxx> wrote:
>>>
>>> Hi All
>>>
>>> I did a Jewel -> Luminous upgrade on my dev cluster and it went very
>>> smoothly.
>>>
>>> I've attempted to upgrade on a small production cluster but I've hit a
>>> snag.
>>>
>>> After installing the ceph 12.2.0 packages with "yum install ceph" on the
>>> first node and accepting all the dependencies, I found that all the OSD
>>> daemons, the MON and the MDS running on that node were terminated. Systemd
>>> appears to have attempted to restart them all but the daemons didn't start
>>> successfully (not surprising as first stage of upgrading all mons in cluster
>>> not completed). I was able to start the MON and it's running. The OSDs are
>>> all down and I'm reluctant to attempt to start them without upgrading the
>>> other MONs in the cluster. I'm also reluctant to attempt upgrading the
>>> remaining 2 MONs without understanding what happened.
>>>
>>> The cluster is on Jewel 10.2.5 (as was the dev cluster)
>>> Both clusters running on CentOS 7.3
>>>
>>> The only obvious difference I can see between the dev and production is
>>> the production has selinux running in permissive mode, the dev had it
>>> disabled.
>>>
>>> Any advice on how to proceed at this point would be much appreciated. The
>>> cluster is currently functional, but I have 1 node out of 4 with all OSDs down.
>>> I had noout set before the upgrade and I've left it set for now.
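For anyone hitting the same situation, the "keep the daemons running through
the package update" tip mentioned above can look roughly like this on a
systemd host. This is only a sketch: the OSD id (3) and the screen session
name are made-up examples, and the exact ceph-osd flags should be copied from
what ps shows on your own node rather than taken from here.

    # Stop the systemd-managed unit so the package scripts are not the
    # thing keeping (or restarting) the OSD
    systemctl stop ceph-osd@3

    # Re-run the same daemon in the foreground inside a detached screen,
    # using the command line that was visible in ps before stopping it
    screen -dmS osd3 /usr/bin/ceph-osd -f --cluster ceph --id 3 \
        --setuser ceph --setgroup ceph

    # The package upgrade can then run without touching the screen sessions
    yum install ceph

    # Once every mon in the cluster is on Luminous, confirm quorum and
    # the running versions
    ceph mon stat
    ceph versions

Afterwards you would stop the foreground daemons in the screens and start the
ceph-osd@<id> units again so systemd is managing the upgraded binaries as
usual.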
>>>
>>> Here's the journalctl right after the packages were installed (hostname
>>> changed):
>>>
>>> https://pastebin.com/fa6NMyjG

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com