On Fri, Sep 15, 2017 at 1:48 PM, David <dclistslinux@xxxxxxxxx> wrote:
> Happy to report I got everything up to Luminous, used your tip to keep the
> OSDs running, David, thanks again for that.
>
> I'd say this is a potential gotcha for people collocating MONs. It appears
> that if you're running selinux, even in permissive mode, upgrading the
> ceph-selinux packages forces a restart on all the OSDs.

It is the ceph-osd and/or ceph-mon package that got upgraded and restarted
the service. This is the *correct* behavior; if it did not restart after the
upgrade, that would be a bug.

> You're left with a
> load of OSDs down that you can't start as you don't have a Luminous mon
> quorum yet.

Do you have only one monitor? It is recommended to have at least 3 mons (or a
larger odd number, >3) for HA. The release notes also mention upgrading the
mons one by one. As long as you have that redundancy, mon/osd collocation
shouldn't matter much.

>
>
> On 15 Sep 2017 4:54 p.m., "David" <dclistslinux@xxxxxxxxx> wrote:
>
> Hi David
>
> I like your thinking! Thanks for the suggestion. I've got a maintenance
> window later to finish the update so will give it a try.
>
>
> On Thu, Sep 14, 2017 at 6:24 PM, David Turner <drakonstein@xxxxxxxxx> wrote:
>>
>> This isn't a great solution, but something you could try. If you stop all
>> of the daemons via systemd and start them all in a screen as a manually
>> running daemon in the foreground of each screen... I don't think that yum
>> updating the packages can stop or start the daemons. You could copy and
>> paste the running command (viewable in ps) to know exactly what to run in
>> the screens to start the daemons like this.
>>
>> On Wed, Sep 13, 2017 at 6:53 PM David <dclistslinux@xxxxxxxxx> wrote:
>>>
>>> Hi All
>>>
>>> I did a Jewel -> Luminous upgrade on my dev cluster and it went very
>>> smoothly.
>>>
>>> I've attempted to upgrade on a small production cluster but I've hit a
>>> snag.
>>>
>>> After installing the ceph 12.2.0 packages with "yum install ceph" on the
>>> first node and accepting all the dependencies, I found that all the OSD
>>> daemons, the MON and the MDS running on that node were terminated. Systemd
>>> appears to have attempted to restart them all but the daemons didn't start
>>> successfully (not surprising as first stage of upgrading all mons in cluster
>>> not completed). I was able to start the MON and it's running. The OSDs are
>>> all down and I'm reluctant to attempt to start them without upgrading the
>>> other MONs in the cluster. I'm also reluctant to attempt upgrading the
>>> remaining 2 MONs without understanding what happened.
>>>
>>> The cluster is on Jewel 10.2.5 (as was the dev cluster)
>>> Both clusters running on CentOS 7.3
>>>
>>> The only obvious difference I can see between the dev and production is
>>> the production has selinux running in permissive mode, the dev had it
>>> disabled.
>>>
>>> Any advice on how to proceed at this point would be much appreciated. The
>>> cluster is currently functional, but I have 1 node out of 4 with all OSDs down.
>>> I had noout set before the upgrade and I've left it set for now.
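For anyone hitting the same situation, the "keep the daemons running through
the package update" tip mentioned above can look roughly like this on a
systemd host. This is only a sketch: the OSD id (3) and the screen session
name are made-up examples, and the exact ceph-osd flags should be copied from
what ps shows on your own node rather than taken from here.

    # Stop the systemd-managed unit so the package scripts are not the
    # thing keeping (or restarting) the OSD
    systemctl stop ceph-osd@3

    # Re-run the same daemon in the foreground inside a detached screen,
    # using the command line that was visible in ps before stopping it
    screen -dmS osd3 /usr/bin/ceph-osd -f --cluster ceph --id 3 \
        --setuser ceph --setgroup ceph

    # The package upgrade can then run without touching the screen sessions
    yum install ceph

    # Once every mon in the cluster is on Luminous, confirm quorum and
    # the running versions
    ceph mon stat
    ceph versions

Afterwards you would stop the foreground daemons in the screens and start the
ceph-osd@<id> units again so systemd is managing the upgraded binaries as
usual.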
>>>
>>> Here's the journalctl right after the packages were installed (hostname
>>> changed):
>>>
>>> https://pastebin.com/fa6NMyjG

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com