Will also note that the normal upgrade process scales the mds service down to only 1 mds per fs before upgrading it, so that may be something you'd want to do as well if the upgrade didn't do it already. It does this by setting max_mds to 1 for the fs.

On Mon, Apr 10, 2023 at 3:51 PM Adam King <adking@xxxxxxxxxx> wrote:

> You could try pausing the upgrade and manually "upgrading" the mds daemons
> by redeploying them on the new image. Something like "ceph orch daemon
> redeploy <mds-daemon-name> --image <17.2.6 image>" (daemon names should
> match those in "ceph orch ps" output). If you do that for all of them and
> then get them into an up state, you should be able to resume the upgrade
> and have it complete.
>
> On Mon, Apr 10, 2023 at 3:25 PM Thomas Widhalm <widhalmt@xxxxxxxxxxxxx>
> wrote:
>
>> Hi,
>>
>> If you remember, I hit bug https://tracker.ceph.com/issues/58489 so I
>> was very relieved when 17.2.6 was released and started to update
>> immediately.
>>
>> But now I'm stuck again with my broken MDS. The MDS won't get into
>> up:active without the update, but the update waits for them to reach the
>> up:active state. Seems like a deadlock / chicken-and-egg problem to me.
>>
>> Since I'm still relatively new to Ceph, could you help me?
>>
>> What I see when watching the update status:
>>
>> {
>>     "target_image": "quay.io/ceph/ceph@sha256:1161e35e4e02cf377c93b913ce78773f8413f5a8d7c5eaee4b4773a4f9dd6635",
>>     "in_progress": true,
>>     "which": "Upgrading all daemon types on all hosts",
>>     "services_complete": [
>>         "crash",
>>         "mgr",
>>         "mon",
>>         "osd"
>>     ],
>>     "progress": "18/40 daemons upgraded",
>>     "message": "Error: UPGRADE_OFFLINE_HOST: Upgrade: Failed to connect to host ceph01 at addr (192.168.23.61)",
>>     "is_paused": false
>> }
>>
>> (The offline host was one host that broke during the upgrade. I fixed
>> that in the meantime and the update went on.)
>>
>> And in the log:
>>
>> 2023-04-10T19:23:48.750129+0000 mgr.ceph04.qaexpv [INF] Upgrade: Waiting
>> for mds.mds01.ceph04.hcmvae to be up:active (currently up:replay)
>> 2023-04-10T19:23:58.758141+0000 mgr.ceph04.qaexpv [WRN] Upgrade: No mds
>> is up; continuing upgrade procedure to poke things in the right direction
>>
>> Please give me a hint what I can do.
>>
>> Cheers,
>> Thomas
>> --
>> http://www.widhalm.or.at
>> GnuPG : 6265BAE6 , A84CB603
>> Threema: H7AV7D33
>> Telegram, Signal: widhalmt@xxxxxxxxxxxxx
>> _______________________________________________
>> ceph-users mailing list -- ceph-users@xxxxxxx
>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
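[Editor's note: for readers landing on this thread later, the procedure Adam describes can be sketched roughly as the shell session below. This is an illustrative sequence, not a verified recipe: the fs name "cephfs", the MDS daemon name, and the image tag are placeholders taken from (or modeled on) this thread; check "ceph fs ls" and "ceph orch ps" on your own cluster before running anything, and these commands of course require a live cephadm-managed cluster.]

```shell
# Pause the stuck upgrade so cephadm stops waiting on the MDS daemons.
ceph orch upgrade pause

# Scale the fs down to a single active MDS, as the normal upgrade
# procedure would do ("cephfs" is a placeholder fs name).
ceph fs set cephfs max_mds 1

# List the MDS daemons to get their exact names.
ceph orch ps --daemon-type mds

# Manually redeploy each MDS on the new image, repeating once per
# daemon from the list above (daemon name here is an example).
ceph orch daemon redeploy mds.mds01.ceph04.hcmvae \
    --image quay.io/ceph/ceph:v17.2.6

# Once the redeployed MDS daemons come back up, resume the upgrade.
ceph orch upgrade resume
```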