Re: Upgrade from 17.2.5 to 17.2.6 stuck at MDS

I did what you told me.

I also see in the log that the commands went through:

2023-04-10T19:58:46.522477+0000 mgr.ceph04.qaexpv [INF] Schedule redeploy daemon mds.mds01.ceph06.rrxmks
2023-04-10T20:01:03.360559+0000 mgr.ceph04.qaexpv [INF] Schedule redeploy daemon mds.mds01.ceph05.pqxmvt
2023-04-10T20:01:21.787635+0000 mgr.ceph04.qaexpv [INF] Schedule redeploy daemon mds.mds01.ceph07.omdisd


But the MDS never start. They stay in an error state. I tried redeploying and starting them a few times, and even restarted one host where an MDS should run.

mds.mds01.ceph03.xqwdjy  ceph03  error  32m ago  2M   -  -  <unknown>  <unknown>  <unknown>
mds.mds01.ceph04.hcmvae  ceph04  error  31m ago  2h   -  -  <unknown>  <unknown>  <unknown>
mds.mds01.ceph05.pqxmvt  ceph05  error  32m ago  9M   -  -  <unknown>  <unknown>  <unknown>
mds.mds01.ceph06.rrxmks  ceph06  error  32m ago  10w  -  -  <unknown>  <unknown>  <unknown>
mds.mds01.ceph07.omdisd  ceph07  error  32m ago  2M   -  -  <unknown>  <unknown>  <unknown>
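
The listing above is "ceph orch ps" output. In case it helps, this is
roughly how the log of one of the failed daemons can be pulled on its
host (the daemon name is just one example from the list):

ceph orch ps --daemon-type mds --refresh
cephadm logs --name mds.mds01.ceph03.xqwdjy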


Any other ideas? Or am I missing something?

Cheers,
Thomas

On 10.04.23 21:53, Adam King wrote:
I'll also note that the normal upgrade process scales the mds service down to only 1 mds per fs before upgrading it, so that's something you may want to do as well if the upgrade didn't do it already. It does so by setting max_mds to 1 for the fs.
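
For example (assuming the fs is named "mds01", as the daemon names
suggest; substitute the name shown by "ceph fs ls"):

ceph fs set mds01 max_mds 1

Once the upgrade has finished, max_mds can be raised back to its
previous value the same way.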

On Mon, Apr 10, 2023 at 3:51 PM Adam King <adking@xxxxxxxxxx> wrote:

    You could try pausing the upgrade and manually "upgrading" the mds
    daemons by redeploying them on the new image, with something like
    "ceph orch daemon redeploy <mds-daemon-name> --image <17.2.6 image>"
    (the daemon names should match those in the "ceph orch ps" output).
    If you do that for all of them and then get them into an up state,
    you should be able to resume the upgrade and have it complete.
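
    Roughly, and repeated for each mds daemon, the sequence would be
    something like this (the daemon name below is one from your
    "ceph orch ps" output, and the image tag is only an example; the
    target_image digest from the upgrade status works as well):

    ceph orch upgrade pause
    ceph orch daemon redeploy mds.mds01.ceph06.rrxmks --image quay.io/ceph/ceph:v17.2.6
    ceph orch upgrade resume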

    On Mon, Apr 10, 2023 at 3:25 PM Thomas Widhalm <widhalmt@xxxxxxxxxxxxx> wrote:

        Hi,

        If you remember, I hit bug https://tracker.ceph.com/issues/58489,
        so I was very relieved when 17.2.6 was released and started to
        update immediately.

        But now I'm stuck again with my broken MDS. The MDS won't get into
        up:active without the update, but the update waits for them to
        reach the up:active state. Seems like a deadlock / chicken-and-egg
        problem to me.

        Since I'm still relatively new to Ceph, could you help me?

        What I see when watching the update status:

        {
            "target_image": "quay.io/ceph/ceph@sha256:1161e35e4e02cf377c93b913ce78773f8413f5a8d7c5eaee4b4773a4f9dd6635",
            "in_progress": true,
            "which": "Upgrading all daemon types on all hosts",
            "services_complete": [
                "crash",
                "mgr",
                "mon",
                "osd"
            ],
            "progress": "18/40 daemons upgraded",
            "message": "Error: UPGRADE_OFFLINE_HOST: Upgrade: Failed to connect to host ceph01 at addr (192.168.23.61)",
            "is_paused": false
        }
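
        (For reference, that status output comes from something like

        ceph orch upgrade status

        which can be wrapped in "watch" to follow it.)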

        (The offline host was one that broke during the upgrade. I fixed
        it in the meantime and the update went on.)

        And in the log:

        2023-04-10T19:23:48.750129+0000 mgr.ceph04.qaexpv [INF] Upgrade: Waiting for mds.mds01.ceph04.hcmvae to be up:active (currently up:replay)
        2023-04-10T19:23:58.758141+0000 mgr.ceph04.qaexpv [WRN] Upgrade: No mds is up; continuing upgrade procedure to poke things in the right direction


        Please give me a hint as to what I can do.

        Cheers,
        Thomas
-- http://www.widhalm.or.at
        GnuPG : 6265BAE6 , A84CB603
        Threema: H7AV7D33
        Telegram, Signal: widhalmt@xxxxxxxxxxxxx

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
