Re: Storage down due to MON sync very slow

Dear Dan,

thanks for your fast response.

Version: mimic 13.2.10.

Here is the mon_status of the "new" MON during syncing:

[root@ceph-01 ~]# ceph daemon mon.ceph-01 mon_status
{
    "name": "ceph-01",
    "rank": 0,
    "state": "synchronizing",
    "election_epoch": 0,
    "quorum": [],
    "features": {
        "required_con": "144115188346404864",
        "required_mon": [
            "kraken",
            "luminous",
            "mimic",
            "osdmap-prune"
        ],
        "quorum_con": "0",
        "quorum_mon": []
    },
    "outside_quorum": [
        "ceph-01"
    ],
    "extra_probe_peers": [],
    "sync_provider": [],
    "sync": {
        "sync_provider": "mon.2 192.168.32.67:6789/0",
        "sync_cookie": 33302773774,
        "sync_start_version": 38355711
    },
    "monmap": {
        "epoch": 3,
        "fsid": "e4ece518-f2cb-4708-b00f-b6bf511e91d9",
        "modified": "2019-03-14 23:08:34.717223",
        "created": "2019-03-14 22:18:15.088212",
        "features": {
            "persistent": [
                "kraken",
                "luminous",
                "mimic",
                "osdmap-prune"
            ],
            "optional": []
        },
        "mons": [
            {
                "rank": 0,
                "name": "ceph-01",
                "addr": "192.168.32.65:6789/0",
                "public_addr": "192.168.32.65:6789/0"
            },
            {
                "rank": 1,
                "name": "ceph-02",
                "addr": "192.168.32.66:6789/0",
                "public_addr": "192.168.32.66:6789/0"
            },
            {
                "rank": 2,
                "name": "ceph-03",
                "addr": "192.168.32.67:6789/0",
                "public_addr": "192.168.32.67:6789/0"
            }
        ]
    },
    "feature_map": {
        "mon": [
            {
                "features": "0x3ffddff8ffacfffb",
                "release": "luminous",
                "num": 1
            }
        ],
        "mds": [
            {
                "features": "0x3ffddff8ffacfffb",
                "release": "luminous",
                "num": 2
            }
        ],
        "client": [
            {
                "features": "0x2f018fb86aa42ada",
                "release": "luminous",
                "num": 1
            },
            {
                "features": "0x3ffddff8eeacfffb",
                "release": "luminous",
                "num": 1
            },
            {
                "features": "0x3ffddff8ffacfffb",
                "release": "luminous",
                "num": 17
            }
        ]
    }
}

I'm a bit surprised that the other 2 MONs don't remain in quorum until this MON has caught up. Is there any way to monitor the syncing progress? Right now I need to interrupt the sync regularly to allow some I/O, but I have no clue how long I need to wait.
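
For watching the state over time, here is a minimal Python sketch that pulls the sync-related fields out of the mon_status output shown above. It only reads fields that appear in that output (state, rank, quorum, sync.sync_provider, sync.sync_start_version); the output carries no explicit progress figure, so this shows whether the MON is still synchronizing and from which peer, not how far along it is. The function names are hypothetical, and read_mon_status assumes it runs on the MON host where the admin socket is reachable.

```python
import json
import subprocess


def read_mon_status(mon_id):
    """Run `ceph daemon mon.<id> mon_status` locally and parse the JSON reply."""
    out = subprocess.check_output(["ceph", "daemon", f"mon.{mon_id}", "mon_status"])
    return json.loads(out)


def summarize_sync(status):
    """Reduce a mon_status dict to the fields that matter while a MON is syncing."""
    sync = status.get("sync", {})
    return {
        "name": status.get("name"),
        "state": status.get("state"),
        # "quorum" lists the ranks currently in quorum; empty while syncing.
        "in_quorum": status.get("rank") in status.get("quorum", []),
        "sync_provider": sync.get("sync_provider"),
        "sync_start_version": sync.get("sync_start_version"),
    }
```

Calling summarize_sync(read_mon_status("ceph-01")) in a loop with a sleep in between would at least show when the state leaves "synchronizing" and the MON joins the quorum.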

Thanks for your help!

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Dan van der Ster <dan@xxxxxxxxxxxxxx>
Sent: 06 January 2021 20:16:44
To: Frank Schilder
Cc: Ceph Users
Subject: Re:  Re: Storage down due to MON sync very slow

Which version of Ceph are you running?

.. dan


On Wed, Jan 6, 2021, 8:14 PM Frank Schilder <frans@xxxxxx> wrote:
In the output of the MON I see slow ops warnings:

debug 2021-01-06 20:12:48.854 7f1a3d29f700 -1 mon.ceph-01@0(synchronizing) e3 get_health_metrics reporting 20 slow ops, oldest is log(1 entries from seq 1 at 2021-01-06 20:00:12.014861)

There appears to be no progress on this operation; it is stuck.
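
To compare such warnings over time, a small Python sketch can extract the slow-op count and the oldest op from a log line. The regular expression is written against the single line quoted above and is an assumption about the format, not a documented one; parse_slow_ops is a hypothetical helper name.

```python
import re

# Pattern derived from the one log line quoted above (an assumption, not a spec).
SLOW_OPS_RE = re.compile(
    r"mon\.(?P<mon>[\w.-]+)@(?P<rank>-?\d+)\((?P<state>\w+)\).*?"
    r"reporting (?P<count>\d+) slow ops, oldest is (?P<oldest>.*)$"
)


def parse_slow_ops(line):
    """Return MON name, rank, state, slow-op count and oldest op, or None."""
    m = SLOW_OPS_RE.search(line.strip())
    if m is None:
        return None
    return {
        "mon": m.group("mon"),
        "rank": int(m.group("rank")),
        "state": m.group("state"),
        "slow_ops": int(m.group("count")),
        "oldest": m.group("oldest"),
    }
```

If successive warnings keep reporting the same "oldest" entry while the count grows, that is consistent with the operation being stuck rather than merely slow.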

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Frank Schilder <frans@xxxxxx>
Sent: 06 January 2021 20:11:25
To: ceph-users@xxxxxxx
Subject:  Storage down due to MON sync very slow

Dear all,

I had to restart one of our 3 MONs with an empty MON DB directory. It is in the synchronizing state right now, but I'm not sure whether it is making any progress. The cluster is completely unresponsive even though I have 2 healthy MONs. Is there any way to sync the DB directory faster and/or without downtime?

Thanks a lot!

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx