Re: Storage down due to MON sync very slow

This sounds a lot like an old thread of mine:
https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/M5ZKF7PTEO2OGDDY5L74EV4QS5SDCZTH/

See the discussion about mon_sync_max_payload_size, and the PR that
fixed this at some point in nautilus.

Our workaround was:

ceph config set mon mon_sync_max_payload_size 4096
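The workaround above can be wrapped in a small helper that applies the override while the new MON syncs and can remove it again later if desired. This is a minimal sketch that assumes the centralized `ceph config set` / `ceph config rm` commands, which are available since Mimic. The function names and the `DRY_RUN` switch are hypothetical and only here for illustration.

```shell
#!/bin/sh
# Hypothetical helpers around the mon_sync_max_payload_size workaround.
# Set DRY_RUN=1 to print the ceph commands instead of running them.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Set a smaller MON sync payload size for all MONs (4096 bytes by default here).
mon_sync_workaround_on() {
    size=${1:-4096}
    case $size in
        ''|*[!0-9]*) echo "size must be a number of bytes" >&2; return 1 ;;
    esac
    run ceph config set mon mon_sync_max_payload_size "$size"
}

# Remove the override from the config database again.
mon_sync_workaround_off() {
    run ceph config rm mon mon_sync_max_payload_size
}
```

For example, `DRY_RUN=1 mon_sync_workaround_on` only prints the command; once applied for real, `ceph config dump` should list the override.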

Hope that helps,

Dan


On Wed, Jan 6, 2021 at 8:18 PM Frank Schilder <frans@xxxxxx> wrote:
>
> Dear Dan,
>
> thanks for your fast response.
>
> Version: mimic 13.2.10.
>
> Here is the mon_status of the "new" MON during syncing:
>
> [root@ceph-01 ~]# ceph daemon mon.ceph-01 mon_status
> {
>     "name": "ceph-01",
>     "rank": 0,
>     "state": "synchronizing",
>     "election_epoch": 0,
>     "quorum": [],
>     "features": {
>         "required_con": "144115188346404864",
>         "required_mon": [
>             "kraken",
>             "luminous",
>             "mimic",
>             "osdmap-prune"
>         ],
>         "quorum_con": "0",
>         "quorum_mon": []
>     },
>     "outside_quorum": [
>         "ceph-01"
>     ],
>     "extra_probe_peers": [],
>     "sync_provider": [],
>     "sync": {
>         "sync_provider": "mon.2 192.168.32.67:6789/0",
>         "sync_cookie": 33302773774,
>         "sync_start_version": 38355711
>     },
>     "monmap": {
>         "epoch": 3,
>         "fsid": "e4ece518-f2cb-4708-b00f-b6bf511e91d9",
>         "modified": "2019-03-14 23:08:34.717223",
>         "created": "2019-03-14 22:18:15.088212",
>         "features": {
>             "persistent": [
>                 "kraken",
>                 "luminous",
>                 "mimic",
>                 "osdmap-prune"
>             ],
>             "optional": []
>         },
>         "mons": [
>             {
>                 "rank": 0,
>                 "name": "ceph-01",
>                 "addr": "192.168.32.65:6789/0",
>                 "public_addr": "192.168.32.65:6789/0"
>             },
>             {
>                 "rank": 1,
>                 "name": "ceph-02",
>                 "addr": "192.168.32.66:6789/0",
>                 "public_addr": "192.168.32.66:6789/0"
>             },
>             {
>                 "rank": 2,
>                 "name": "ceph-03",
>                 "addr": "192.168.32.67:6789/0",
>                 "public_addr": "192.168.32.67:6789/0"
>             }
>         ]
>     },
>     "feature_map": {
>         "mon": [
>             {
>                 "features": "0x3ffddff8ffacfffb",
>                 "release": "luminous",
>                 "num": 1
>             }
>         ],
>         "mds": [
>             {
>                 "features": "0x3ffddff8ffacfffb",
>                 "release": "luminous",
>                 "num": 2
>             }
>         ],
>         "client": [
>             {
>                 "features": "0x2f018fb86aa42ada",
>                 "release": "luminous",
>                 "num": 1
>             },
>             {
>                 "features": "0x3ffddff8eeacfffb",
>                 "release": "luminous",
>                 "num": 1
>             },
>             {
>                 "features": "0x3ffddff8ffacfffb",
>                 "release": "luminous",
>                 "num": 17
>             }
>         ]
>     }
> }
>
> I'm a bit surprised that the other 2 MONs don't remain in quorum until this MON has caught up. Is there any way to monitor the sync progress? Right now I have to interrupt the sync regularly to allow some I/O, but I have no idea how long I need to wait.
>
> Thanks for your help!
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Dan van der Ster <dan@xxxxxxxxxxxxxx>
> Sent: 06 January 2021 20:16:44
> To: Frank Schilder
> Cc: Ceph Users
> Subject: Re: Re: Storage down due to MON sync very slow
>
> Which version of Ceph are you running?
>
> .. dan
>
>
> On Wed, Jan 6, 2021, 8:14 PM Frank Schilder <frans@xxxxxx> wrote:
> In the output of the MON I see slow ops warnings:
>
> debug 2021-01-06 20:12:48.854 7f1a3d29f700 -1 mon.ceph-01@0(synchronizing) e3 get_health_metrics reporting 20 slow ops, oldest is log(1 entries from seq 1 at 2021-01-06 20:00:12.014861)
>
> There appears to be no progress on this operation; it seems to be stuck.
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Frank Schilder <frans@xxxxxx>
> Sent: 06 January 2021 20:11:25
> To: ceph-users@xxxxxxx
> Subject: Storage down due to MON sync very slow
>
> Dear all,
>
> I had to restart one of our 3 MONs with an empty MON DB directory. It is in the synchronizing state right now, but I'm not sure whether it is making any progress. The cluster is completely unresponsive even though I have 2 healthy MONs. Is there any way to sync the DB directory faster and/or without downtime?
>
> Thanks a lot!
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
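Frank asks in the quoted thread whether there is any way to monitor the sync progress. One rough heuristic is to poll the MON's state through its admin socket (as with `ceph daemon mon.ceph-01 mon_status` above) and watch its `store.db` grow, comparing against the `store.db` size of a healthy MON. This is a sketch under assumptions: the default mon data path `/var/lib/ceph/mon/<cluster>-<id>` with cluster name `ceph`, GNU `du`, and hypothetical function names. Store size is only a proxy; compaction on either MON can make the percentage over- or undershoot.

```shell
#!/bin/sh
# Rough sync-progress check for a synchronizing MON (heuristic sketch).
# Assumes cluster name "ceph", default mon data path, and GNU du (-b).

# Print the "state" field from mon_status JSON read on stdin.
mon_state() {
    sed -n 's/^[[:space:]]*"state":[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Print the apparent size in bytes of a directory tree.
store_bytes() {
    du -sb "$1" | cut -f1
}

# Print current size as an integer percentage of a reference size.
progress_pct() {
    # $1 = current bytes, $2 = reference bytes (e.g. store.db of a healthy MON)
    if [ "$2" -le 0 ]; then
        echo 0
        return
    fi
    echo $(( $1 * 100 / $2 ))
}

# Poll once a minute while the MON reports "synchronizing".
watch_mon_sync() {
    # $1 = mon id (e.g. ceph-01), $2 = optional reference size in bytes
    id=$1
    ref=${2:-0}
    dir=/var/lib/ceph/mon/ceph-$id/store.db
    while [ "$(ceph daemon "mon.$id" mon_status | mon_state)" = synchronizing ]; do
        cur=$(store_bytes "$dir")
        if [ "$ref" -gt 0 ]; then
            echo "$(date +%T) $dir: $cur bytes (~$(progress_pct "$cur" "$ref")% of reference)"
        else
            echo "$(date +%T) $dir: $cur bytes"
        fi
        sleep 60
    done
}
```

For example, on ceph-01 one could run `watch_mon_sync ceph-01 "$ref"`, where `ref` is the byte count reported by `du -sb` for `store.db` on one of the healthy MONs.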