Hi Dan,

it is possible that the payload reduction also solved or at least reduced a really bad problem that looks related (beware, that's a long one):
https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/FBGIJZNFG445NMYGO73PFNQL2ZB3ZF2Z/#FBGIJZNFG445NMYGO73PFNQL2ZB3ZF2Z

Since reducing the payload size I still observe these large peaks in the MON network activity. However, it seems that the cluster no longer goes down like it did before. During these peaks, I see warnings like these:

2021-01-22 12:00:00.000102 [WRN] overall HEALTH_WARN 1 pools nearfull
2021-01-22 11:04:09.156796 [INF] Health check cleared: SLOW_OPS (was: 5 slow ops, oldest one blocked for 75 sec, mon.ceph-02 has slow ops)
2021-01-22 11:04:07.994416 [WRN] Health check update: 5 slow ops, oldest one blocked for 75 sec, mon.ceph-02 has slow ops (SLOW_OPS)
2021-01-22 11:04:01.469498 [WRN] Health check failed: 124 slow ops, oldest one blocked for 82 sec, daemons [mon.ceph-02,mon.ceph-03] have slow ops. (SLOW_OPS)
2021-01-22 11:00:00.000104 [WRN] overall HEALTH_WARN 1 pools nearfull
2021-01-22 10:36:44.576663 [INF] Health check cleared: SLOW_OPS (was: 25 slow ops, oldest one blocked for 42 sec, daemons [mon.ceph-02,mon.ceph-03] have slow ops.)
2021-01-22 10:36:38.543763 [WRN] Health check failed: 18 slow ops, oldest one blocked for 38 sec, daemons [mon.ceph-02,mon.ceph-03] have slow ops. (SLOW_OPS)

So, at least stuff is working. I now lean towards the hypothesis that these outages were caused by some synchronisation process between MONs that became less problematic after reducing the payload size.

I might be able to reduce my insane beacon time-outs again, but before doing so: do you know of any other communication parameters similar to mon_sync_max_payload_size that might be relevant in MON-[MON, MGR, OSD] communication?

In general, I have the impression that, because of such little bugs, the recommendation for production clusters should be raised to at least 5 MONs, so that one can afford 2 MONs going out of quorum temporarily. I will upgrade our cluster to 5 MONs as soon as I can.

Thanks for your help and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Dan van der Ster <dan@xxxxxxxxxxxxxx>
Sent: 06 January 2021 20:53:14
To: Frank Schilder
Subject: Re: Re: Storage down due to MON sync very slow

Yeah I was going to say -- ignore all of the rsync advice in that thread, it is unnecessary.
Setting a small mon sync payload works like magic :)

-- dan

On Wed, Jan 6, 2021 at 8:49 PM Frank Schilder <frans@xxxxxx> wrote:
>
> OK, sorry for all my questions.
>
> Setting mon_sync_max_payload_size=4096 actually makes the MON sync in no time! Thank you so much :)
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Frank Schilder
> Sent: 06 January 2021 20:40:26
> To: Dan van der Ster
> Subject: Re: Re: Storage down due to MON sync very slow
>
> OK, thanks a lot! I will try it now. Hope the cluster remains responsive.
>
> I'm wondering about this approach someone brought up in your thread:
>
> Eventually I stopped one MON, tarballed its database and used that to
> bring back the MON which was upgraded to 13.2.8
>
> That worked without any hiccups. The MON joined again within a few seconds.
>
> Stopping one MON for a copy would be a much shorter storage outage than the sync I'm doing. I guess it's the entire mon data directory to copy. I always wondered if this contains data tied to a specific MON. If not, the copy approach could speed things up a lot. What do you think?
>
> Thanks again and best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Dan van der Ster <dan@xxxxxxxxxxxxxx>
> Sent: 06 January 2021 20:36:15
> To: Frank Schilder
> Subject: Re: Re: Storage down due to MON sync very slow
>
> We have used mon_sync_max_payload_size 4096 on our largest, most important prod cluster since that thread.
> The PR from Sage makes something like that the default anyway (the PR counts keys rather than bytes, but the effect is the same).
>
> mon_sync_max_payload_size 4096 should not impact the speed of syncing -- it simply breaks the sync into smaller, more manageable pieces.
> (Without this, if you have lots of keys in the mon db, in our case caused by lots of rbd snapshots, then syncing will never ever complete.)
>
> -- dan
>
> On Wed, Jan 6, 2021 at 8:32 PM Frank Schilder <frans@xxxxxx> wrote:
> >
> > Hi Dan,
> >
> > thanks for that. Will it slow down or accelerate the syncing (I will read your post after this e-mail), or will it just allow I/O to continue and sync more in the background? The current value is
> >
> > mon_sync_max_payload_size 1048576
> >
> > Related to that, would building a MON store from OSDs following https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-mon/#recovery-using-osds help provide a head start? Not sure if this procedure works on an active cluster.
> >
> > Will study your thread now ...
> >
> > Thanks again and best regards,
> > =================
> > Frank Schilder
> > AIT Risø Campus
> > Bygning 109, rum S14
> >
> > ________________________________________
> > From: Dan van der Ster <dan@xxxxxxxxxxxxxx>
> > Sent: 06 January 2021 20:26:46
> > To: Frank Schilder
> > Subject: Re: Re: Storage down due to MON sync very slow
> >
> > (obviously just put that config in the ceph.conf on the mons if mimic
> > doesn't have ceph config... I don't quite remember.)
> >
> > -- dan
> >
> > On Wed, Jan 6, 2021 at 8:25 PM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
> > >
> > > This sounds a lot like an old thread of mine:
> > > https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/M5ZKF7PTEO2OGDDY5L74EV4QS5SDCZTH/
> > >
> > > See the discussion about mon_sync_max_payload_size, and the PR that
> > > fixed this at some point in nautilus.
> > >
> > > Our workaround was:
> > >
> > > ceph config set mon mon_sync_max_payload_size 4096
> > >
> > > Hope that helps,
> > >
> > > Dan
> > >
> > >
> > > On Wed, Jan 6, 2021 at 8:18 PM Frank Schilder <frans@xxxxxx> wrote:
> > > >
> > > > Dear Dan,
> > > >
> > > > thanks for your fast response.
> > > >
> > > > Version: mimic 13.2.10.
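The two quoted suggestions above boil down to one setting. A minimal sketch of both ways to apply it, plus a check of the running value; the MON name ceph-01 is taken from this thread, while the ceph.conf path and the MON restart after editing it are assumptions:

    # Preferred where the central config database is available
    # (the thread above is unsure whether Mimic ships "ceph config"):
    ceph config set mon mon_sync_max_payload_size 4096

    # Fallback: add to /etc/ceph/ceph.conf on each MON host, then restart the MONs:
    #   [mon]
    #       mon_sync_max_payload_size = 4096

    # Verify what a running MON actually uses, via its local admin socket:
    ceph daemon mon.ceph-01 config get mon_sync_max_payload_size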
> > > >
> > > > Here is the mon_status of the "new" MON during syncing:
> > > >
> > > > [root@ceph-01 ~]# ceph daemon mon.ceph-01 mon_status
> > > > {
> > > >     "name": "ceph-01",
> > > >     "rank": 0,
> > > >     "state": "synchronizing",
> > > >     "election_epoch": 0,
> > > >     "quorum": [],
> > > >     "features": {
> > > >         "required_con": "144115188346404864",
> > > >         "required_mon": [
> > > >             "kraken",
> > > >             "luminous",
> > > >             "mimic",
> > > >             "osdmap-prune"
> > > >         ],
> > > >         "quorum_con": "0",
> > > >         "quorum_mon": []
> > > >     },
> > > >     "outside_quorum": [
> > > >         "ceph-01"
> > > >     ],
> > > >     "extra_probe_peers": [],
> > > >     "sync_provider": [],
> > > >     "sync": {
> > > >         "sync_provider": "mon.2 192.168.32.67:6789/0",
> > > >         "sync_cookie": 33302773774,
> > > >         "sync_start_version": 38355711
> > > >     },
> > > >     "monmap": {
> > > >         "epoch": 3,
> > > >         "fsid": "e4ece518-f2cb-4708-b00f-b6bf511e91d9",
> > > >         "modified": "2019-03-14 23:08:34.717223",
> > > >         "created": "2019-03-14 22:18:15.088212",
> > > >         "features": {
> > > >             "persistent": [
> > > >                 "kraken",
> > > >                 "luminous",
> > > >                 "mimic",
> > > >                 "osdmap-prune"
> > > >             ],
> > > >             "optional": []
> > > >         },
> > > >         "mons": [
> > > >             {
> > > >                 "rank": 0,
> > > >                 "name": "ceph-01",
> > > >                 "addr": "192.168.32.65:6789/0",
> > > >                 "public_addr": "192.168.32.65:6789/0"
> > > >             },
> > > >             {
> > > >                 "rank": 1,
> > > >                 "name": "ceph-02",
> > > >                 "addr": "192.168.32.66:6789/0",
> > > >                 "public_addr": "192.168.32.66:6789/0"
> > > >             },
> > > >             {
> > > >                 "rank": 2,
> > > >                 "name": "ceph-03",
> > > >                 "addr": "192.168.32.67:6789/0",
> > > >                 "public_addr": "192.168.32.67:6789/0"
> > > >             }
> > > >         ]
> > > >     },
> > > >     "feature_map": {
> > > >         "mon": [
> > > >             {
> > > >                 "features": "0x3ffddff8ffacfffb",
> > > >                 "release": "luminous",
> > > >                 "num": 1
> > > >             }
> > > >         ],
> > > >         "mds": [
> > > >             {
> > > >                 "features": "0x3ffddff8ffacfffb",
> > > >                 "release": "luminous",
> > > >                 "num": 2
> > > >             }
> > > >         ],
> > > >         "client": [
> > > >             {
> > > >                 "features": "0x2f018fb86aa42ada",
> > > >                 "release": "luminous",
> > > >                 "num": 1
> > > >             },
> > > >             {
> > > >                 "features": "0x3ffddff8eeacfffb",
> > > >                 "release": "luminous",
> > > >                 "num": 1
> > > >             },
> > > >             {
> > > >                 "features": "0x3ffddff8ffacfffb",
> > > >                 "release": "luminous",
> > > >                 "num": 17
> > > >             }
> > > >         ]
> > > >     }
> > > > }
> > > >
> > > > I'm a bit surprised that the other 2 MONs don't remain in quorum until this MON has caught up. Is there any way to monitor the syncing progress? Right now I need to interrupt regularly to allow some I/O, but I have no clue how long I need to wait.
> > > >
> > > > Thanks for your help!
> > > >
> > > > Best regards,
> > > > =================
> > > > Frank Schilder
> > > > AIT Risø Campus
> > > > Bygning 109, rum S14
> > > >
> > > > ________________________________________
> > > > From: Dan van der Ster <dan@xxxxxxxxxxxxxx>
> > > > Sent: 06 January 2021 20:16:44
> > > > To: Frank Schilder
> > > > Cc: Ceph Users
> > > > Subject: Re: Re: Storage down due to MON sync very slow
> > > >
> > > > Which version of Ceph are you running?
> > > >
> > > > .. dan
> > > >
> > > >
> > > > On Wed, Jan 6, 2021, 8:14 PM Frank Schilder <frans@xxxxxx> wrote:
> > > > In the output of the MON I see slow ops warnings:
> > > >
> > > > debug 2021-01-06 20:12:48.854 7f1a3d29f700 -1 mon.ceph-01@0(synchronizing) e3 get_health_metrics reporting 20 slow ops, oldest is log(1 entries from seq 1 at 2021-01-06 20:00:12.014861)
> > > >
> > > > There appears to be no progress on this operation, it is stuck.
> > > >
> > > > Best regards,
> > > > =================
> > > > Frank Schilder
> > > > AIT Risø Campus
> > > > Bygning 109, rum S14
> > > >
> > > > ________________________________________
> > > > From: Frank Schilder <frans@xxxxxx>
> > > > Sent: 06 January 2021 20:11:25
> > > > To: ceph-users@xxxxxxx
> > > > Subject: Storage down due to MON sync very slow
> > > >
> > > > Dear all,
> > > >
> > > > I had to restart one out of 3 MONs on an empty MON DB dir. It is in state syncing right now, but I'm not sure if there is any progress. The cluster is completely unresponsive even though I have 2 healthy MONs. Is there any way to sync the DB directory faster and/or without downtime?
> > > >
> > > > Thanks a lot!
> > > >
> > > > Best regards,
> > > > =================
> > > > Frank Schilder
> > > > AIT Risø Campus
> > > > Bygning 109, rum S14
> > > > _______________________________________________
> > > > ceph-users mailing list -- ceph-users@xxxxxxx
> > > > To unsubscribe send an email to ceph-users-leave@xxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
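
One open question above was how to monitor the syncing progress. A minimal sketch of one way to watch it, assuming shell access to the syncing MON's host, the MON name ceph-01 as in this thread, and jq installed; the 10-second interval is arbitrary:

    # Poll the syncing MON's state and sync position via its admin socket
    while sleep 10; do
        ceph daemon mon.ceph-01 mon_status | jq '{state, sync}'
    done

Once the reported state is no longer "synchronizing" and eventually shows "peon" or "leader", the MON has caught up and rejoined the quorum.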