Dan, you saved the day: that tunable helped, and we have 3 mons again. Thank you! We will definitely upgrade to at least nautilus. Thanks again!

________________________________
From: Dan van der Ster <dan@xxxxxxxxxxxxxx>
Sent: Monday, August 30, 2021 10:48
To: Daniel Nagy (Systec) <daniel.nagy@xxxxxxxxxxx>
Cc: ceph-users@xxxxxxx <ceph-users@xxxxxxx>
Subject: Re: Adding a new monitor causes cluster freeze

This sounds a lot like:
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/M5ZKF7PTEO2OGDDY5L74EV4QS5SDCZTH/

See the discussion about mon_sync_max_payload_size, and the PR that fixed this at some point in nautilus (https://github.com/ceph/ceph/pull/31581). It probably was never fixed in mimic.

Our workaround was:

ceph config set mon mon_sync_max_payload_size 4096

Hope that helps,

Dan

On Mon, Aug 30, 2021 at 10:24 AM Daniel Nagy (Systec) <daniel.nagy@xxxxxxxxxxx> wrote:
>
> Hi,
>
> We have a mimic cluster (I know it is EOL, but we cannot upgrade because of the following issue...) with 3 mons. One of them was rebooted and cannot rejoin. When it starts, the whole cluster is 'stuck' until I kill the joining mon process. Even a 'ceph -s' cannot be run during that period on the leader or peon.
>
> Our environment:
> CentOS 7 with a v5 kernel.
> No local iptables, no SELinux; all mons are in one DC.
> NTP provided by chrony.
> Ceph cluster name: ceph03_vie
> Mons:
> ceph03-mon01 10.120.0.14 - leader
> ceph03-mon03 10.120.0.16 - peon
> ceph03-mon04 10.120.0.28 - the BAD
>
> I tried cleaning up and manually re-adding ceph03-mon04 using this process:
> rm -rf /var/lib/ceph/mon/ceph03_vie-ceph03-mon04/*
> alias ceph='ceph --cluster=ceph03_vie'
> ceph mon getmap -o /tmp/monmap
> ceph auth get mon. -o /tmp/keyring
> /usr/bin/ceph-mon -f --cluster ceph03_vie --id ceph03-mon04 --setuser ceph --setgroup ceph --mkfs --monmap /tmp/monmap --keyring /tmp/keyring
> /usr/bin/ceph-mon -f -d --cluster ceph03_vie --id ceph03-mon04 --setuser ceph --setgroup ceph --public-addr 10.120.0.28
>
> After this, the whole thing gets stuck until I press ctrl-c.
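
For anyone hitting the same hang, a minimal sketch that ties the thread together: the re-add sequence quoted above, with the mon_sync_max_payload_size workaround from Dan's reply applied to the surviving mons first. The commands and the ceph03_vie names come from this thread; the injectargs line is an assumption for pushing the value to already-running mons and was not reported by either poster.

# On a surviving mon: shrink the sync payload so a full store sync from the
# rejoining mon cannot stall the quorum (Dan's workaround, 4096 bytes).
ceph --cluster=ceph03_vie config set mon mon_sync_max_payload_size 4096

# Assumed alternative for pushing the value into the running mons directly:
ceph --cluster=ceph03_vie tell mon.* injectargs '--mon_sync_max_payload_size=4096'

# On the bad mon's host: wipe its store and rebuild it from the current
# monmap and mon. keyring (same commands as the quoted post, without the alias).
rm -rf /var/lib/ceph/mon/ceph03_vie-ceph03-mon04/*
ceph --cluster=ceph03_vie mon getmap -o /tmp/monmap
ceph --cluster=ceph03_vie auth get mon. -o /tmp/keyring
/usr/bin/ceph-mon -f --cluster ceph03_vie --id ceph03-mon04 --setuser ceph --setgroup ceph \
    --mkfs --monmap /tmp/monmap --keyring /tmp/keyring

# Start it in the foreground and watch the sync; with the smaller payload it
# should get past sync_obtain_latest_monmap and reach quorum.
/usr/bin/ceph-mon -f -d --cluster ceph03_vie --id ceph03-mon04 --setuser ceph --setgroup ceph \
    --public-addr 10.120.0.28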
>
> -----
>
> Ceph config:
>
> # cat /etc/ceph/ceph03_vie.conf
> [global]
> fsid = 960f7aad-011a-467f-a046-0753002cd021
> mon_initial_members = ceph03-mon01, ceph03-mon03
> mon_host = 10.120.0.14,10.120.0.16
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
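
If a cluster stays on mimic for a while, the same tunable can also be pinned in the config file shown above so it survives mon restarts. A minimal sketch; the [mon] section is an illustrative addition, not part of the original file:

# /etc/ceph/ceph03_vie.conf (excerpt)
[mon]
# keep monitor sync payloads small so a full store sync from a
# rejoining mon cannot stall the quorum
mon_sync_max_payload_size = 4096

The file is only read when a daemon starts, so either restart the mons or apply the value at runtime with the ceph config set command from Dan's reply, which is what resolved the issue in this thread.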
>
> -----
>
> Ceph status with the 2 good monitors:
>
> [root@ceph03-mon03.vie2 dnagy]# ceph -s
>   cluster:
>     id:     960f7aad-011a-467f-a046-0753002cd021
>     health: HEALTH_OK
>
>   services:
>     mon: 2 daemons, quorum ceph03-mon01,ceph03-mon03
>     mgr: ceph03-mon03(active), standbys: ceph03-mon01
>     mds: cephfs_vie-1/1/1 up {0=ceph03-mon01=up:active}, 1 up:standby
>     osd: 50 osds: 50 up, 50 in
>
>   data:
>     pools:   7 pools, 1192 pgs
>     objects: 44.47 M objects, 52 TiB
>     usage:   161 TiB used, 144 TiB / 304 TiB avail
>     pgs:     1191 active+clean
>              1    active+clean+scrubbing+deep
>
>   io:
>     client:  13 MiB/s rd, 40 MiB/s wr, 3.40 kop/s rd, 44 op/s wr
>
> -----
>
> Leader logs while the BAD mon is joining:
>
> 2021-08-30 10:13:16.055 7fdbdf0ac700  0 mon.ceph03-mon01@0(leader) e4 ms_verify_authorizer bad authorizer from mon 10.120.0.28:6789/0
> 2021-08-30 10:13:16.055 7fdbdf0ac700  0 -- 10.120.0.14:6789/0 >> 10.120.0.28:6789/0 conn(0x5614e4cbc600 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg accept peer reset, then tried to connect to us, replacing
> 2021-08-30 10:13:16.059 7fdbdf0ac700  0 mon.ceph03-mon01@0(leader) e4 ms_verify_authorizer bad authorizer from mon 10.120.0.28:6789/0
> 2021-08-30 10:13:16.059 7fdbe28b3700  1 mon.ceph03-mon01@0(leader) e4 adding peer 10.120.0.28:6789/0 to list of hints
>
> -----
>
> Peon logs while the BAD mon is joining:
>
> 2021-08-30 10:13:16.054 7f2293989700  0 mon.ceph03-mon03@1(peon) e4 ms_verify_authorizer bad authorizer from mon 10.120.0.28:6789/0
> 2021-08-30 10:13:16.055 7f2293989700  0 -- 10.120.0.16:6789/0 >> 10.120.0.28:6789/0 conn(0x556e00050000 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg accept peer reset, then tried to connect to us, replacing
> 2021-08-30 10:13:16.055 7f2293989700  0 mon.ceph03-mon03@1(peon) e4 ms_verify_authorizer bad authorizer from mon 10.120.0.28:6789/0
> 2021-08-30 10:13:16.055 7f229698f700  1 mon.ceph03-mon03@1(peon) e4 adding peer 10.120.0.28:6789/0 to list of hints
>
> -----
>
> Logs of the joining BAD mon, after rocksdb initialization:
>
> 2021-08-30 10:13:16.040 7f0dc1114a00  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1630311196041191, "job": 1, "event": "recovery_finished"}
> 2021-08-30 10:13:16.044 7f0dc1114a00  4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/13.2.10/rpm/el7/BUILD/ceph-13.2.10/src/rocksdb/db/db_impl_open.cc:1218] DB pointer 0x5597d5cf8000
> 2021-08-30 10:13:16.044 7f0dc1114a00  0 mon.ceph03-mon04 does not exist in monmap, will attempt to join an existing cluster
> 2021-08-30 10:13:16.044 7f0dc1114a00  0 using public_addr 10.120.0.28:0/0 -> 10.120.0.28:6789/0
> 2021-08-30 10:13:16.045 7f0dc1114a00  0 starting mon.ceph03-mon04 rank -1 at public addr 10.120.0.28:6789/0 at bind addr 10.120.0.28:6789/0 mon_data /var/lib/ceph/mon/ceph03_vie-ceph03-mon04 fsid 960f7aad-011a-467f-a046-0753002cd021
> 2021-08-30 10:13:16.045 7f0dc1114a00  0 starting mon.ceph03-mon04 rank -1 at 10.120.0.28:6789/0 mon_data /var/lib/ceph/mon/ceph03_vie-ceph03-mon04 fsid 960f7aad-011a-467f-a046-0753002cd021
> 2021-08-30 10:13:16.045 7f0dc1114a00  1 mon.ceph03-mon04@-1(probing) e4 preinit fsid 960f7aad-011a-467f-a046-0753002cd021
> 2021-08-30 10:13:16.045 7f0dc1114a00  1 mon.ceph03-mon04@-1(probing) e4 initial_members ceph03-mon01,ceph03-mon03, filtering seed monmap
> 2021-08-30 10:13:16.045 7f0dc1114a00  1 mon.ceph03-mon04@-1(probing) e4 preinit clean up potentially inconsistent store state
> 2021-08-30 10:13:16.053 7f0dc1114a00  1 mon.ceph03-mon04@-1(probing).mds e0 Unable to load 'last_metadata'
> 2021-08-30 10:13:16.055 7f0daa723700  1 mon.ceph03-mon04@-1(synchronizing) e4 sync_obtain_latest_monmap
> 2021-08-30 10:13:16.055 7f0daa723700  1 mon.ceph03-mon04@-1(synchronizing) e4 sync_obtain_latest_monmap obtained monmap e4
>
> It gets stuck here. After pressing ctrl-c:
>
> ^C2021-08-30 10:13:26.661 7f0daf72d700 -1 received signal: Interrupt, si_code : 128, si_value (int): 0, si_value (ptr): 0, si_errno: 0, si_pid : 0, si_uid : 0, si_addr0, si_status0
> 2021-08-30 10:13:26.661 7f0daf72d700 -1 mon.ceph03-mon04@-1(synchronizing) e4 *** Got Signal Interrupt ***
> 2021-08-30 10:13:26.661 7f0daf72d700  1 mon.ceph03-mon04@-1(synchronizing) e4 shutdown
>
> I cannot leave the BAD mon trying to join for minutes, because all client IO fails while the election is stuck. Moreover, I'm afraid of touching the other mons.
>
> Any help would be appreciated.
>
> Thanks,
>
> Daniel

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx