This sounds a lot like:
https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/M5ZKF7PTEO2OGDDY5L74EV4QS5SDCZTH/

See the discussion about mon_sync_max_payload_size, and the PR that fixed this at some point in nautilus (https://github.com/ceph/ceph/pull/31581). It was probably never fixed in mimic.

Our workaround was:

ceph config set mon mon_sync_max_payload_size 4096
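Before you start mon04 again, it is probably worth double-checking that the two running mons actually picked the new value up. A rough, untested sketch using your --cluster=ceph03_vie setup (the admin-socket check has to run on the mon's own host, and on mimic you may need the injectargs fallback if the config database does not push the value out):

# on ceph03-mon01 itself: confirm the running value via the admin socket
ceph --cluster=ceph03_vie daemon mon.ceph03-mon01 config get mon_sync_max_payload_size

# fallback: inject the value into all running mons directly
ceph --cluster=ceph03_vie tell mon.* injectargs '--mon_sync_max_payload_size=4096'

As I understand it, the smaller payload makes the initial sync go out as many small chunks instead of a few huge messages, so the leader should no longer stall while feeding the new mon.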
Hope that helps,

Dan

On Mon, Aug 30, 2021 at 10:24 AM Daniel Nagy (Systec)
<daniel.nagy@xxxxxxxxxxx> wrote:
>
> Hi,
>
> We have a mimic cluster (I know it is EOL, but we cannot upgrade because of the following issue...) with 3 mons. One of them was rebooted and cannot join back. When it starts, the whole cluster is 'stuck' until I kill the joining mon process. Even 'ceph -s' cannot be run during that period on the leader or the peon.
>
> Our environment:
> CentOS 7 with a v5 kernel.
> No local iptables, no SELinux; all mons are in one DC.
> NTP provided by chrony.
> Ceph cluster name: ceph03_vie
> Mons:
> ceph03-mon01 10.120.0.14 - leader
> ceph03-mon03 10.120.0.16 - peon
> ceph03-mon04 10.120.0.28 - the BAD
>
> I tried cleaning up and manually re-adding ceph03-mon04 using this process:
> rm -rf /var/lib/ceph/mon/ceph03_vie-ceph03-mon04/*
> alias ceph='ceph --cluster=ceph03_vie'
> ceph mon getmap -o /tmp/monmap
> ceph auth get mon. -o /tmp/keyring
> /usr/bin/ceph-mon -f --cluster ceph03_vie --id ceph03-mon04 --setuser ceph --setgroup ceph --mkfs --monmap /tmp/monmap --keyring /tmp/keyring
> /usr/bin/ceph-mon -f -d --cluster ceph03_vie --id ceph03-mon04 --setuser ceph --setgroup ceph --public-addr 10.120.0.28
>
> After this, the whole thing gets stuck until I press ctrl-c.
>
> -----
>
> Ceph config:
>
> # cat /etc/ceph/ceph03_vie.conf
> [global]
> fsid = 960f7aad-011a-467f-a046-0753002cd021
> mon_initial_members = ceph03-mon01, ceph03-mon03
> mon_host = 10.120.0.14,10.120.0.16
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
>
> -----
>
> Ceph status with the 2 good monitors:
>
> [root@ceph03-mon03.vie2 dnagy]# ceph -s
>   cluster:
>     id:     960f7aad-011a-467f-a046-0753002cd021
>     health: HEALTH_OK
>
>   services:
>     mon: 2 daemons, quorum ceph03-mon01,ceph03-mon03
>     mgr: ceph03-mon03(active), standbys: ceph03-mon01
>     mds: cephfs_vie-1/1/1 up {0=ceph03-mon01=up:active}, 1 up:standby
>     osd: 50 osds: 50 up, 50 in
>
>   data:
>     pools:   7 pools, 1192 pgs
>     objects: 44.47 M objects, 52 TiB
>     usage:   161 TiB used, 144 TiB / 304 TiB avail
>     pgs:     1191 active+clean
>              1    active+clean+scrubbing+deep
>
>   io:
>     client:  13 MiB/s rd, 40 MiB/s wr, 3.40 kop/s rd, 44 op/s wr
>
> -----
>
> Leader logs when the BAD mon tries to join:
>
> 2021-08-30 10:13:16.055 7fdbdf0ac700 0 mon.ceph03-mon01@0(leader) e4 ms_verify_authorizer bad authorizer from mon 10.120.0.28:6789/0
> 2021-08-30 10:13:16.055 7fdbdf0ac700 0 -- 10.120.0.14:6789/0 >> 10.120.0.28:6789/0 conn(0x5614e4cbc600 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg accept peer reset, then tried to connect to us, replacing
> 2021-08-30 10:13:16.059 7fdbdf0ac700 0 mon.ceph03-mon01@0(leader) e4 ms_verify_authorizer bad authorizer from mon 10.120.0.28:6789/0
> 2021-08-30 10:13:16.059 7fdbe28b3700 1 mon.ceph03-mon01@0(leader) e4 adding peer 10.120.0.28:6789/0 to list of hints
>
> -----
>
> Peon logs when the BAD mon tries to join:
>
> 2021-08-30 10:13:16.054 7f2293989700 0 mon.ceph03-mon03@1(peon) e4 ms_verify_authorizer bad authorizer from mon 10.120.0.28:6789/0
> 2021-08-30 10:13:16.055 7f2293989700 0 -- 10.120.0.16:6789/0 >> 10.120.0.28:6789/0 conn(0x556e00050000 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg accept peer reset, then tried to connect to us, replacing
> 2021-08-30 10:13:16.055 7f2293989700 0 mon.ceph03-mon03@1(peon) e4 ms_verify_authorizer bad authorizer from mon 10.120.0.28:6789/0
> 2021-08-30 10:13:16.055 7f229698f700 1 mon.ceph03-mon03@1(peon) e4 adding peer 10.120.0.28:6789/0 to list of hints
>
> -----
>
> Logs of the joining BAD mon, after rocksdb initialization:
>
> 2021-08-30 10:13:16.040 7f0dc1114a00 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1630311196041191, "job": 1, "event": "recovery_finished"}
> 2021-08-30 10:13:16.044 7f0dc1114a00 4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/13.2.10/rpm/el7/BUILD/ceph-13.2.10/src/rocksdb/db/db_impl_open.cc:1218] DB pointer 0x5597d5cf8000
> 2021-08-30 10:13:16.044 7f0dc1114a00 0 mon.ceph03-mon04 does not exist in monmap, will attempt to join an existing cluster
> 2021-08-30 10:13:16.044 7f0dc1114a00 0 using public_addr 10.120.0.28:0/0 -> 10.120.0.28:6789/0
> 2021-08-30 10:13:16.045 7f0dc1114a00 0 starting mon.ceph03-mon04 rank -1 at public addr 10.120.0.28:6789/0 at bind addr 10.120.0.28:6789/0 mon_data /var/lib/ceph/mon/ceph03_vie-ceph03-mon04 fsid 960f7aad-011a-467f-a046-0753002cd021
> 2021-08-30 10:13:16.045 7f0dc1114a00 0 starting mon.ceph03-mon04 rank -1 at 10.120.0.28:6789/0 mon_data /var/lib/ceph/mon/ceph03_vie-ceph03-mon04 fsid 960f7aad-011a-467f-a046-0753002cd021
> 2021-08-30 10:13:16.045 7f0dc1114a00 1 mon.ceph03-mon04@-1(probing) e4 preinit fsid 960f7aad-011a-467f-a046-0753002cd021
> 2021-08-30 10:13:16.045 7f0dc1114a00 1 mon.ceph03-mon04@-1(probing) e4 initial_members ceph03-mon01,ceph03-mon03, filtering seed monmap
> 2021-08-30 10:13:16.045 7f0dc1114a00 1 mon.ceph03-mon04@-1(probing) e4 preinit clean up potentially inconsistent store state
> 2021-08-30 10:13:16.053 7f0dc1114a00 1 mon.ceph03-mon04@-1(probing).mds e0 Unable to load 'last_metadata'
> 2021-08-30 10:13:16.055 7f0daa723700 1 mon.ceph03-mon04@-1(synchronizing) e4 sync_obtain_latest_monmap
> 2021-08-30 10:13:16.055 7f0daa723700 1 mon.ceph03-mon04@-1(synchronizing) e4 sync_obtain_latest_monmap obtained monmap e4
>
> IT GETS STUCK HERE. After pressing ctrl-c:
>
> ^C2021-08-30 10:13:26.661 7f0daf72d700 -1 received signal: Interrupt, si_code : 128, si_value (int): 0, si_value (ptr): 0, si_errno: 0, si_pid : 0, si_uid : 0, si_addr0, si_status0
> 2021-08-30 10:13:26.661 7f0daf72d700 -1 mon.ceph03-mon04@-1(synchronizing) e4 *** Got Signal Interrupt ***
> 2021-08-30 10:13:26.661 7f0daf72d700 1 mon.ceph03-mon04@-1(synchronizing) e4 shutdown
>
> I cannot wait for minutes while the BAD mon is joining, because all client IO fails during the stuck election. Moreover, I'm afraid to touch the other mons.
>
> Any help would be appreciated.
>
> Thanks,
>
> Daniel
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx