Re: Adding a new monitor causes cluster freeze

During deployment we checked https://docs.ceph.com/en/mimic/start/os-recommendations/, which recommends at least a 4.x kernel.
The kernel-ml repo already had 5.x at that time, so we chose that instead.
________________________________
From: Szabo, Istvan (Agoda) <Istvan.Szabo@xxxxxxxxx>
Sent: Monday, August 30, 2021 10:36
To: Daniel Nagy (Systec) <daniel.nagy@xxxxxxxxxxx>
Cc: ceph-users@xxxxxxx <ceph-users@xxxxxxx>
Subject: Re: Adding a new monitor causes cluster freeze

Any reason to use kernel 5 rather than 3?

Istvan Szabo
Senior Infrastructure Engineer
---------------------------------------------------
Agoda Services Co., Ltd.
e: istvan.szabo@xxxxxxxxx<mailto:istvan.szabo@xxxxxxxxx>
---------------------------------------------------

On 2021. Aug 30., at 10:26, Daniel Nagy (Systec) <daniel.nagy@xxxxxxxxxxx> wrote:


Hi,

We have a Mimic cluster (I know it is EOL, but we cannot upgrade because of the following issue...) with 3 mons. One of them was rebooted and cannot rejoin. While it is starting, the whole cluster is stuck until I kill the joining mon process; even a 'ceph -s' cannot be run during that period on the leader or the peon.
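
Even while the CLI path hangs, a mon's admin socket should still respond locally, so each good mon's view can be checked from its own host; a minimal sketch, assuming the default admin socket paths under /var/run/ceph:

# run on the mon host itself; this talks to the local admin socket,
# so it should still answer while the normal 'ceph -s' path is blocked
ceph --cluster=ceph03_vie daemon mon.ceph03-mon01 mon_status
ceph --cluster=ceph03_vie daemon mon.ceph03-mon01 quorum_status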

Our environment:
CentOS 7 with a 5.x kernel.
No local iptables, no SELinux; all mons are in one DC.
NTP provided via chrony (a quick time-sync sanity check is sketched below the mon list).
Ceph cluster name: ceph03_vie
Mons:
ceph03-mon01 10.120.0.14 - leader
ceph03-mon03 10.120.0.16 - peon
ceph03-mon04 10.120.0.28 - the BAD
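
The time-sync sanity check mentioned above, as a minimal sketch (standard chronyc commands, run on each mon host); cephx is sensitive to clock skew, so it is worth ruling out:

chronyc tracking      # offset of the local clock against the NTP source
chronyc sources -v    # state of each configured time source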

I tried cleaning up and manually adding ceph03-mon04 using this process:
# wipe the old mon store on the BAD node
rm -rf /var/lib/ceph/mon/ceph03_vie-ceph03-mon04/*
alias ceph='ceph --cluster=ceph03_vie'
# fetch the current monmap and the mon. keyring from the quorum
ceph mon getmap -o /tmp/monmap
ceph auth get mon. -o /tmp/keyring
# rebuild the mon store, then start the mon in the foreground
/usr/bin/ceph-mon -f --cluster ceph03_vie --id ceph03-mon04 --setuser ceph --setgroup ceph --mkfs --monmap /tmp/monmap --keyring /tmp/keyring
/usr/bin/ceph-mon -f -d --cluster ceph03_vie --id ceph03-mon04 --setuser ceph --setgroup ceph --public-addr 10.120.0.28

After this, the whole thing gets stuck until I press ctrl-c.
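
One sanity check that can be run on the fetched monmap before starting the mon, using the stock monmaptool:

# prints the epoch, fsid and mon addresses contained in the map
monmaptool --print /tmp/monmap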

-----

Ceph config:

# cat /etc/ceph/ceph03_vie.conf
[global]
fsid = 960f7aad-011a-467f-a046-0753002cd021
mon_initial_members = ceph03-mon01, ceph03-mon03
mon_host = 10.120.0.14,10.120.0.16
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx

-----

Ceph status with the 2 good monitors:

[root@ceph03-mon03.vie2 dnagy]# ceph -s
 cluster:
   id:     960f7aad-011a-467f-a046-0753002cd021
   health: HEALTH_OK

 services:
   mon: 2 daemons, quorum ceph03-mon01,ceph03-mon03
   mgr: ceph03-mon03(active), standbys: ceph03-mon01
   mds: cephfs_vie-1/1/1 up  {0=ceph03-mon01=up:active}, 1 up:standby
   osd: 50 osds: 50 up, 50 in

 data:
   pools:   7 pools, 1192 pgs
   objects: 44.47 M objects, 52 TiB
   usage:   161 TiB used, 144 TiB / 304 TiB avail
   pgs:     1191 active+clean
            1    active+clean+scrubbing+deep

 io:
   client:   13 MiB/s rd, 40 MiB/s wr, 3.40 kop/s rd, 44 op/s wr

-----

Leader logs when joining the BAD:

2021-08-30 10:13:16.055 7fdbdf0ac700  0 mon.ceph03-mon01@0(leader) e4 ms_verify_authorizer bad authorizer from mon 10.120.0.28:6789/0
2021-08-30 10:13:16.055 7fdbdf0ac700  0 -- 10.120.0.14:6789/0 >> 10.120.0.28:6789/0 conn(0x5614e4cbc600 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg accept peer reset, then tried to connect to us, replacing
2021-08-30 10:13:16.059 7fdbdf0ac700  0 mon.ceph03-mon01@0(leader) e4 ms_verify_authorizer bad authorizer from mon 10.120.0.28:6789/0
2021-08-30 10:13:16.059 7fdbe28b3700  1 mon.ceph03-mon01@0(leader) e4  adding peer 10.120.0.28:6789/0 to list of hints

-----

Peon logs when joining the BAD:

2021-08-30 10:13:16.054 7f2293989700  0 mon.ceph03-mon03@1(peon) e4 ms_verify_authorizer bad authorizer from mon 10.120.0.28:6789/0
2021-08-30 10:13:16.055 7f2293989700  0 -- 10.120.0.16:6789/0 >> 10.120.0.28:6789/0 conn(0x556e00050000 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg accept peer reset, then tried to connect to us, replacing
2021-08-30 10:13:16.055 7f2293989700  0 mon.ceph03-mon03@1(peon) e4 ms_verify_authorizer bad authorizer from mon 10.120.0.28:6789/0
2021-08-30 10:13:16.055 7f229698f700  1 mon.ceph03-mon03@1(peon) e4  adding peer 10.120.0.28:6789/0 to list of hints
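
Since both quorum members report 'bad authorizer', one check worth doing is comparing the mon. key the quorum expects against the key baked into the BAD mon's rebuilt store; a minimal sketch (the keyring path below is the standard location written by --mkfs):

# on a good mon: the mon. key the quorum expects
ceph --cluster=ceph03_vie auth get mon.
# on the BAD mon: the key its store was rebuilt with
cat /var/lib/ceph/mon/ceph03_vie-ceph03-mon04/keyring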

-----

Logs of the joining BAD mon, after rocksdb initialization:

2021-08-30 10:13:16.040 7f0dc1114a00  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1630311196041191, "job": 1, "event": "recovery_finished"}
2021-08-30 10:13:16.044 7f0dc1114a00  4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/13.2.10/rpm/el7/BUILD/ceph-13.2.10/src/rocksdb/db/db_impl_open.cc:1218] DB pointer 0x5597d5cf8000
2021-08-30 10:13:16.044 7f0dc1114a00  0 mon.ceph03-mon04 does not exist in monmap, will attempt to join an existing cluster
2021-08-30 10:13:16.044 7f0dc1114a00  0 using public_addr 10.120.0.28:0/0 -> 10.120.0.28:6789/0
2021-08-30 10:13:16.045 7f0dc1114a00  0 starting mon.ceph03-mon04 rank -1 at public addr 10.120.0.28:6789/0 at bind addr 10.120.0.28:6789/0 mon_data /var/lib/ceph/mon/ceph03_vie-ceph03-mon04 fsid 960f7aad-011a-467f-a046-0753002cd021
2021-08-30 10:13:16.045 7f0dc1114a00  0 starting mon.ceph03-mon04 rank -1 at 10.120.0.28:6789/0 mon_data /var/lib/ceph/mon/ceph03_vie-ceph03-mon04 fsid 960f7aad-011a-467f-a046-0753002cd021
2021-08-30 10:13:16.045 7f0dc1114a00  1 mon.ceph03-mon04@-1(probing) e4 preinit fsid 960f7aad-011a-467f-a046-0753002cd021
2021-08-30 10:13:16.045 7f0dc1114a00  1 mon.ceph03-mon04@-1(probing) e4  initial_members ceph03-mon01,ceph03-mon03, filtering seed monmap
2021-08-30 10:13:16.045 7f0dc1114a00  1 mon.ceph03-mon04@-1(probing) e4 preinit clean up potentially inconsistent store state
2021-08-30 10:13:16.053 7f0dc1114a00  1 mon.ceph03-mon04@-1(probing).mds e0 Unable to load 'last_metadata'
2021-08-30 10:13:16.055 7f0daa723700  1 mon.ceph03-mon04@-1(synchronizing) e4 sync_obtain_latest_monmap
2021-08-30 10:13:16.055 7f0daa723700  1 mon.ceph03-mon04@-1(synchronizing) e4 sync_obtain_latest_monmap obtained monmap e4

IT GETS STUCK HERE. After pressing Ctrl-C:

^C2021-08-30 10:13:26.661 7f0daf72d700 -1 received  signal: Interrupt, si_code : 128, si_value (int): 0, si_value (ptr): 0, si_errno: 0, si_pid : 0, si_uid : 0, si_addr0, si_status0
2021-08-30 10:13:26.661 7f0daf72d700 -1 mon.ceph03-mon04@-1(synchronizing) e4 *** Got Signal Interrupt ***
2021-08-30 10:13:26.661 7f0daf72d700  1 mon.ceph03-mon04@-1(synchronizing) e4 shutdown
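
If more detail on where the synchronization stalls would help, the BAD mon can be restarted with higher monitor and messenger debug levels; a sketch using standard debug flags:

# same start command as above, with verbose mon + messenger logging
/usr/bin/ceph-mon -f -d --cluster ceph03_vie --id ceph03-mon04 --setuser ceph --setgroup ceph --public-addr 10.120.0.28 --debug_mon 10 --debug_ms 1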


I cannot let the BAD mon keep trying to join for minutes, because all client IO fails while the election is stuck. Moreover, I'm afraid of touching any of the other mons.

Any help would be appreciated.


Thanks,

Daniel


_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



