Hello cephers,
we have a degraded filesystem on our Ceph 18.2.2 cluster and I need to
get it up again.
We have 6 MDS daemons (3 active, each pinned to a subtree, and 3 standby).
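For context, the subtree pinning is done the usual way via the directory
xattr; the paths below are only placeholders, not our real directory names:
setfattr -n ceph.dir.pin -v 0 /mnt/cephfs/dir-a
setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/dir-b
setfattr -n ceph.dir.pin -v 2 /mnt/cephfs/dir-c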
It started last night, when I got the first HEALTH_WARN emails saying:
HEALTH_WARN
--- New ---
[WARN] MDS_CLIENT_RECALL: 1 clients failing to respond to cache pressure
    mds.default.cephmon-02.duujba(mds.1): Client apollo-10:cephfs_user failing to respond to cache pressure client_id: 1962074
=== Full health status ===
[WARN] MDS_CLIENT_RECALL: 1 clients failing to respond to cache pressure
    mds.default.cephmon-02.duujba(mds.1): Client apollo-10:cephfs_user failing to respond to cache pressure client_id: 1962074
Then it went on with:
HEALTH_WARN
--- New ---
[WARN] FS_DEGRADED: 1 filesystem is degraded
fs cephfs is degraded
--- Cleared ---
[WARN] MDS_CLIENT_RECALL: 1 clients failing to respond to cache pressure
    mds.default.cephmon-02.duujba(mds.1): Client apollo-10:cephfs_user failing to respond to cache pressure client_id: 1962074
=== Full health status ===
[WARN] FS_DEGRADED: 1 filesystem is degraded
fs cephfs is degraded
Then, one after another, the MDS daemons went into error state:
HEALTH_WARN
--- Updated ---
[WARN] CEPHADM_FAILED_DAEMON: 4 failed cephadm daemon(s)
    daemon mds.default.cephmon-01.cepqjp on cephmon-01 is in error state
    daemon mds.default.cephmon-02.duujba on cephmon-02 is in error state
    daemon mds.default.cephmon-03.chjusj on cephmon-03 is in error state
    daemon mds.default.cephmon-03.xcujhz on cephmon-03 is in error state
=== Full health status ===
[WARN] CEPHADM_FAILED_DAEMON: 4 failed cephadm daemon(s)
    daemon mds.default.cephmon-01.cepqjp on cephmon-01 is in error state
    daemon mds.default.cephmon-02.duujba on cephmon-02 is in error state
    daemon mds.default.cephmon-03.chjusj on cephmon-03 is in error state
    daemon mds.default.cephmon-03.xcujhz on cephmon-03 is in error state
[WARN] FS_DEGRADED: 1 filesystem is degraded
    fs cephfs is degraded
[WARN] MDS_INSUFFICIENT_STANDBY: insufficient standby MDS daemons available
    have 0; want 1 more
In the morning I tried to restart the MDS daemons in error state (roughly as sketched below), but they kept failing. I then reduced the number of active MDS to 1:
ceph fs set cephfs max_mds 1
and set the filesystem down:
ceph fs set cephfs down true
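For reference, the restart attempts were made via the cephadm orchestrator,
along these lines (daemon names taken from the health output above):
ceph orch daemon restart mds.default.cephmon-01.cepqjp
ceph orch daemon restart mds.default.cephmon-02.duujba
ceph orch daemon restart mds.default.cephmon-03.chjusj
ceph orch daemon restart mds.default.cephmon-03.xcujhz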
After that I tried to restart the MDS daemons again, but now I'm stuck at the following status:
[root@ceph01-b ~]# ceph -s
  cluster:
    id:     aae23c5c-a98b-11ee-b44d-00620b05cac4
    health: HEALTH_WARN
            4 failed cephadm daemon(s)
            1 filesystem is degraded
            insufficient standby MDS daemons available

  services:
    mon: 3 daemons, quorum cephmon-01,cephmon-03,cephmon-02 (age 2w)
    mgr: cephmon-01.dsxcho(active, since 11w), standbys: cephmon-02.nssigg, cephmon-03.rgefle
    mds: 3/3 daemons up
    osd: 336 osds: 336 up (since 11w), 336 in (since 3M)

  data:
    volumes: 0/1 healthy, 1 recovering
    pools:   4 pools, 6401 pgs
    objects: 284.69M objects, 623 TiB
    usage:   889 TiB used, 3.1 PiB / 3.9 PiB avail
    pgs:     6186 active+clean
             156 active+clean+scrubbing
             59 active+clean+scrubbing+deep
[root@ceph01-b ~]# ceph health detail
HEALTH_WARN 4 failed cephadm daemon(s); 1 filesystem is degraded; insufficient standby MDS daemons available
[WRN] CEPHADM_FAILED_DAEMON: 4 failed cephadm daemon(s)
    daemon mds.default.cephmon-01.cepqjp on cephmon-01 is in error state
    daemon mds.default.cephmon-02.duujba on cephmon-02 is in unknown state
    daemon mds.default.cephmon-03.chjusj on cephmon-03 is in error state
    daemon mds.default.cephmon-03.xcujhz on cephmon-03 is in error state
[WRN] FS_DEGRADED: 1 filesystem is degraded
    fs cephfs is degraded
[WRN] MDS_INSUFFICIENT_STANDBY: insufficient standby MDS daemons available
    have 0; want 1 more
[root@ceph01-b ~]#
[root@ceph01-b ~]# ceph fs status
cephfs - 40 clients
======
RANK      STATE                 MDS               ACTIVITY    DNS    INOS   DIRS   CAPS
 0        resolve         default.cephmon-02.nyfook          12.3k  11.8k   3228      0
 1      replay(laggy)     default.cephmon-02.duujba              0      0      0      0
 2        resolve         default.cephmon-01.pvnqad          15.8k   3541   1409      0
         POOL              TYPE      USED   AVAIL
ssd-rep-metadata-pool    metadata    295G   63.5T
  sdd-rep-data-pool        data     10.2T   84.6T
  hdd-ec-data-pool         data      808T   1929T
MDS version: ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable)
The end of the log of the replay(laggy) MDS default.cephmon-02.duujba shows:
[...]
-11> 2024-06-19T07:12:38.980+0000 7f90fd117700 1 mds.1.journaler.pq(ro) _finish_probe_end write_pos = 8673820672 (header had 8623488918). recovered.
-10> 2024-06-19T07:12:38.980+0000 7f90fd117700 4 mds.1.purge_queue operator(): open complete
-9> 2024-06-19T07:12:38.980+0000 7f90fd117700 4 mds.1.purge_queue operator(): recovering write_pos
-8> 2024-06-19T07:12:39.015+0000 7f9104926700 10 monclient: get_auth_request con 0x55a93ef42c00 auth_method 0
-7> 2024-06-19T07:12:39.025+0000 7f9105928700 10 monclient: get_auth_request con 0x55a93ef43400 auth_method 0
-6> 2024-06-19T07:12:39.038+0000 7f90fd117700 4 mds.1.purge_queue _recover: write_pos recovered
-5> 2024-06-19T07:12:39.038+0000 7f90fd117700 1 mds.1.journaler.pq(ro) set_writeable
-4> 2024-06-19T07:12:39.044+0000 7f9105127700 10 monclient: get_auth_request con 0x55a93ef43c00 auth_method 0
-3> 2024-06-19T07:12:39.113+0000 7f9104926700 10 monclient: get_auth_request con 0x55a93ed97000 auth_method 0
-2> 2024-06-19T07:12:39.123+0000 7f9105928700 10 monclient: get_auth_request con 0x55a93e903c00 auth_method 0
-1> 2024-06-19T07:12:39.236+0000 7f90fa912700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/18.2.2/rpm/el8/BUILD/ceph-18.2.2/src/include/interval_set.h: In function 'void interval_set<T, C>::erase(T, T, std::function<bool(T, T)>) [with T = inodeno_t; C = std::map]' thread 7f90fa912700 time 2024-06-19T07:12:39.235633+0000
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/18.2.2/rpm/el8/BUILD/ceph-18.2.2/src/include/interval_set.h: 568: FAILED ceph_assert(p->first <= start)
 ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x135) [0x7f910c722e15]
 2: /usr/lib64/ceph/libceph-common.so.2(+0x2a9fdb) [0x7f910c722fdb]
 3: (interval_set<inodeno_t, std::map>::erase(inodeno_t, inodeno_t, std::function<bool (inodeno_t, inodeno_t)>)+0x2e5) [0x55a93c0de9a5]
 4: (EMetaBlob::replay(MDSRank*, LogSegment*, int, MDPeerUpdate*)+0x4207) [0x55a93c3e76e7]
 5: (EUpdate::replay(MDSRank*)+0x61) [0x55a93c3e9f81]
 6: (MDLog::_replay_thread()+0x6c9) [0x55a93c3701d9]
 7: (MDLog::ReplayThread::entry()+0x11) [0x55a93c01e2d1]
 8: /lib64/libpthread.so.0(+0x81ca) [0x7f910b4c81ca]
 9: clone()
0> 2024-06-19T07:12:39.236+0000 7f90fa912700 -1 *** Caught signal (Aborted) **
 in thread 7f90fa912700 thread_name:md_log_replay
 ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable)
 1: /lib64/libpthread.so.0(+0x12d20) [0x7f910b4d2d20]
 2: gsignal()
 3: abort()
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x18f) [0x7f910c722e6f]
 5: /usr/lib64/ceph/libceph-common.so.2(+0x2a9fdb) [0x7f910c722fdb]
 6: (interval_set<inodeno_t, std::map>::erase(inodeno_t, inodeno_t, std::function<bool (inodeno_t, inodeno_t)>)+0x2e5) [0x55a93c0de9a5]
 7: (EMetaBlob::replay(MDSRank*, LogSegment*, int, MDPeerUpdate*)+0x4207) [0x55a93c3e76e7]
 8: (EUpdate::replay(MDSRank*)+0x61) [0x55a93c3e9f81]
 9: (MDLog::_replay_thread()+0x6c9) [0x55a93c3701d9]
 10: (MDLog::ReplayThread::entry()+0x11) [0x55a93c01e2d1]
 11: /lib64/libpthread.so.0(+0x81ca) [0x7f910b4c81ca]
 12: clone()
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
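From the backtrace this looks like rank 1 hitting a ceph_assert in interval_set during journal replay (EMetaBlob::replay). Before I touch anything: would the sensible next step be to back up and inspect that rank's journal with cephfs-journal-tool, roughly like below (just a sketch of what I have in mind, not yet run; the export path is only an example), or should I go through the full disaster-recovery procedure?
cephfs-journal-tool --rank=cephfs:1 journal export /root/cephfs.rank1.journal.bin
cephfs-journal-tool --rank=cephfs:1 journal inspect
Any guidance would be much appreciated.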