Hi Xiubo,

On 6/19/24 09:55, Xiubo Li wrote:
Hi Dietmar,

On 6/19/24 15:43, Dietmar Rieder wrote:

Hello cephers,

we have a degraded filesystem on our ceph 18.2.2 cluster and I need to get it up again.

We have 6 MDS daemons (3 active, each pinned to a subtree, and 3 standby).

It started during the night. I got the first HEALTH_WARN emails saying:

HEALTH_WARN

--- New ---
[WARN] MDS_CLIENT_RECALL: 1 clients failing to respond to cache pressure
    mds.default.cephmon-02.duujba(mds.1): Client apollo-10:cephfs_user failing to respond to cache pressure client_id: 1962074

=== Full health status ===
[WARN] MDS_CLIENT_RECALL: 1 clients failing to respond to cache pressure
    mds.default.cephmon-02.duujba(mds.1): Client apollo-10:cephfs_user failing to respond to cache pressure client_id: 1962074

then it went on with:

HEALTH_WARN

--- New ---
[WARN] FS_DEGRADED: 1 filesystem is degraded
    fs cephfs is degraded

--- Cleared ---
[WARN] MDS_CLIENT_RECALL: 1 clients failing to respond to cache pressure
    mds.default.cephmon-02.duujba(mds.1): Client apollo-10:cephfs_user failing to respond to cache pressure client_id: 1962074

=== Full health status ===
[WARN] FS_DEGRADED: 1 filesystem is degraded
    fs cephfs is degraded

Then one MDS after another went into error state:

HEALTH_WARN

--- Updated ---
[WARN] CEPHADM_FAILED_DAEMON: 4 failed cephadm daemon(s)
    daemon mds.default.cephmon-01.cepqjp on cephmon-01 is in error state
    daemon mds.default.cephmon-02.duujba on cephmon-02 is in error state
    daemon mds.default.cephmon-03.chjusj on cephmon-03 is in error state
    daemon mds.default.cephmon-03.xcujhz on cephmon-03 is in error state

=== Full health status ===
[WARN] CEPHADM_FAILED_DAEMON: 4 failed cephadm daemon(s)
    daemon mds.default.cephmon-01.cepqjp on cephmon-01 is in error state
    daemon mds.default.cephmon-02.duujba on cephmon-02 is in error state
    daemon mds.default.cephmon-03.chjusj on cephmon-03 is in error state
    daemon mds.default.cephmon-03.xcujhz on cephmon-03 is in error state
[WARN] FS_DEGRADED: 1 filesystem is degraded
    fs cephfs is degraded
[WARN] MDS_INSUFFICIENT_STANDBY: insufficient standby MDS daemons available
    have 0; want 1 more

In the morning I tried to restart the MDS daemons in error state, but they kept failing. I then reduced the number of active MDS to 1:

    ceph fs set cephfs max_mds 1

and set the filesystem down:

    ceph fs set cephfs down true

I tried to restart the MDS daemons again, but now I'm stuck at the following status:

[root@ceph01-b ~]# ceph -s
  cluster:
    id:     aae23c5c-a98b-11ee-b44d-00620b05cac4
    health: HEALTH_WARN
            4 failed cephadm daemon(s)
            1 filesystem is degraded
            insufficient standby MDS daemons available

  services:
    mon: 3 daemons, quorum cephmon-01,cephmon-03,cephmon-02 (age 2w)
    mgr: cephmon-01.dsxcho(active, since 11w), standbys: cephmon-02.nssigg, cephmon-03.rgefle
    mds: 3/3 daemons up
    osd: 336 osds: 336 up (since 11w), 336 in (since 3M)

  data:
    volumes: 0/1 healthy, 1 recovering
    pools:   4 pools, 6401 pgs
    objects: 284.69M objects, 623 TiB
    usage:   889 TiB used, 3.1 PiB / 3.9 PiB avail
    pgs:     6186 active+clean
             156  active+clean+scrubbing
             59   active+clean+scrubbing+deep

[root@ceph01-b ~]# ceph health detail
HEALTH_WARN 4 failed cephadm daemon(s); 1 filesystem is degraded; insufficient standby MDS daemons available
[WRN] CEPHADM_FAILED_DAEMON: 4 failed cephadm daemon(s)
    daemon mds.default.cephmon-01.cepqjp on cephmon-01 is in error state
    daemon mds.default.cephmon-02.duujba on cephmon-02 is in unknown state
    daemon mds.default.cephmon-03.chjusj on cephmon-03 is in error state
    daemon mds.default.cephmon-03.xcujhz on cephmon-03 is in error state
[WRN] FS_DEGRADED: 1 filesystem is degraded
    fs cephfs is degraded
[WRN] MDS_INSUFFICIENT_STANDBY: insufficient standby MDS daemons available
    have 0; want 1 more
[root@ceph01-b ~]#

[root@ceph01-b ~]# ceph fs status
cephfs - 40 clients
======
RANK  STATE          MDS                        ACTIVITY  DNS    INOS   DIRS  CAPS
 0    resolve        default.cephmon-02.nyfook            12.3k  11.8k  3228  0
 1    replay(laggy)  default.cephmon-02.duujba            0      0      0     0
 2    resolve        default.cephmon-01.pvnqad            15.8k  3541   1409  0
POOL                   TYPE      USED   AVAIL
ssd-rep-metadata-pool  metadata  295G   63.5T
sdd-rep-data-pool      data      10.2T  84.6T
hdd-ec-data-pool       data      808T   1929T
MDS version: ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable)

The end of the log file of the replay(laggy) default.cephmon-02.duujba shows:

[...]
   -11> 2024-06-19T07:12:38.980+0000 7f90fd117700  1 mds.1.journaler.pq(ro) _finish_probe_end write_pos = 8673820672 (header had 8623488918). recovered.
   -10> 2024-06-19T07:12:38.980+0000 7f90fd117700  4 mds.1.purge_queue operator(): open complete
    -9> 2024-06-19T07:12:38.980+0000 7f90fd117700  4 mds.1.purge_queue operator(): recovering write_pos
    -8> 2024-06-19T07:12:39.015+0000 7f9104926700 10 monclient: get_auth_request con 0x55a93ef42c00 auth_method 0
    -7> 2024-06-19T07:12:39.025+0000 7f9105928700 10 monclient: get_auth_request con 0x55a93ef43400 auth_method 0
    -6> 2024-06-19T07:12:39.038+0000 7f90fd117700  4 mds.1.purge_queue _recover: write_pos recovered
    -5> 2024-06-19T07:12:39.038+0000 7f90fd117700  1 mds.1.journaler.pq(ro) set_writeable
    -4> 2024-06-19T07:12:39.044+0000 7f9105127700 10 monclient: get_auth_request con 0x55a93ef43c00 auth_method 0
    -3> 2024-06-19T07:12:39.113+0000 7f9104926700 10 monclient: get_auth_request con 0x55a93ed97000 auth_method 0
    -2> 2024-06-19T07:12:39.123+0000 7f9105928700 10 monclient: get_auth_request con 0x55a93e903c00 auth_method 0
    -1> 2024-06-19T07:12:39.236+0000 7f90fa912700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/18.2.2/rpm/el8/BUILD/ceph-18.2.2/src/include/interval_set.h: In function 'void interval_set<T, C>::erase(T, T, std::function<bool(T, T)>) [with T = inodeno_t; C = std::map]' thread 7f90fa912700 time 2024-06-19T07:12:39.235633+0000
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/18.2.2/rpm/el8/BUILD/ceph-18.2.2/src/include/interval_set.h: 568: FAILED ceph_assert(p->first <= start)

 ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x135) [0x7f910c722e15]
 2: /usr/lib64/ceph/libceph-common.so.2(+0x2a9fdb) [0x7f910c722fdb]
 3: (interval_set<inodeno_t, std::map>::erase(inodeno_t, inodeno_t, std::function<bool (inodeno_t, inodeno_t)>)+0x2e5) [0x55a93c0de9a5]
 4: (EMetaBlob::replay(MDSRank*, LogSegment*, int, MDPeerUpdate*)+0x4207) [0x55a93c3e76e7]
 5: (EUpdate::replay(MDSRank*)+0x61) [0x55a93c3e9f81]
 6: (MDLog::_replay_thread()+0x6c9) [0x55a93c3701d9]
 7: (MDLog::ReplayThread::entry()+0x11) [0x55a93c01e2d1]
 8: /lib64/libpthread.so.0(+0x81ca) [0x7f910b4c81ca]
 9: clone()

     0> 2024-06-19T07:12:39.236+0000 7f90fa912700 -1 *** Caught signal (Aborted) **
 in thread 7f90fa912700 thread_name:md_log_replay

 ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable)
 1: /lib64/libpthread.so.0(+0x12d20) [0x7f910b4d2d20]
 2: gsignal()
 3: abort()
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x18f) [0x7f910c722e6f]
 5: /usr/lib64/ceph/libceph-common.so.2(+0x2a9fdb) [0x7f910c722fdb]
 6: (interval_set<inodeno_t, std::map>::erase(inodeno_t, inodeno_t, std::function<bool (inodeno_t, inodeno_t)>)+0x2e5) [0x55a93c0de9a5]
 7: (EMetaBlob::replay(MDSRank*, LogSegment*, int, MDPeerUpdate*)+0x4207) [0x55a93c3e76e7]
 8: (EUpdate::replay(MDSRank*)+0x61) [0x55a93c3e9f81]
 9: (MDLog::_replay_thread()+0x6c9) [0x55a93c3701d9]
 10: (MDLog::ReplayThread::entry()+0x11) [0x55a93c01e2d1]
 11: /lib64/libpthread.so.0(+0x81ca) [0x7f910b4c81ca]
 12: clone()
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

This is a known bug; please see https://tracker.ceph.com/issues/61009.

As a workaround, I am afraid you need to trim the journal logs first and then try to restart the MDS daemons. At the same time, please follow the workaround in https://tracker.ceph.com/issues/61009#note-26
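
For readers who want to check whether their own MDS crash looks like the one reported here, the assertion and backtrace above can be summarized with a short script. This is only a minimal sketch in Python, not a Ceph tool: the names `summarize_crash` and `crashed_in_journal_replay` are made up for illustration, and the regular expressions assume one log entry or backtrace frame per line, as in a ceph-mds log file on disk. A match only says the crash happened during journal replay with a given assertion; it does not prove it is the same bug as the tracker issue.

```python
import os
import re

# Matches e.g. ".../src/include/interval_set.h: 568: FAILED ceph_assert(p->first <= start)".
# Assumption: the asserted expression contains no ")" of its own; a nested
# parenthesis would truncate the captured expression.
ASSERT_RE = re.compile(
    r"(?P<path>[\w./-]+):\s*(?P<line>\d+):\s*FAILED ceph_assert\((?P<expr>[^)]*)\)"
)

# Matches symbolized frames such as
# " 6: (MDLog::_replay_thread()+0x6c9) [0x55a93c3701d9]".
# Frames without a symbol (e.g. "2: /usr/lib64/...so.2(+0x2a9fdb) [...]") are skipped.
FRAME_RE = re.compile(
    r"^\s*\d+:\s+\((?P<symbol>.+)\+0x[0-9a-f]+\)\s+\[0x[0-9a-f]+\]\s*$"
)


def summarize_crash(log_text):
    """Return the first failed assertion and all symbolized frames in log_text."""
    summary = {"assert": None, "frames": []}
    for raw in log_text.splitlines():
        found = ASSERT_RE.search(raw)
        if found and summary["assert"] is None:
            summary["assert"] = (
                os.path.basename(found["path"]),  # source file name only
                int(found["line"]),               # line number of the assert
                found["expr"],                    # asserted expression
            )
        frame = FRAME_RE.match(raw)
        if frame:
            summary["frames"].append(frame["symbol"])
    return summary


def crashed_in_journal_replay(summary):
    """True if any frame belongs to the MDS journal replay thread."""
    return any(s.startswith("MDLog::_replay_thread") for s in summary["frames"])
```

It could be used as `summarize_crash(open(path).read())`, where `path` points at a copy of the MDS log, for example the `log_file` named at the end of the dump.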
I see, I'll try to do this. Are there any caveats or issues to expect when trimming the journal logs?
Is there a step-by-step guide on how to perform the trimming? Should all MDS daemons be stopped beforehand?
Sorry for the many (naive) questions, but I do not want to make any mistakes here.
Thanks for your support, Dietmar
--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_mirror
   0/ 5 rbd_replay
   0/ 5 rbd_pwl
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 immutable_obj_cache
   0/ 5 client
   1/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 0 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 1 reserver
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/ 5 rgw_sync
   1/ 5 rgw_datacache
   1/ 5 rgw_access
   1/ 5 rgw_dbstore
   1/ 5 rgw_flight
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 compressor
   1/ 5 bluestore
   1/ 5 bluefs
   1/ 3 bdev
   1/ 5 kstore
   4/ 5 rocksdb
   4/ 5 leveldb
   1/ 5 fuse
   2/ 5 mgr
   1/ 5 mgrc
   1/ 5 dpdk
   1/ 5 eventtrace
   1/ 5 prioritycache
   0/ 5 test
   0/ 5 cephfs_mirror
   0/ 5 cephsqlite
   0/ 5 seastore
   0/ 5 seastore_onode
   0/ 5 seastore_odata
   0/ 5 seastore_omap
   0/ 5 seastore_tm
   0/ 5 seastore_t
   0/ 5 seastore_cleaner
   0/ 5 seastore_epm
   0/ 5 seastore_lba
   0/ 5 seastore_fixedkv_tree
   0/ 5 seastore_cache
   0/ 5 seastore_journal
   0/ 5 seastore_device
   0/ 5 seastore_backref
   0/ 5 alienstore
   1/ 5 mclock
   0/ 5 cyanstore
   1/ 5 ceph_exporter
   1/ 5 memstore
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
--- pthread ID / name mapping for recent threads ---
  7f90fa912700 / md_log_replay
  7f90fb914700 /
  7f90fc115700 / MR_Finisher
  7f90fd117700 / PQ_Finisher
  7f90fe119700 / ms_dispatch
  7f910011d700 / ceph-mds
  7f9102121700 / ms_dispatch
  7f9103123700 / io_context_pool
  7f9104125700 / admin_socket
  7f9104926700 / msgr-worker-2
  7f9105127700 / msgr-worker-1
  7f9105928700 / msgr-worker-0
  7f910d8eab00 / ceph-mds
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-mds.default.cephmon-02.duujba.log
--- end dump of recent events ---

I have no idea how to resolve this and would be grateful for any help.

Dietmar
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx