Hi Patrick, thanks for your message, see my comments below. (BTW, it seems there is an issue with the ceph mailing list; my previous message has not gone through yet, so this may be redundant.)
On 6/19/24 17:27, Patrick Donnelly wrote:
> Hi Dietmar,
>
> On Wed, Jun 19, 2024 at 3:44 AM Dietmar Rieder <dietmar.rieder@xxxxxxxxxxx> wrote:
>> Hello cephers,
>>
>> we have a degraded filesystem on our ceph 18.2.2 cluster and I need to get it up again. We have 6 MDS daemons (3 active, each pinned to a subtree, 3 standby).
>>
>> It started this night; I got the first HEALTH_WARN emails saying:
>>
>> HEALTH_WARN
>> --- New ---
>> [WARN] MDS_CLIENT_RECALL: 1 clients failing to respond to cache pressure
>>     mds.default.cephmon-02.duujba(mds.1): Client apollo-10:cephfs_user failing to respond to cache pressure client_id: 1962074
>>
>> === Full health status ===
>> [WARN] MDS_CLIENT_RECALL: 1 clients failing to respond to cache pressure
>>     mds.default.cephmon-02.duujba(mds.1): Client apollo-10:cephfs_user failing to respond to cache pressure client_id: 1962074
>>
>> then it went on with:
>>
>> HEALTH_WARN
>> --- New ---
>> [WARN] FS_DEGRADED: 1 filesystem is degraded
>>     fs cephfs is degraded
>> --- Cleared ---
>> [WARN] MDS_CLIENT_RECALL: 1 clients failing to respond to cache pressure
>>     mds.default.cephmon-02.duujba(mds.1): Client apollo-10:cephfs_user failing to respond to cache pressure client_id: 1962074
>>
>> === Full health status ===
>> [WARN] FS_DEGRADED: 1 filesystem is degraded
>>     fs cephfs is degraded
>>
>> Then, one after another, the MDS daemons went into error state:
>>
>> HEALTH_WARN
>> --- Updated ---
>> [WARN] CEPHADM_FAILED_DAEMON: 4 failed cephadm daemon(s)
>>     daemon mds.default.cephmon-01.cepqjp on cephmon-01 is in error state
>>     daemon mds.default.cephmon-02.duujba on cephmon-02 is in error state
>>     daemon mds.default.cephmon-03.chjusj on cephmon-03 is in error state
>>     daemon mds.default.cephmon-03.xcujhz on cephmon-03 is in error state
>>
>> === Full health status ===
>> [WARN] CEPHADM_FAILED_DAEMON: 4 failed cephadm daemon(s)
>>     daemon mds.default.cephmon-01.cepqjp on cephmon-01 is in error state
>>     daemon mds.default.cephmon-02.duujba on cephmon-02 is in error state
>>     daemon mds.default.cephmon-03.chjusj on cephmon-03 is in error state
>>     daemon mds.default.cephmon-03.xcujhz on cephmon-03 is in error state
>> [WARN] FS_DEGRADED: 1 filesystem is degraded
>>     fs cephfs is degraded
>> [WARN] MDS_INSUFFICIENT_STANDBY: insufficient standby MDS daemons available
>>     have 0; want 1 more
>>
>> In the morning I tried to restart the MDS daemons that were in error state, but they kept failing. I then reduced the number of active MDS to 1:
>>
>> ceph fs set cephfs max_mds 1
>
> This will not have any positive effect.
>
>> and set the filesystem down:
>>
>> ceph fs set cephfs down true
>>
>> I tried to restart the MDS again, but now I'm stuck at the following status:
>
> Setting the file system "down" won't do anything here either. What were you trying to accomplish?
> Restarting the MDS may only add to your problems.
>
>> [root@ceph01-b ~]# ceph -s
>>   cluster:
>>     id:     aae23c5c-a98b-11ee-b44d-00620b05cac4
>>     health: HEALTH_WARN
>>             4 failed cephadm daemon(s)
>>             1 filesystem is degraded
>>             insufficient standby MDS daemons available
>>
>>   services:
>>     mon: 3 daemons, quorum cephmon-01,cephmon-03,cephmon-02 (age 2w)
>>     mgr: cephmon-01.dsxcho(active, since 11w), standbys: cephmon-02.nssigg, cephmon-03.rgefle
>>     mds: 3/3 daemons up
>>     osd: 336 osds: 336 up (since 11w), 336 in (since 3M)
>>
>>   data:
>>     volumes: 0/1 healthy, 1 recovering
>>     pools:   4 pools, 6401 pgs
>>     objects: 284.69M objects, 623 TiB
>>     usage:   889 TiB used, 3.1 PiB / 3.9 PiB avail
>>     pgs:     6186 active+clean
>>              156  active+clean+scrubbing
>>              59   active+clean+scrubbing+deep
>>
>> [root@ceph01-b ~]# ceph health detail
>> HEALTH_WARN 4 failed cephadm daemon(s); 1 filesystem is degraded; insufficient standby MDS daemons available
>> [WRN] CEPHADM_FAILED_DAEMON: 4 failed cephadm daemon(s)
>>     daemon mds.default.cephmon-01.cepqjp on cephmon-01 is in error state
>>     daemon mds.default.cephmon-02.duujba on cephmon-02 is in unknown state
>>     daemon mds.default.cephmon-03.chjusj on cephmon-03 is in error state
>>     daemon mds.default.cephmon-03.xcujhz on cephmon-03 is in error state
>> [WRN] FS_DEGRADED: 1 filesystem is degraded
>>     fs cephfs is degraded
>> [WRN] MDS_INSUFFICIENT_STANDBY: insufficient standby MDS daemons available
>>     have 0; want 1 more
>>
>> [root@ceph01-b ~]# ceph fs status
>> cephfs - 40 clients
>> ======
>> RANK  STATE          MDS                        ACTIVITY  DNS    INOS   DIRS  CAPS
>>  0    resolve        default.cephmon-02.nyfook            12.3k  11.8k  3228     0
>>  1    replay(laggy)  default.cephmon-02.duujba                0      0     0     0
>>  2    resolve        default.cephmon-01.pvnqad            15.8k   3541  1409     0
>>          POOL           TYPE      USED   AVAIL
>> ssd-rep-metadata-pool  metadata    295G  63.5T
>>   sdd-rep-data-pool      data     10.2T  84.6T
>>   hdd-ec-data-pool       data      808T  1929T
>> MDS version: ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable)
>>
>> The end of the log file of the replay(laggy) default.cephmon-02.duujba shows:
>>
>> [...]
>>    -11> 2024-06-19T07:12:38.980+0000 7f90fd117700  1 mds.1.journaler.pq(ro) _finish_probe_end write_pos = 8673820672 (header had 8623488918). recovered.
>>    -10> 2024-06-19T07:12:38.980+0000 7f90fd117700  4 mds.1.purge_queue operator(): open complete
>>     -9> 2024-06-19T07:12:38.980+0000 7f90fd117700  4 mds.1.purge_queue operator(): recovering write_pos
>>     -8> 2024-06-19T07:12:39.015+0000 7f9104926700 10 monclient: get_auth_request con 0x55a93ef42c00 auth_method 0
>>     -7> 2024-06-19T07:12:39.025+0000 7f9105928700 10 monclient: get_auth_request con 0x55a93ef43400 auth_method 0
>>     -6> 2024-06-19T07:12:39.038+0000 7f90fd117700  4 mds.1.purge_queue _recover: write_pos recovered
>>     -5> 2024-06-19T07:12:39.038+0000 7f90fd117700  1 mds.1.journaler.pq(ro) set_writeable
>>     -4> 2024-06-19T07:12:39.044+0000 7f9105127700 10 monclient: get_auth_request con 0x55a93ef43c00 auth_method 0
>>     -3> 2024-06-19T07:12:39.113+0000 7f9104926700 10 monclient: get_auth_request con 0x55a93ed97000 auth_method 0
>>     -2> 2024-06-19T07:12:39.123+0000 7f9105928700 10 monclient: get_auth_request con 0x55a93e903c00 auth_method 0
>>     -1> 2024-06-19T07:12:39.236+0000 7f90fa912700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/18.2.2/rpm/el8/BUILD/ceph-18.2.2/src/include/interval_set.h: In function 'void interval_set<T, C>::erase(T, T, std::function<bool(T, T)>) [with T = inodeno_t; C = std::map]' thread 7f90fa912700 time 2024-06-19T07:12:39.235633+0000
>> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/18.2.2/rpm/el8/BUILD/ceph-18.2.2/src/include/interval_set.h: 568: FAILED ceph_assert(p->first <= start)
>>
>>  ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable)
>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x135) [0x7f910c722e15]
>>  2: /usr/lib64/ceph/libceph-common.so.2(+0x2a9fdb) [0x7f910c722fdb]
>>  3: (interval_set<inodeno_t, std::map>::erase(inodeno_t, inodeno_t, std::function<bool (inodeno_t, inodeno_t)>)+0x2e5) [0x55a93c0de9a5]
>>  4: (EMetaBlob::replay(MDSRank*, LogSegment*, int, MDPeerUpdate*)+0x4207) [0x55a93c3e76e7]
>>  5: (EUpdate::replay(MDSRank*)+0x61) [0x55a93c3e9f81]
>>  6: (MDLog::_replay_thread()+0x6c9) [0x55a93c3701d9]
>>  7: (MDLog::ReplayThread::entry()+0x11) [0x55a93c01e2d1]
>>  8: /lib64/libpthread.so.0(+0x81ca) [0x7f910b4c81ca]
>>  9: clone()
>
> Suggest following the recommendations by Xiubo.
I have now run the disaster recovery procedure as suggested by Xiubo, as follows:
First I exported all the journals for each rank.
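(For reference, the export step was the standard per-rank journal export, roughly along these lines; the backup file names here are only illustrative, not the exact paths I used:)

# file names below are illustrative
cephfs-journal-tool --rank=cephfs:0 journal export backup.cephfs.journal.0.bin
cephfs-journal-tool --rank=cephfs:1 journal export backup.cephfs.journal.1.bin
cephfs-journal-tool --rank=cephfs:2 journal export backup.cephfs.journal.2.bin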
Then I ran:

[root@ceph01-b /]# cephfs-journal-tool --rank=cephfs:0 event recover_dentries summary
Events by type:
  OPEN: 8737
  PURGED: 1
  SESSION: 9
  SESSIONS: 2
  SUBTREEMAP: 128
  TABLECLIENT: 2
  TABLESERVER: 30
  UPDATE: 9207
Errors: 0

[root@ceph01-b /]# cephfs-journal-tool --rank=cephfs:1 event recover_dentries summary
Events by type:
  OPEN: 3
  SESSION: 1
  SUBTREEMAP: 34
  UPDATE: 32965
Errors: 0

[root@ceph01-b /]# cephfs-journal-tool --rank=cephfs:2 event recover_dentries summary
Events by type:
  OPEN: 5289
  SESSION: 10
  SESSIONS: 3
  SUBTREEMAP: 128
  UPDATE: 76448
Errors: 0

[root@ceph01-b /]# cephfs-journal-tool --rank=cephfs:all journal inspect
Overall journal integrity: OK
Overall journal integrity: DAMAGED
Corrupt regions:
  0xd9a84f243c-ffffffffffffffff
Overall journal integrity: OK

[root@ceph01-b /]# cephfs-journal-tool --rank=cephfs:0 journal inspect
Overall journal integrity: OK

[root@ceph01-b /]# cephfs-journal-tool --rank=cephfs:1 journal inspect
Overall journal integrity: DAMAGED
Corrupt regions:
  0xd9a84f243c-ffffffffffffffff

[root@ceph01-b /]# cephfs-journal-tool --rank=cephfs:2 journal inspect
Overall journal integrity: OK

[root@ceph01-b /]# cephfs-journal-tool --rank=cephfs:0 journal reset
old journal was 879331755046~508520587
new journal start will be 879843344384 (3068751 bytes past old end)
writing journal head
writing EResetJournal entry
done

[root@ceph01-b /]# cephfs-journal-tool --rank=cephfs:1 journal reset
old journal was 934711229813~120432327
new journal start will be 934834864128 (3201988 bytes past old end)
writing journal head
writing EResetJournal entry
done

[root@ceph01-b /]# cephfs-journal-tool --rank=cephfs:2 journal reset
old journal was 1334153584288~252692691
new journal start will be 1334409428992 (3152013 bytes past old end)
writing journal head
writing EResetJournal entry
done

[root@ceph01-b /]# cephfs-table-tool all reset session
{
    "0": {
        "data": {},
        "result": 0
    },
    "1": {
        "data": {},
        "result": 0
    },
    "2": {
        "data": {},
        "result": 0
    }
}

[root@ceph01-b /]# cephfs-journal-tool --rank=cephfs:1 journal inspect
Overall journal integrity: OK

[root@ceph01-b /]# ceph fs reset cephfs --yes-i-really-mean-it

But now I hit the error below:

   -20> 2024-06-19T11:13:00.610+0000 7ff3694d0700 10 monclient: _send_mon_message to mon.cephmon-03 at v2:10.1.3.23:3300/0
   -19> 2024-06-19T11:13:00.637+0000 7ff3664ca700  2 mds.0.cache Memory usage: total 485928, rss 170860, heap 207156, baseline 182580, 0 / 33434 inodes have caps, 0 caps, 0 caps per inode
   -18> 2024-06-19T11:13:00.787+0000 7ff36a4d2700  1 mds.default.cephmon-03.chjusj Updating MDS map to version 8061 from mon.1
   -17> 2024-06-19T11:13:00.787+0000 7ff36a4d2700  1 mds.0.8058 handle_mds_map i am now mds.0.8058
   -16> 2024-06-19T11:13:00.787+0000 7ff36a4d2700  1 mds.0.8058 handle_mds_map state change up:rejoin --> up:active
   -15> 2024-06-19T11:13:00.787+0000 7ff36a4d2700  1 mds.0.8058 recovery_done -- successful recovery!
   -14> 2024-06-19T11:13:00.788+0000 7ff36a4d2700  1 mds.0.8058 active_start
   -13> 2024-06-19T11:13:00.789+0000 7ff36dcd9700  5 mds.beacon.default.cephmon-03.chjusj received beacon reply up:active seq 4 rtt 0.955007
   -12> 2024-06-19T11:13:00.790+0000 7ff36a4d2700  1 mds.0.8058 cluster recovered.
   -11> 2024-06-19T11:13:00.790+0000 7ff36a4d2700  4 mds.0.8058 set_osd_epoch_barrier: epoch=33596
   -10> 2024-06-19T11:13:00.790+0000 7ff3634c4700  5 mds.0.log _submit_thread 879843344432~2609 : EUpdate check_inode_max_size [metablob 0x100, 2 dirs]
    -9> 2024-06-19T11:13:00.791+0000 7ff3644c6700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/18.2.2/rpm/el8/BUILD/ceph-18.2.2/src/mds/MDCache.cc: In function 'void MDCache::journal_cow_dentry(MutationImpl*, EMetaBlob*, CDentry*, snapid_t, CInode**, CDentry::linkage_t*)' thread 7ff3644c6700 time 2024-06-19T11:13:00.791580+0000
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/18.2.2/rpm/el8/BUILD/ceph-18.2.2/src/mds/MDCache.cc: 1660: FAILED ceph_assert(follows >= realm->get_newest_seq())

 ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x135) [0x7ff374ad3e15]
 2: /usr/lib64/ceph/libceph-common.so.2(+0x2a9fdb) [0x7ff374ad3fdb]
 3: (MDCache::journal_cow_dentry(MutationImpl*, EMetaBlob*, CDentry*, snapid_t, CInode**, CDentry::linkage_t*)+0x13c7) [0x55da0a7aa227]
 4: (MDCache::journal_dirty_inode(MutationImpl*, EMetaBlob*, CInode*, snapid_t)+0xc5) [0x55da0a7aa3a5]
 5: (Locker::check_inode_max_size(CInode*, bool, unsigned long, unsigned long, utime_t)+0x84d) [0x55da0a88ce3d]
 6: (RecoveryQueue::_recovered(CInode*, int, unsigned long, utime_t)+0x4f0) [0x55da0a85ad50]
 7: (MDSContext::complete(int)+0x5f) [0x55da0a9ddeef]
 8: (MDSIOContextBase::complete(int)+0x524) [0x55da0a9de674]
 9: (Filer::C_Probe::finish(int)+0xbb) [0x55da0aa9dc9b]
 10: (Context::complete(int)+0xd) [0x55da0a6775fd]
 11: (Finisher::finisher_thread_entry()+0x18d) [0x7ff374b77abd]
 12: /lib64/libpthread.so.0(+0x81ca) [0x7ff3738791ca]
 13: clone()

    -8> 2024-06-19T11:13:00.792+0000 7ff36a4d2700 10 log_client handle_log_ack log(last 7) v1
    -7> 2024-06-19T11:13:00.792+0000 7ff36a4d2700 10 log_client logged 2024-06-19T11:12:59.647346+0000 mds.default.cephmon-03.chjusj (mds.0) 1 : cluster [ERR] loaded dup inode 0x10003e45d99 [415,head] v61632 at /home/balaz/.bash_history-54696.tmp, but inode 0x10003e45d99.head v61639 already exists at /home/balaz/.bash_history
    -6> 2024-06-19T11:13:00.792+0000 7ff36a4d2700 10 log_client logged 2024-06-19T11:12:59.648139+0000 mds.default.cephmon-03.chjusj (mds.0) 2 : cluster [ERR] loaded dup inode 0x10003e45d7c [415,head] v253612 at /home/rieder/.bash_history-10215.tmp, but inode 0x10003e45d7c.head v253630 already exists at /home/rieder/.bash_history
    -5> 2024-06-19T11:13:00.792+0000 7ff36a4d2700 10 log_client logged 2024-06-19T11:12:59.649483+0000 mds.default.cephmon-03.chjusj (mds.0) 3 : cluster [ERR] loaded dup inode 0x10003e45d83 [415,head] v164103 at /home/gottschling/.bash_history-44802.tmp, but inode 0x10003e45d83.head v164112 already exists at /home/gottschling/.bash_history
    -4> 2024-06-19T11:13:00.792+0000 7ff36a4d2700 10 log_client logged 2024-06-19T11:12:59.656221+0000 mds.default.cephmon-03.chjusj (mds.0) 4 : cluster [ERR] bad backtrace on directory inode 0x10003e42340
    -3> 2024-06-19T11:13:00.792+0000 7ff36a4d2700 10 log_client logged 2024-06-19T11:12:59.737282+0000 mds.default.cephmon-03.chjusj (mds.0) 5 : cluster [ERR] bad backtrace on directory inode 0x10003e45d8b
    -2> 2024-06-19T11:13:00.792+0000 7ff36a4d2700 10 log_client logged 2024-06-19T11:12:59.804984+0000 mds.default.cephmon-03.chjusj (mds.0) 6 : cluster [ERR] bad backtrace on directory inode 0x10003e45d9f
    -1> 2024-06-19T11:13:00.792+0000 7ff36a4d2700 10 log_client logged 2024-06-19T11:12:59.805078+0000 mds.default.cephmon-03.chjusj (mds.0) 7 : cluster [ERR] bad backtrace on directory inode 0x10003e45d90
     0> 2024-06-19T11:13:00.792+0000 7ff3644c6700 -1 *** Caught signal (Aborted) **
 in thread 7ff3644c6700 thread_name:MR_Finisher

 ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable)
 1: /lib64/libpthread.so.0(+0x12d20) [0x7ff373883d20]
 2: gsignal()
 3: abort()
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x18f) [0x7ff374ad3e6f]
 5: /usr/lib64/ceph/libceph-common.so.2(+0x2a9fdb) [0x7ff374ad3fdb]
 6: (MDCache::journal_cow_dentry(MutationImpl*, EMetaBlob*, CDentry*, snapid_t, CInode**, CDentry::linkage_t*)+0x13c7) [0x55da0a7aa227]
 7: (MDCache::journal_dirty_inode(MutationImpl*, EMetaBlob*, CInode*, snapid_t)+0xc5) [0x55da0a7aa3a5]
 8: (Locker::check_inode_max_size(CInode*, bool, unsigned long, unsigned long, utime_t)+0x84d) [0x55da0a88ce3d]
 9: (RecoveryQueue::_recovered(CInode*, int, unsigned long, utime_t)+0x4f0) [0x55da0a85ad50]
 10: (MDSContext::complete(int)+0x5f) [0x55da0a9ddeef]
 11: (MDSIOContextBase::complete(int)+0x524) [0x55da0a9de674]
 12: (Filer::C_Probe::finish(int)+0xbb) [0x55da0aa9dc9b]
 13: (Context::complete(int)+0xd) [0x55da0a6775fd]
 14: (Finisher::finisher_thread_entry()+0x18d) [0x7ff374b77abd]
 15: /lib64/libpthread.so.0(+0x81ca) [0x7ff3738791ca]
 16: clone()
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- logging levels ---
   0/ 5 none  0/ 1 lockdep  0/ 1 context  1/ 1 crush  1/ 5 mds  1/ 5 mds_balancer  1/ 5 mds_locker  1/ 5 mds_log  1/ 5 mds_log_expire  1/ 5 mds_migrator  0/ 1 buffer  0/ 1 timer  0/ 1 filer  0/ 1 striper  0/ 1 objecter  0/ 5 rados  0/ 5 rbd  0/ 5 rbd_mirror  0/ 5 rbd_replay  0/ 5 rbd_pwl  0/ 5 journaler  0/ 5 objectcacher  0/ 5 immutable_obj_cache  0/ 5 client  1/ 5 osd  0/ 5 optracker  0/ 5 objclass  1/ 3 filestore  1/ 3 journal  0/ 0 ms  1/ 5 mon  0/10 monc  1/ 5 paxos  0/ 5 tp  1/ 5 auth  1/ 5 crypto  1/ 1 finisher  1/ 1 reserver  1/ 5 heartbeatmap  1/ 5 perfcounter  1/ 5 rgw  1/ 5 rgw_sync  1/ 5 rgw_datacache  1/ 5 rgw_access  1/ 5 rgw_dbstore  1/ 5 rgw_flight  1/ 5 javaclient  1/ 5 asok  1/ 1 throttle  0/ 0 refs  1/ 5 compressor  1/ 5 bluestore  1/ 5 bluefs  1/ 3 bdev  1/ 5 kstore  4/ 5 rocksdb  4/ 5 leveldb  1/ 5 fuse  2/ 5 mgr  1/ 5 mgrc  1/ 5 dpdk  1/ 5 eventtrace  1/ 5 prioritycache  0/ 5 test  0/ 5 cephfs_mirror  0/ 5 cephsqlite  0/ 5 seastore  0/ 5 seastore_onode  0/ 5 seastore_odata  0/ 5 seastore_omap  0/ 5 seastore_tm  0/ 5 seastore_t  0/ 5 seastore_cleaner  0/ 5 seastore_epm  0/ 5 seastore_lba  0/ 5 seastore_fixedkv_tree  0/ 5 seastore_cache  0/ 5 seastore_journal  0/ 5 seastore_device  0/ 5 seastore_backref  0/ 5 alienstore  1/ 5 mclock  0/ 5 cyanstore  1/ 5 ceph_exporter  1/ 5 memstore
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
--- pthread ID / name mapping for recent threads ---
  7ff362cc3700 /
  7ff3634c4700 / md_submit
  7ff363cc5700 /
  7ff3644c6700 / MR_Finisher
  7ff3654c8700 / PQ_Finisher
  7ff365cc9700 / mds_rank_progr
  7ff3664ca700 / ms_dispatch
  7ff3684ce700 / ceph-mds
  7ff3694d0700 / safe_timer
  7ff36a4d2700 / ms_dispatch
  7ff36b4d4700 / io_context_pool
  7ff36c4d6700 / admin_socket
  7ff36ccd7700 / msgr-worker-2
  7ff36d4d8700 / msgr-worker-1
  7ff36dcd9700 / msgr-worker-0
  7ff375c9bb00 / ceph-mds
  max_recent     10000
  max_new        1000
  log_file /var/log/ceph/ceph-mds.default.cephmon-03.chjusj.log
--- end dump of recent events ---

Any idea?

Thanks
  Dietmar