Hi Dietmar,

have you already blocked all cephfs clients? (See the command sketch after the quoted message below.)

Joachim

*Joachim Kraftmayer*
CEO | p: +49 89 2152527-21 | e: joachim.kraftmayer@xxxxxxxxx
a: Loristr. 8 | 80335 Munich | Germany | w: https://clyso.com
Utting a. A. | HR: Augsburg | HRB 25866 | USt. ID: DE275430677

On Wed, Jun 19, 2024 at 09:44, Dietmar Rieder <dietmar.rieder@xxxxxxxxxxx> wrote:

> Hello cephers,
>
> we have a degraded filesystem on our ceph 18.2.2 cluster and I need to
> get it up again.
>
> We have 6 MDS daemons (3 active, each pinned to a subtree, 3 standby).
>
> It started last night; I got the first HEALTH_WARN emails saying:
>
> HEALTH_WARN
>
> --- New ---
> [WARN] MDS_CLIENT_RECALL: 1 clients failing to respond to cache pressure
>     mds.default.cephmon-02.duujba(mds.1): Client apollo-10:cephfs_user
>     failing to respond to cache pressure client_id: 1962074
>
> === Full health status ===
> [WARN] MDS_CLIENT_RECALL: 1 clients failing to respond to cache pressure
>     mds.default.cephmon-02.duujba(mds.1): Client apollo-10:cephfs_user
>     failing to respond to cache pressure client_id: 1962074
>
> Then it went on with:
>
> HEALTH_WARN
>
> --- New ---
> [WARN] FS_DEGRADED: 1 filesystem is degraded
>     fs cephfs is degraded
>
> --- Cleared ---
> [WARN] MDS_CLIENT_RECALL: 1 clients failing to respond to cache pressure
>     mds.default.cephmon-02.duujba(mds.1): Client apollo-10:cephfs_user
>     failing to respond to cache pressure client_id: 1962074
>
> === Full health status ===
> [WARN] FS_DEGRADED: 1 filesystem is degraded
>     fs cephfs is degraded
>
> Then, one after another, the MDS daemons went into error state:
>
> HEALTH_WARN
>
> --- Updated ---
> [WARN] CEPHADM_FAILED_DAEMON: 4 failed cephadm daemon(s)
>     daemon mds.default.cephmon-01.cepqjp on cephmon-01 is in error state
>     daemon mds.default.cephmon-02.duujba on cephmon-02 is in error state
>     daemon mds.default.cephmon-03.chjusj on cephmon-03 is in error state
>     daemon mds.default.cephmon-03.xcujhz on cephmon-03 is in error state
>
> === Full health status ===
> [WARN] CEPHADM_FAILED_DAEMON: 4 failed cephadm daemon(s)
>     daemon mds.default.cephmon-01.cepqjp on cephmon-01 is in error state
>     daemon mds.default.cephmon-02.duujba on cephmon-02 is in error state
>     daemon mds.default.cephmon-03.chjusj on cephmon-03 is in error state
>     daemon mds.default.cephmon-03.xcujhz on cephmon-03 is in error state
> [WARN] FS_DEGRADED: 1 filesystem is degraded
>     fs cephfs is degraded
> [WARN] MDS_INSUFFICIENT_STANDBY: insufficient standby MDS daemons available
>     have 0; want 1 more
>
> In the morning I tried to restart the MDS daemons that were in error state,
> but they kept failing.
> I then reduced the number of active MDS to 1:
>
>   ceph fs set cephfs max_mds 1
>
> and set the filesystem down:
>
>   ceph fs set cephfs down true
>
> I tried to restart the MDS daemons again, but now I'm stuck at the following status:
>
> [root@ceph01-b ~]# ceph -s
>   cluster:
>     id:     aae23c5c-a98b-11ee-b44d-00620b05cac4
>     health: HEALTH_WARN
>             4 failed cephadm daemon(s)
>             1 filesystem is degraded
>             insufficient standby MDS daemons available
>
>   services:
>     mon: 3 daemons, quorum cephmon-01,cephmon-03,cephmon-02 (age 2w)
>     mgr: cephmon-01.dsxcho(active, since 11w), standbys: cephmon-02.nssigg, cephmon-03.rgefle
>     mds: 3/3 daemons up
>     osd: 336 osds: 336 up (since 11w), 336 in (since 3M)
>
>   data:
>     volumes: 0/1 healthy, 1 recovering
>     pools:   4 pools, 6401 pgs
>     objects: 284.69M objects, 623 TiB
>     usage:   889 TiB used, 3.1 PiB / 3.9 PiB avail
>     pgs:     6186 active+clean
>              156  active+clean+scrubbing
>              59   active+clean+scrubbing+deep
>
> [root@ceph01-b ~]# ceph health detail
> HEALTH_WARN 4 failed cephadm daemon(s); 1 filesystem is degraded; insufficient standby MDS daemons available
> [WRN] CEPHADM_FAILED_DAEMON: 4 failed cephadm daemon(s)
>     daemon mds.default.cephmon-01.cepqjp on cephmon-01 is in error state
>     daemon mds.default.cephmon-02.duujba on cephmon-02 is in unknown state
>     daemon mds.default.cephmon-03.chjusj on cephmon-03 is in error state
>     daemon mds.default.cephmon-03.xcujhz on cephmon-03 is in error state
> [WRN] FS_DEGRADED: 1 filesystem is degraded
>     fs cephfs is degraded
> [WRN] MDS_INSUFFICIENT_STANDBY: insufficient standby MDS daemons available
>     have 0; want 1 more
> [root@ceph01-b ~]#
> [root@ceph01-b ~]# ceph fs status
> cephfs - 40 clients
> ======
> RANK  STATE          MDS                        ACTIVITY  DNS    INOS   DIRS  CAPS
>  0    resolve        default.cephmon-02.nyfook            12.3k  11.8k  3228     0
>  1    replay(laggy)  default.cephmon-02.duujba                0      0     0     0
>  2    resolve        default.cephmon-01.pvnqad            15.8k   3541  1409     0
>          POOL             TYPE      USED   AVAIL
> ssd-rep-metadata-pool    metadata    295G   63.5T
> sdd-rep-data-pool        data       10.2T   84.6T
> hdd-ec-data-pool         data        808T   1929T
> MDS version: ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable)
>
> The end of the log file of the replay(laggy) default.cephmon-02.duujba shows:
>
> [...]
>    -11> 2024-06-19T07:12:38.980+0000 7f90fd117700  1 mds.1.journaler.pq(ro) _finish_probe_end write_pos = 8673820672 (header had 8623488918). recovered.
>    -10> 2024-06-19T07:12:38.980+0000 7f90fd117700  4 mds.1.purge_queue operator(): open complete
>     -9> 2024-06-19T07:12:38.980+0000 7f90fd117700  4 mds.1.purge_queue operator(): recovering write_pos
>     -8> 2024-06-19T07:12:39.015+0000 7f9104926700 10 monclient: get_auth_request con 0x55a93ef42c00 auth_method 0
>     -7> 2024-06-19T07:12:39.025+0000 7f9105928700 10 monclient: get_auth_request con 0x55a93ef43400 auth_method 0
>     -6> 2024-06-19T07:12:39.038+0000 7f90fd117700  4 mds.1.purge_queue _recover: write_pos recovered
>     -5> 2024-06-19T07:12:39.038+0000 7f90fd117700  1 mds.1.journaler.pq(ro) set_writeable
>     -4> 2024-06-19T07:12:39.044+0000 7f9105127700 10 monclient: get_auth_request con 0x55a93ef43c00 auth_method 0
>     -3> 2024-06-19T07:12:39.113+0000 7f9104926700 10 monclient: get_auth_request con 0x55a93ed97000 auth_method 0
>     -2> 2024-06-19T07:12:39.123+0000 7f9105928700 10 monclient: get_auth_request con 0x55a93e903c00 auth_method 0
>     -1> 2024-06-19T07:12:39.236+0000 7f90fa912700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/18.2.2/rpm/el8/BUILD/ceph-18.2.2/src/include/interval_set.h:
> In function 'void interval_set<T, C>::erase(T, T, std::function<bool(T, T)>) [with T = inodeno_t; C = std::map]' thread 7f90fa912700 time 2024-06-19T07:12:39.235633+0000
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/18.2.2/rpm/el8/BUILD/ceph-18.2.2/src/include/interval_set.h: 568: FAILED ceph_assert(p->first <= start)
>
>  ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x135) [0x7f910c722e15]
>  2: /usr/lib64/ceph/libceph-common.so.2(+0x2a9fdb) [0x7f910c722fdb]
>  3: (interval_set<inodeno_t, std::map>::erase(inodeno_t, inodeno_t, std::function<bool (inodeno_t, inodeno_t)>)+0x2e5) [0x55a93c0de9a5]
>  4: (EMetaBlob::replay(MDSRank*, LogSegment*, int, MDPeerUpdate*)+0x4207) [0x55a93c3e76e7]
>  5: (EUpdate::replay(MDSRank*)+0x61) [0x55a93c3e9f81]
>  6: (MDLog::_replay_thread()+0x6c9) [0x55a93c3701d9]
>  7: (MDLog::ReplayThread::entry()+0x11) [0x55a93c01e2d1]
>  8: /lib64/libpthread.so.0(+0x81ca) [0x7f910b4c81ca]
>  9: clone()
>
>      0> 2024-06-19T07:12:39.236+0000 7f90fa912700 -1 *** Caught signal (Aborted) **
>  in thread 7f90fa912700 thread_name:md_log_replay
>
>  ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable)
>  1: /lib64/libpthread.so.0(+0x12d20) [0x7f910b4d2d20]
>  2: gsignal()
>  3: abort()
>  4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x18f) [0x7f910c722e6f]
>  5: /usr/lib64/ceph/libceph-common.so.2(+0x2a9fdb) [0x7f910c722fdb]
>  6: (interval_set<inodeno_t, std::map>::erase(inodeno_t, inodeno_t, std::function<bool (inodeno_t, inodeno_t)>)+0x2e5) [0x55a93c0de9a5]
>  7: (EMetaBlob::replay(MDSRank*, LogSegment*, int, MDPeerUpdate*)+0x4207) [0x55a93c3e76e7]
>  8: (EUpdate::replay(MDSRank*)+0x61) [0x55a93c3e9f81]
>  9: (MDLog::_replay_thread()+0x6c9) [0x55a93c3701d9]
>  10: (MDLog::ReplayThread::entry()+0x11) [0x55a93c01e2d1]
>  11: /lib64/libpthread.so.0(+0x81ca) [0x7f910b4c81ca]
>  12: clone()
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> --- logging levels ---
>    0/ 5 none
>    0/ 1 lockdep
>    0/ 1 context
>    1/ 1 crush
>    1/ 5 mds
>    1/ 5 mds_balancer
>    1/ 5 mds_locker
>    1/ 5 mds_log
>    1/ 5 mds_log_expire
>    1/ 5 mds_migrator
>    0/ 1 buffer
>    0/ 1 timer
>    0/ 1 filer
>    0/ 1 striper
>    0/ 1 objecter
>    0/ 5 rados
>    0/ 5 rbd
>    0/ 5 rbd_mirror
>    0/ 5 rbd_replay
>    0/ 5 rbd_pwl
>    0/ 5 journaler
>    0/ 5 objectcacher
>    0/ 5 immutable_obj_cache
>    0/ 5 client
>    1/ 5 osd
>    0/ 5 optracker
>    0/ 5 objclass
>    1/ 3 filestore
>    1/ 3 journal
>    0/ 0 ms
>    1/ 5 mon
>    0/10 monc
>    1/ 5 paxos
>    0/ 5 tp
>    1/ 5 auth
>    1/ 5 crypto
>    1/ 1 finisher
>    1/ 1 reserver
>    1/ 5 heartbeatmap
>    1/ 5 perfcounter
>    1/ 5 rgw
>    1/ 5 rgw_sync
>    1/ 5 rgw_datacache
>    1/ 5 rgw_access
>    1/ 5 rgw_dbstore
>    1/ 5 rgw_flight
>    1/ 5 javaclient
>    1/ 5 asok
>    1/ 1 throttle
>    0/ 0 refs
>    1/ 5 compressor
>    1/ 5 bluestore
>    1/ 5 bluefs
>    1/ 3 bdev
>    1/ 5 kstore
>    4/ 5 rocksdb
>    4/ 5 leveldb
>    1/ 5 fuse
>    2/ 5 mgr
>    1/ 5 mgrc
>    1/ 5 dpdk
>    1/ 5 eventtrace
>    1/ 5 prioritycache
>    0/ 5 test
>    0/ 5 cephfs_mirror
>    0/ 5 cephsqlite
>    0/ 5 seastore
>    0/ 5 seastore_onode
>    0/ 5 seastore_odata
>    0/ 5 seastore_omap
>    0/ 5 seastore_tm
>    0/ 5 seastore_t
>    0/ 5 seastore_cleaner
>    0/ 5 seastore_epm
>    0/ 5 seastore_lba
>    0/ 5 seastore_fixedkv_tree
>    0/ 5 seastore_cache
>    0/ 5 seastore_journal
>    0/ 5 seastore_device
>    0/ 5 seastore_backref
>    0/ 5 alienstore
>    1/ 5 mclock
>    0/ 5 cyanstore
>    1/ 5 ceph_exporter
>    1/ 5 memstore
>   -2/-2 (syslog threshold)
>   -1/-1 (stderr threshold)
> --- pthread ID / name mapping for recent threads ---
>   7f90fa912700 / md_log_replay
>   7f90fb914700 /
>   7f90fc115700 / MR_Finisher
>   7f90fd117700 / PQ_Finisher
>   7f90fe119700 / ms_dispatch
>   7f910011d700 / ceph-mds
>   7f9102121700 / ms_dispatch
>   7f9103123700 / io_context_pool
>   7f9104125700 / admin_socket
>   7f9104926700 / msgr-worker-2
>   7f9105127700 / msgr-worker-1
>   7f9105928700 / msgr-worker-0
>   7f910d8eab00 / ceph-mds
>   max_recent     10000
>   max_new         1000
>   log_file /var/log/ceph/ceph-mds.default.cephmon-02.duujba.log
> --- end dump of recent events ---
>
> I have no idea how to resolve this and would be grateful for any help.
>
> Dietmar
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
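For reference, here is a minimal sketch of commands that could be used to block cephfs clients while the MDS ranks are recovered, along the lines of the question at the top of the thread. The filesystem name (cephfs), the MDS daemon name and the client id (1962074) are taken from the output quoted above; <client_addr> is a placeholder, and the refuse_client_session flag is only available on reasonably recent releases. Whether any of these steps is appropriate depends on the actual cluster state, so treat this as an illustration rather than a recovery procedure:

  # Stop accepting new CephFS client sessions while recovering
  # (supported on recent releases such as Reef).
  ceph fs set cephfs refuse_client_session true

  # Evict an individual client session by id; this only works
  # while the addressed MDS daemon is up and responsive.
  ceph tell mds.default.cephmon-02.duujba client evict id=1962074

  # Alternatively, blocklist a client by address at the RADOS level.
  ceph osd blocklist add <client_addr>

  # Once the ranks are active and healthy again, allow sessions back in.
  ceph fs set cephfs refuse_client_session false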