Re: [EXTERN] Re: Urgent help with degraded filesystem needed

Hi Xiubo,


On 6/19/24 09:55, Xiubo Li wrote:
Hi Dietmar,

On 6/19/24 15:43, Dietmar Rieder wrote:
Hello cephers,

we have a degraded filesystem on our Ceph 18.2.2 cluster and I need to get it up again.

We have 6 MDS daemons (3 active, each pinned to a subtree, and 3 standby).
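
(The pinning is done via the ceph.dir.pin extended attribute on the subtree roots, roughly like this; the paths below are only placeholders, not our real directory layout:

setfattr -n ceph.dir.pin -v 0 /mnt/cephfs/subtree-a
setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/subtree-b
setfattr -n ceph.dir.pin -v 2 /mnt/cephfs/subtree-c
)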

It started during the night; I got the first HEALTH_WARN emails saying:

HEALTH_WARN

--- New ---
[WARN] MDS_CLIENT_RECALL: 1 clients failing to respond to cache pressure
        mds.default.cephmon-02.duujba(mds.1): Client apollo-10:cephfs_user failing to respond to cache pressure client_id: 1962074


=== Full health status ===
[WARN] MDS_CLIENT_RECALL: 1 clients failing to respond to cache pressure
        mds.default.cephmon-02.duujba(mds.1): Client apollo-10:cephfs_user failing to respond to cache pressure client_id: 1962074


Then it went on with:

HEALTH_WARN

--- New ---
[WARN] FS_DEGRADED: 1 filesystem is degraded
        fs cephfs is degraded

--- Cleared ---
[WARN] MDS_CLIENT_RECALL: 1 clients failing to respond to cache pressure
        mds.default.cephmon-02.duujba(mds.1): Client apollo-10:cephfs_user failing to respond to cache pressure client_id: 1962074


=== Full health status ===
[WARN] FS_DEGRADED: 1 filesystem is degraded
        fs cephfs is degraded



Then, one after another, the MDS daemons went into the error state:

HEALTH_WARN

--- Updated ---
[WARN] CEPHADM_FAILED_DAEMON: 4 failed cephadm daemon(s)
        daemon mds.default.cephmon-01.cepqjp on cephmon-01 is in error state
        daemon mds.default.cephmon-02.duujba on cephmon-02 is in error state
        daemon mds.default.cephmon-03.chjusj on cephmon-03 is in error state
        daemon mds.default.cephmon-03.xcujhz on cephmon-03 is in error state


=== Full health status ===
[WARN] CEPHADM_FAILED_DAEMON: 4 failed cephadm daemon(s)
        daemon mds.default.cephmon-01.cepqjp on cephmon-01 is in error state
        daemon mds.default.cephmon-02.duujba on cephmon-02 is in error state
        daemon mds.default.cephmon-03.chjusj on cephmon-03 is in error state
        daemon mds.default.cephmon-03.xcujhz on cephmon-03 is in error state
[WARN] FS_DEGRADED: 1 filesystem is degraded
        fs cephfs is degraded
[WARN] MDS_INSUFFICIENT_STANDBY: insufficient standby MDS daemons available
        have 0; want 1 more


In the morning I tried to restart the MDS daemons that were in the error state, but they kept failing. I then reduced the number of active MDS to 1:

ceph fs set cephfs max_mds 1

and set the filesystem down:

ceph fs set cephfs down true
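
(The effect of these flags can be verified by dumping the filesystem map, e.g. something along these lines; the grep pattern is just illustrative:

ceph fs get cephfs | grep -E 'max_mds|flags'
)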

I tried to restart the MDS daemons again, but now I'm stuck at the following status:


[root@ceph01-b ~]# ceph -s
  cluster:
    id:     aae23c5c-a98b-11ee-b44d-00620b05cac4
    health: HEALTH_WARN
            4 failed cephadm daemon(s)
            1 filesystem is degraded
            insufficient standby MDS daemons available

  services:
    mon: 3 daemons, quorum cephmon-01,cephmon-03,cephmon-02 (age 2w)
    mgr: cephmon-01.dsxcho(active, since 11w), standbys: cephmon-02.nssigg, cephmon-03.rgefle
    mds: 3/3 daemons up
    osd: 336 osds: 336 up (since 11w), 336 in (since 3M)

  data:
    volumes: 0/1 healthy, 1 recovering
    pools:   4 pools, 6401 pgs
    objects: 284.69M objects, 623 TiB
    usage:   889 TiB used, 3.1 PiB / 3.9 PiB avail
    pgs:     6186 active+clean
             156  active+clean+scrubbing
             59   active+clean+scrubbing+deep

[root@ceph01-b ~]# ceph health detail
HEALTH_WARN 4 failed cephadm daemon(s); 1 filesystem is degraded; insufficient standby MDS daemons available
[WRN] CEPHADM_FAILED_DAEMON: 4 failed cephadm daemon(s)
    daemon mds.default.cephmon-01.cepqjp on cephmon-01 is in error state
    daemon mds.default.cephmon-02.duujba on cephmon-02 is in unknown state
    daemon mds.default.cephmon-03.chjusj on cephmon-03 is in error state
    daemon mds.default.cephmon-03.xcujhz on cephmon-03 is in error state
[WRN] FS_DEGRADED: 1 filesystem is degraded
    fs cephfs is degraded
[WRN] MDS_INSUFFICIENT_STANDBY: insufficient standby MDS daemons available
    have 0; want 1 more
[root@ceph01-b ~]#
[root@ceph01-b ~]# ceph fs status
cephfs - 40 clients
======
RANK      STATE                 MDS              ACTIVITY    DNS    INOS   DIRS   CAPS
 0       resolve     default.cephmon-02.nyfook              12.3k  11.8k   3228      0
 1    replay(laggy)  default.cephmon-02.duujba                  0      0      0      0
 2       resolve     default.cephmon-01.pvnqad              15.8k   3541   1409      0
         POOL            TYPE     USED  AVAIL
ssd-rep-metadata-pool  metadata   295G  63.5T
  sdd-rep-data-pool      data    10.2T  84.6T
   hdd-ec-data-pool      data     808T  1929T
MDS version: ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable)


The end of the log file of the replay(laggy) MDS default.cephmon-02.duujba shows:

[...]
   -11> 2024-06-19T07:12:38.980+0000 7f90fd117700  1 mds.1.journaler.pq(ro) _finish_probe_end write_pos = 8673820672 (header had 8623488918). recovered.
   -10> 2024-06-19T07:12:38.980+0000 7f90fd117700  4 mds.1.purge_queue operator(): open complete
    -9> 2024-06-19T07:12:38.980+0000 7f90fd117700  4 mds.1.purge_queue operator(): recovering write_pos
    -8> 2024-06-19T07:12:39.015+0000 7f9104926700 10 monclient: get_auth_request con 0x55a93ef42c00 auth_method 0
    -7> 2024-06-19T07:12:39.025+0000 7f9105928700 10 monclient: get_auth_request con 0x55a93ef43400 auth_method 0
    -6> 2024-06-19T07:12:39.038+0000 7f90fd117700  4 mds.1.purge_queue _recover: write_pos recovered
    -5> 2024-06-19T07:12:39.038+0000 7f90fd117700  1 mds.1.journaler.pq(ro) set_writeable
    -4> 2024-06-19T07:12:39.044+0000 7f9105127700 10 monclient: get_auth_request con 0x55a93ef43c00 auth_method 0
    -3> 2024-06-19T07:12:39.113+0000 7f9104926700 10 monclient: get_auth_request con 0x55a93ed97000 auth_method 0
    -2> 2024-06-19T07:12:39.123+0000 7f9105928700 10 monclient: get_auth_request con 0x55a93e903c00 auth_method 0
    -1> 2024-06-19T07:12:39.236+0000 7f90fa912700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/18.2.2/rpm/el8/BUILD/ceph-18.2.2/src/include/interval_set.h: In function 'void interval_set<T, C>::erase(T, T, std::function<bool(T, T)>) [with T = inodeno_t; C = std::map]' thread 7f90fa912700 time 2024-06-19T07:12:39.235633+0000
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/18.2.2/rpm/el8/BUILD/ceph-18.2.2/src/include/interval_set.h: 568: FAILED ceph_assert(p->first <= start)

 ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x135) [0x7f910c722e15]
 2: /usr/lib64/ceph/libceph-common.so.2(+0x2a9fdb) [0x7f910c722fdb]
 3: (interval_set<inodeno_t, std::map>::erase(inodeno_t, inodeno_t, std::function<bool (inodeno_t, inodeno_t)>)+0x2e5) [0x55a93c0de9a5]
 4: (EMetaBlob::replay(MDSRank*, LogSegment*, int, MDPeerUpdate*)+0x4207) [0x55a93c3e76e7]
 5: (EUpdate::replay(MDSRank*)+0x61) [0x55a93c3e9f81]
 6: (MDLog::_replay_thread()+0x6c9) [0x55a93c3701d9]
 7: (MDLog::ReplayThread::entry()+0x11) [0x55a93c01e2d1]
 8: /lib64/libpthread.so.0(+0x81ca) [0x7f910b4c81ca]
 9: clone()

     0> 2024-06-19T07:12:39.236+0000 7f90fa912700 -1 *** Caught signal (Aborted) **
 in thread 7f90fa912700 thread_name:md_log_replay

 ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable)
 1: /lib64/libpthread.so.0(+0x12d20) [0x7f910b4d2d20]
 2: gsignal()
 3: abort()
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x18f) [0x7f910c722e6f]
 5: /usr/lib64/ceph/libceph-common.so.2(+0x2a9fdb) [0x7f910c722fdb]
 6: (interval_set<inodeno_t, std::map>::erase(inodeno_t, inodeno_t, std::function<bool (inodeno_t, inodeno_t)>)+0x2e5) [0x55a93c0de9a5]
 7: (EMetaBlob::replay(MDSRank*, LogSegment*, int, MDPeerUpdate*)+0x4207) [0x55a93c3e76e7]
 8: (EUpdate::replay(MDSRank*)+0x61) [0x55a93c3e9f81]
 9: (MDLog::_replay_thread()+0x6c9) [0x55a93c3701d9]
 10: (MDLog::ReplayThread::entry()+0x11) [0x55a93c01e2d1]
 11: /lib64/libpthread.so.0(+0x81ca) [0x7f910b4c81ca]
 12: clone()
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

This is a known bug; please see https://tracker.ceph.com/issues/61009.

As a workaround, I'm afraid you need to trim the journal logs first and then try to restart the MDS daemons. At the same time, please follow the workaround in https://tracker.ceph.com/issues/61009#note-26.

I see, I'll try to do this. Are there any caveats or issues to expect when trimming the journal logs?

Is there a step-by-step guide on how to perform the trimming? Should all MDS daemons be stopped beforehand?
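
From the CephFS disaster-recovery documentation I understand the rough sequence would be something like the following (the rank :1 is just taken from the fs status above, and I'd export a backup of the journal before touching anything); is that about right?

ceph fs fail cephfs
cephfs-journal-tool --rank=cephfs:1 journal export backup.rank1.bin
cephfs-journal-tool --rank=cephfs:1 event recover_dentries summary
cephfs-journal-tool --rank=cephfs:1 journal reset

And afterwards mark the filesystem joinable again (ceph fs set cephfs joinable true) before restarting the MDS daemons?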

Sorry for all the (naive) questions, but I do not want to make any mistakes here.

Thanks for your support,

Dietmar

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_mirror
   0/ 5 rbd_replay
   0/ 5 rbd_pwl
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 immutable_obj_cache
   0/ 5 client
   1/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 0 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 1 reserver
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/ 5 rgw_sync
   1/ 5 rgw_datacache
   1/ 5 rgw_access
   1/ 5 rgw_dbstore
   1/ 5 rgw_flight
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 compressor
   1/ 5 bluestore
   1/ 5 bluefs
   1/ 3 bdev
   1/ 5 kstore
   4/ 5 rocksdb
   4/ 5 leveldb
   1/ 5 fuse
   2/ 5 mgr
   1/ 5 mgrc
   1/ 5 dpdk
   1/ 5 eventtrace
   1/ 5 prioritycache
   0/ 5 test
   0/ 5 cephfs_mirror
   0/ 5 cephsqlite
   0/ 5 seastore
   0/ 5 seastore_onode
   0/ 5 seastore_odata
   0/ 5 seastore_omap
   0/ 5 seastore_tm
   0/ 5 seastore_t
   0/ 5 seastore_cleaner
   0/ 5 seastore_epm
   0/ 5 seastore_lba
   0/ 5 seastore_fixedkv_tree
   0/ 5 seastore_cache
   0/ 5 seastore_journal
   0/ 5 seastore_device
   0/ 5 seastore_backref
   0/ 5 alienstore
   1/ 5 mclock
   0/ 5 cyanstore
   1/ 5 ceph_exporter
   1/ 5 memstore
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
--- pthread ID / name mapping for recent threads ---
  7f90fa912700 / md_log_replay
  7f90fb914700 /
  7f90fc115700 / MR_Finisher
  7f90fd117700 / PQ_Finisher
  7f90fe119700 / ms_dispatch
  7f910011d700 / ceph-mds
  7f9102121700 / ms_dispatch
  7f9103123700 / io_context_pool
  7f9104125700 / admin_socket
  7f9104926700 / msgr-worker-2
  7f9105127700 / msgr-worker-1
  7f9105928700 / msgr-worker-0
  7f910d8eab00 / ceph-mds
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-mds.default.cephmon-02.duujba.log
--- end dump of recent events ---


I have no idea how to resolve this and would be grateful for any help.

Dietmar

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
