Re: [EXTERN] Re: Urgent help with degraded filesystem needed

Dietmar Rieder <dietmar.rieder@xxxxxxxxxxx> · Wed, 19 Jun 2024 10:05:08 +0200

Hi Joachim,

I suppose that setting the filesystem down will block all clients:

ceph fs set cephfs down true

right?
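
For reference, a minimal sketch of that step written as a dry run, so that nothing is changed by accident. The `run` wrapper, the `DRY_RUN` and `FS_NAME` variables, and the `mark_fs_down` function are hypothetical names used only for illustration. The `ceph fs set ... down true` command is the one quoted above; `ceph fs get` and `ceph fs status` are added here only to inspect the result. The sketch does not by itself confirm that clients are blocked; the client count printed by `ceph fs status` is one thing to watch.

```shell
#!/bin/sh
# Sketch only: prints the commands by default; set DRY_RUN=0 to really run them.
FS_NAME="${FS_NAME:-cephfs}"   # file system name used in this thread
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical wrapper: echo the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: mark the file system down, then show its state.
mark_fs_down() {
  run ceph fs set "$1" down true   # the command quoted above
  run ceph fs get "$1"             # MDS map for this file system (flags, max_mds)
  run ceph fs status "$1"          # per-rank state and client count
}

mark_fs_down "$FS_NAME"
```

Running it as is only prints the three commands prefixed with `+`; with `DRY_RUN=0` it would execute them against the cluster.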

Dietmar

On 6/19/24 10:02, Joachim Kraftmayer wrote:
Hi Dietmar,

Have you already blocked all CephFS clients?

Joachim

*Joachim Kraftmayer*
CEO | p: +49 89 2152527-21 | e: joachim.kraftmayer@xxxxxxxxx

a: Loristr. 8 | 80335 Munich | Germany | w: https://clyso.com |
Utting a. A. | HR: Augsburg | HRB 25866 | USt. ID: DE275430677

On Wed, 19 June 2024 at 09:44, Dietmar Rieder <dietmar.rieder@xxxxxxxxxxx> wrote:

    Hello cephers,

    we have a degraded filesystem on our ceph 18.2.2 cluster and I
    need to get it up again.

    We have 6 MDS daemons (3 active, each pinned to a subtree, and 3
    standby).

    It started last night, when I got the first HEALTH_WARN emails saying:

    HEALTH_WARN

    --- New ---
    [WARN] MDS_CLIENT_RECALL: 1 clients failing to respond to cache pressure
              mds.default.cephmon-02.duujba(mds.1): Client apollo-10:cephfs_user failing to respond to cache pressure client_id: 1962074

    === Full health status ===
    [WARN] MDS_CLIENT_RECALL: 1 clients failing to respond to cache pressure
              mds.default.cephmon-02.duujba(mds.1): Client apollo-10:cephfs_user failing to respond to cache pressure client_id: 1962074

    then it went on with:

    HEALTH_WARN

    --- New ---
    [WARN] FS_DEGRADED: 1 filesystem is degraded
              fs cephfs is degraded

    --- Cleared ---
    [WARN] MDS_CLIENT_RECALL: 1 clients failing to respond to cache pressure
              mds.default.cephmon-02.duujba(mds.1): Client apollo-10:cephfs_user failing to respond to cache pressure client_id: 1962074

    === Full health status ===
    [WARN] FS_DEGRADED: 1 filesystem is degraded
              fs cephfs is degraded

    Then the MDS daemons went into error state one after another:

    HEALTH_WARN

    --- Updated ---
    [WARN] CEPHADM_FAILED_DAEMON: 4 failed cephadm daemon(s)
              daemon mds.default.cephmon-01.cepqjp on cephmon-01 is in error state
              daemon mds.default.cephmon-02.duujba on cephmon-02 is in error state
              daemon mds.default.cephmon-03.chjusj on cephmon-03 is in error state
              daemon mds.default.cephmon-03.xcujhz on cephmon-03 is in error state

    === Full health status ===
    [WARN] CEPHADM_FAILED_DAEMON: 4 failed cephadm daemon(s)
              daemon mds.default.cephmon-01.cepqjp on cephmon-01 is in error state
              daemon mds.default.cephmon-02.duujba on cephmon-02 is in error state
              daemon mds.default.cephmon-03.chjusj on cephmon-03 is in error state
              daemon mds.default.cephmon-03.xcujhz on cephmon-03 is in error state
    [WARN] FS_DEGRADED: 1 filesystem is degraded
              fs cephfs is degraded
    [WARN] MDS_INSUFFICIENT_STANDBY: insufficient standby MDS daemons available
              have 0; want 1 more

    In the morning I tried to restart the MDS daemons in error state, but
    they kept failing. I then reduced the number of active MDS daemons to 1:

    ceph fs set cephfs max_mds 1

    And set the filesystem down

    ceph fs set cephfs down true

    I tried to restart the MDS again but now I'm stuck at the following
    status:

    [root@ceph01-b ~]# ceph -s
        cluster:
          id:     aae23c5c-a98b-11ee-b44d-00620b05cac4
          health: HEALTH_WARN
                  4 failed cephadm daemon(s)
                  1 filesystem is degraded
                  insufficient standby MDS daemons available

        services:
          mon: 3 daemons, quorum cephmon-01,cephmon-03,cephmon-02 (age 2w)
          mgr: cephmon-01.dsxcho(active, since 11w), standbys: cephmon-02.nssigg, cephmon-03.rgefle
          mds: 3/3 daemons up
          osd: 336 osds: 336 up (since 11w), 336 in (since 3M)

        data:
          volumes: 0/1 healthy, 1 recovering
          pools:   4 pools, 6401 pgs
          objects: 284.69M objects, 623 TiB
          usage:   889 TiB used, 3.1 PiB / 3.9 PiB avail
          pgs:     6186 active+clean
                   156  active+clean+scrubbing
                   59   active+clean+scrubbing+deep

    [root@ceph01-b ~]# ceph health detail
    HEALTH_WARN 4 failed cephadm daemon(s); 1 filesystem is degraded; insufficient standby MDS daemons available
    [WRN] CEPHADM_FAILED_DAEMON: 4 failed cephadm daemon(s)
          daemon mds.default.cephmon-01.cepqjp on cephmon-01 is in error state
          daemon mds.default.cephmon-02.duujba on cephmon-02 is in unknown state
          daemon mds.default.cephmon-03.chjusj on cephmon-03 is in error state
          daemon mds.default.cephmon-03.xcujhz on cephmon-03 is in error state
    [WRN] FS_DEGRADED: 1 filesystem is degraded
          fs cephfs is degraded
    [WRN] MDS_INSUFFICIENT_STANDBY: insufficient standby MDS daemons available
          have 0; want 1 more
    [root@ceph01-b ~]#
    [root@ceph01-b ~]# ceph fs status
    cephfs - 40 clients
    ======
    RANK      STATE                 MDS             ACTIVITY   DNS    INOS   DIRS   CAPS
     0       resolve     default.cephmon-02.nyfook            12.3k  11.8k  3228      0
     1    replay(laggy)  default.cephmon-02.duujba               0      0      0      0
     2       resolve     default.cephmon-01.pvnqad            15.8k   3541  1409      0
             POOL            TYPE     USED  AVAIL
    ssd-rep-metadata-pool  metadata   295G  63.5T
      sdd-rep-data-pool      data    10.2T  84.6T
       hdd-ec-data-pool      data     808T  1929T
    MDS version: ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable)

    The end of the log file of the replay(laggy) MDS default.cephmon-02.duujba shows:

    [...]
         -11> 2024-06-19T07:12:38.980+0000 7f90fd117700  1 mds.1.journaler.pq(ro) _finish_probe_end write_pos = 8673820672 (header had 8623488918). recovered.
         -10> 2024-06-19T07:12:38.980+0000 7f90fd117700  4 mds.1.purge_queue operator(): open complete
          -9> 2024-06-19T07:12:38.980+0000 7f90fd117700  4 mds.1.purge_queue operator(): recovering write_pos
          -8> 2024-06-19T07:12:39.015+0000 7f9104926700 10 monclient: get_auth_request con 0x55a93ef42c00 auth_method 0
          -7> 2024-06-19T07:12:39.025+0000 7f9105928700 10 monclient: get_auth_request con 0x55a93ef43400 auth_method 0
          -6> 2024-06-19T07:12:39.038+0000 7f90fd117700  4 mds.1.purge_queue _recover: write_pos recovered
          -5> 2024-06-19T07:12:39.038+0000 7f90fd117700  1 mds.1.journaler.pq(ro) set_writeable
          -4> 2024-06-19T07:12:39.044+0000 7f9105127700 10 monclient: get_auth_request con 0x55a93ef43c00 auth_method 0
          -3> 2024-06-19T07:12:39.113+0000 7f9104926700 10 monclient: get_auth_request con 0x55a93ed97000 auth_method 0
          -2> 2024-06-19T07:12:39.123+0000 7f9105928700 10 monclient: get_auth_request con 0x55a93e903c00 auth_method 0
          -1> 2024-06-19T07:12:39.236+0000 7f90fa912700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/18.2.2/rpm/el8/BUILD/ceph-18.2.2/src/include/interval_set.h: In function 'void interval_set<T, C>::erase(T, T, std::function<bool(T, T)>) [with T = inodeno_t; C = std::map]' thread 7f90fa912700 time 2024-06-19T07:12:39.235633+0000
    /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/18.2.2/rpm/el8/BUILD/ceph-18.2.2/src/include/interval_set.h: 568: FAILED ceph_assert(p->first <= start)

       ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable)
       1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x135) [0x7f910c722e15]
       2: /usr/lib64/ceph/libceph-common.so.2(+0x2a9fdb) [0x7f910c722fdb]
       3: (interval_set<inodeno_t, std::map>::erase(inodeno_t, inodeno_t, std::function<bool (inodeno_t, inodeno_t)>)+0x2e5) [0x55a93c0de9a5]
       4: (EMetaBlob::replay(MDSRank*, LogSegment*, int, MDPeerUpdate*)+0x4207) [0x55a93c3e76e7]
       5: (EUpdate::replay(MDSRank*)+0x61) [0x55a93c3e9f81]
       6: (MDLog::_replay_thread()+0x6c9) [0x55a93c3701d9]
       7: (MDLog::ReplayThread::entry()+0x11) [0x55a93c01e2d1]
       8: /lib64/libpthread.so.0(+0x81ca) [0x7f910b4c81ca]
       9: clone()

           0> 2024-06-19T07:12:39.236+0000 7f90fa912700 -1 *** Caught signal (Aborted) **
       in thread 7f90fa912700 thread_name:md_log_replay

       ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable)
       1: /lib64/libpthread.so.0(+0x12d20) [0x7f910b4d2d20]
       2: gsignal()
       3: abort()
       4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x18f) [0x7f910c722e6f]
       5: /usr/lib64/ceph/libceph-common.so.2(+0x2a9fdb) [0x7f910c722fdb]
       6: (interval_set<inodeno_t, std::map>::erase(inodeno_t, inodeno_t, std::function<bool (inodeno_t, inodeno_t)>)+0x2e5) [0x55a93c0de9a5]
       7: (EMetaBlob::replay(MDSRank*, LogSegment*, int, MDPeerUpdate*)+0x4207) [0x55a93c3e76e7]
       8: (EUpdate::replay(MDSRank*)+0x61) [0x55a93c3e9f81]
       9: (MDLog::_replay_thread()+0x6c9) [0x55a93c3701d9]
       10: (MDLog::ReplayThread::entry()+0x11) [0x55a93c01e2d1]
       11: /lib64/libpthread.so.0(+0x81ca) [0x7f910b4c81ca]
       12: clone()
       NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

    --- logging levels ---
         0/ 5 none
         0/ 1 lockdep
         0/ 1 context
         1/ 1 crush
         1/ 5 mds
         1/ 5 mds_balancer
         1/ 5 mds_locker
         1/ 5 mds_log
         1/ 5 mds_log_expire
         1/ 5 mds_migrator
         0/ 1 buffer
         0/ 1 timer
         0/ 1 filer
         0/ 1 striper
         0/ 1 objecter
         0/ 5 rados
         0/ 5 rbd
         0/ 5 rbd_mirror
         0/ 5 rbd_replay
         0/ 5 rbd_pwl
         0/ 5 journaler
         0/ 5 objectcacher
         0/ 5 immutable_obj_cache
         0/ 5 client
         1/ 5 osd
         0/ 5 optracker
         0/ 5 objclass
         1/ 3 filestore
         1/ 3 journal
         0/ 0 ms
         1/ 5 mon
         0/10 monc
         1/ 5 paxos
         0/ 5 tp
         1/ 5 auth
         1/ 5 crypto
         1/ 1 finisher
         1/ 1 reserver
         1/ 5 heartbeatmap
         1/ 5 perfcounter
         1/ 5 rgw
         1/ 5 rgw_sync
         1/ 5 rgw_datacache
         1/ 5 rgw_access
         1/ 5 rgw_dbstore
         1/ 5 rgw_flight
         1/ 5 javaclient
         1/ 5 asok
         1/ 1 throttle
         0/ 0 refs
         1/ 5 compressor
         1/ 5 bluestore
         1/ 5 bluefs
         1/ 3 bdev
         1/ 5 kstore
         4/ 5 rocksdb
         4/ 5 leveldb
         1/ 5 fuse
         2/ 5 mgr
         1/ 5 mgrc
         1/ 5 dpdk
         1/ 5 eventtrace
         1/ 5 prioritycache
         0/ 5 test
         0/ 5 cephfs_mirror
         0/ 5 cephsqlite
         0/ 5 seastore
         0/ 5 seastore_onode
         0/ 5 seastore_odata
         0/ 5 seastore_omap
         0/ 5 seastore_tm
         0/ 5 seastore_t
         0/ 5 seastore_cleaner
         0/ 5 seastore_epm
         0/ 5 seastore_lba
         0/ 5 seastore_fixedkv_tree
         0/ 5 seastore_cache
         0/ 5 seastore_journal
         0/ 5 seastore_device
         0/ 5 seastore_backref
         0/ 5 alienstore
         1/ 5 mclock
         0/ 5 cyanstore
         1/ 5 ceph_exporter
         1/ 5 memstore
        -2/-2 (syslog threshold)
        -1/-1 (stderr threshold)
    --- pthread ID / name mapping for recent threads ---
        7f90fa912700 / md_log_replay
        7f90fb914700 /
        7f90fc115700 / MR_Finisher
        7f90fd117700 / PQ_Finisher
        7f90fe119700 / ms_dispatch
        7f910011d700 / ceph-mds
        7f9102121700 / ms_dispatch
        7f9103123700 / io_context_pool
        7f9104125700 / admin_socket
        7f9104926700 / msgr-worker-2
        7f9105127700 / msgr-worker-1
        7f9105928700 / msgr-worker-0
        7f910d8eab00 / ceph-mds
        max_recent     10000
        max_new         1000
        log_file /var/log/ceph/ceph-mds.default.cephmon-02.duujba.log
    --- end dump of recent events ---
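
To pull just the failed assertion and the journal-replay frames out of a long MDS log like the dump above, a small filter along these lines may help. `summarize_mds_crash` is a hypothetical helper name, and the choice of patterns is an assumption about which lines matter in this crash. The sample lines are copied from the dump above, with the build path shortened, so that the sketch runs without a cluster.

```shell
#!/bin/sh
# Hypothetical helper: keep the failed assertion, the signal line, and the
# journal-replay frames from an MDS log (files as arguments, or stdin).
summarize_mds_crash() {
  grep -E 'FAILED ceph_assert|Caught signal|::(replay|_replay_thread)\(' "$@"
}

# Sample lines taken from the dump above (unwrapped, build path shortened).
sample_log=' -2> 2024-06-19T07:12:39.123+0000 7f9105928700 10 monclient: get_auth_request con 0x55a93e903c00 auth_method 0
interval_set.h: 568: FAILED ceph_assert(p->first <= start)
 4: (EMetaBlob::replay(MDSRank*, LogSegment*, int, MDPeerUpdate*)+0x4207) [0x55a93c3e76e7]
 6: (MDLog::_replay_thread()+0x6c9) [0x55a93c3701d9]
 7: (MDLog::ReplayThread::entry()+0x11) [0x55a93c01e2d1]
     0/ 5 rbd_replay'

# Prints the assertion line and the two replay frames from the sample.
printf '%s\n' "$sample_log" | summarize_mds_crash
```

Against a real log, the function would be given the file named in the dump, for example `summarize_mds_crash /var/log/ceph/ceph-mds.default.cephmon-02.duujba.log`.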

    I have no idea how to resolve this and would be grateful for any help.

    Dietmar
    _______________________________________________
    ceph-users mailing list -- ceph-users@xxxxxxx
    To unsubscribe send an email to ceph-users-leave@xxxxxxx

--
_________________________________________________________
D i e t m a r  R i e d e r
Innsbruck Medical University
Biocenter - Institute of Bioinformatics
Innrain 80, 6020 Innsbruck
Phone: +43 512 9003 71402 | Mobile: +43 676 8716 72402
Email: dietmar.rieder@xxxxxxxxxxx
Web:   http://www.icbi.at
