Good Morning Ceph Users,

I'm currently troubleshooting an issue and wanted to post here for feedback. If there is no response, or the consensus is that this looks like a bug, I'll write up a bug report.

Cluster: Reef 18.2.4 on Ubuntu 20.04

ceph -s
  cluster:
    id:     93e49b2e-2b56-4faa-xxxx-88ceb615daac
    health: HEALTH_WARN
            1 filesystem is degraded
            49 large omap objects
            1 MDSs report oversized cache
            1 MDSs report slow requests
            152 pgs not deep-scrubbed in time
            14 daemons have recently crashed

  services:
    mon: 3 daemons, quorum mon01,mon03,mon02 (age 2w)
    mgr: mgr01(active, since 2w), standbys: mgr02, mgr03
    mds: 16/16 daemons up, 1 standby, 1 hot standby
    osd: 170 osds: 167 up (since 3d), 165 in (since 3d)
    rgw: 3 daemons active (3 hosts, 1 zones)

The error messages logged when the MDS crashes are:

Aug 18 09:14:32 node02 ceph-mds[4098736]: log_channel(cluster) log [ERR] : loaded dup inode 0x100009e4175 [2,head] v12 at /volume/db2/RDP/sql02/innodb/20240817.103010/idm_dw/tmp/text-base, but inode 0x100009e417>
Aug 18 09:14:32 node02 ceph-mds[4098736]: log_channel(cluster) log [ERR] : loaded dup inode 0x100009e4176 [2,head] v10 /volume/db2/RDP/sql02/innodb/20240817.103010/idm_dw/tmp/props, but inode 0x100009e4176.he>

Current status of CephFS for this volume:

ceph fs status
cephfs - 93 clients
======
RANK  STATE         MDS      ACTIVITY       DNS    INOS   DIRS   CAPS
 0    clientreplay  mds0206                 126M   126M   123k     73
 1    active        mds0102  Reqs:    0 /s    10     13     11      0
 2    active        mds0204  Reqs:    0 /s  12.9k  12.9k  63.8k    58
 3    active        mds0203  Reqs:    0 /s  12.9k  12.9k  63.9k    56
 4    active        mds0207  Reqs:    0 /s  12.7k  12.7k  63.8k     7
 5    active        mds0201  Reqs:    0 /s  12.8k  12.8k  63.8k     3
 6    active        mds0106  Reqs:    0 /s  12.8k  12.8k  63.8k     9
 7    active        mds0105  Reqs:    0 /s  12.8k  12.8k  63.8k     4
 8    active        mds0205  Reqs:    0 /s  12.4k  12.4k  61.7k     1
 9    active        mds0208  Reqs:    0 /s  12.5k  12.5k  61.7k     0
10    active        mds0107  Reqs:    0 /s  12.5k  12.5k  61.7k     1
11    active        mds0104  Reqs:    0 /s  12.4k  12.5k  61.7k     0
12    active        mds0101  Reqs:    0 /s  12.5k  12.5k  61.7k     0
13    active        mds0103  Reqs:    0 /s  12.5k  12.5k  61.7k     0
14    active        mds0108  Reqs:    0 /s  12.4k  12.4k  61.7k     0

      POOL           TYPE      USED   AVAIL
cephfs_metadata    metadata    232G    325T
cephfs_data        data        439T    325T

This error causes mds0206 (rank 0) to restart; it has happened 14 times in the last 48 hours.
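
If full backtraces from those 14 crashes would help, I can pull them from the crash module, roughly like this (sketch only; <crash-id> below is a placeholder):

  # list crashes that have not yet been archived
  ceph crash ls-new
  # dump the backtrace/metadata for one crash (placeholder ID)
  ceph crash info <crash-id>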