Re: Mds daemon damaged - assert failed

It could be a bug, sure, but I haven't searched the tracker very thoroughly; maybe there is an existing report, and I'd leave it to the devs to comment on that. The assert alone isn't of much help (to me), though; more MDS logs could help track this down.
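If it happens again, it would help to capture the crash with a higher MDS debug level first. Something along these lines should do it (20 is very verbose, so revert it afterwards):

ceph config set mds debug_mds 20
ceph config set mds debug_journaler 10
# reproduce / wait for the crash, collect the MDS log, then revert:
ceph config rm mds debug_mds
ceph config rm mds debug_journaler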

Quoting "Kyriazis, George" <george.kyriazis@xxxxxxxxx>:

On Sep 25, 2024, at 1:05 AM, Eugen Block <eblock@xxxxxx> wrote:

Great that you got your filesystem back.

cephfs-journal-tool journal export
cephfs-journal-tool event recover_dentries summary

Both failed

Your export command seems to be missing the output file, or was it not the exact command?

Yes, I didn't include the output file in my snippet; sorry for the confusion. But the command did in fact complain that the journal was corrupted.
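For completeness, the full syntax I mean is roughly this (the fs name, rank and output path are just placeholders):

cephfs-journal-tool --rank=<fs_name>:0 journal export /some/path/journal-backup.bin
cephfs-journal-tool --rank=<fs_name>:0 event recover_dentries summary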


Also, I understand that the metadata itself is sitting on the disk, but it looks like a single point of failure. What's the logic behind having a single metadata location, but multiple MDS servers?

I think there's a misunderstanding: the metadata is in the cephfs metadata pool, not on the local disk of your machine.
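You can see which pools back the filesystem with, for example:

ceph fs ls
ceph fs status <fs_name>

Roughly speaking, the metadata pool is replicated across OSDs like any other pool, so it isn't tied to a single disk; the MDS daemons are just the servers in front of it, which is why you can have several of them.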


By “disk” I meant the concept of permanent storage, i.e. Ceph. Yes, our understanding matches. But the question still remains as to why that assert would trigger. Was it a software issue (a bug?) that corrupted the journal, or did something else corrupt the journal and cause the MDS to throw the assertion? Basically, I'm trying to find a possible root cause.

Thank you!

George



Quoting "Kyriazis, George" <george.kyriazis@xxxxxxxxx>:

I managed to recover my filesystem.

cephfs-journal-tool journal export
cephfs-journal-tool event recover_dentries summary

Both failed

But truncating the journal and following some of the instructions in https://people.redhat.com/bhubbard/nature/default/cephfs/disaster-recovery-experts/ helped me to get the mds up.
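For anyone hitting the same thing: the journal-truncation part of that page boils down to something like the following (a sketch based on the docs, not necessarily the exact commands I ran):

# back up the journal first, even if it is already corrupt
cephfs-journal-tool --rank=<fs_name>:0 journal export backup.bin
# salvage what can be salvaged, then reset (truncate) the journal
cephfs-journal-tool --rank=<fs_name>:0 event recover_dentries summary
cephfs-journal-tool --rank=<fs_name>:0 journal reset
# wipe the session table (documented form; with two filesystems mind the scope)
cephfs-table-tool all reset session
# mark the rank repaired so a standby can take over
ceph mds repaired <fs_name>:0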

Then I scrubbed and repaired the filesystem, and I “believe” I’m back in business.
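By "scrubbed and repaired" I mean roughly:

ceph tell mds.<fs_name>:0 scrub start / recursive,repair
ceph tell mds.<fs_name>:0 scrub status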

What is weird though is that an assert failed as shown in the stack dump below. Was that a legitimate assertion that indicates a bigger issue, or was it a false assertion?

Also, I understand that the metadata itself is sitting on the disk, but it looks like a single point of failure. What's the logic behind having a single metadata location, but multiple MDS servers?

Thanks!

George


On Sep 24, 2024, at 5:55 AM, Eugen Block <eblock@xxxxxx> wrote:

Hi,

I would probably start by inspecting the journal with the cephfs-journal-tool [0]:

cephfs-journal-tool [--rank=<fs_name>:{mds-rank|all}] journal inspect
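For your hdd filesystem that would be something like (substitute the actual fs name):

cephfs-journal-tool --rank=<hdd_fs_name>:0 journal inspect
# the journal header can also be interesting:
cephfs-journal-tool --rank=<hdd_fs_name>:0 header get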

And it could be helpful to have the logs prior to the assert.

[0] https://docs.ceph.com/en/latest/cephfs/cephfs-journal-tool/#example-journal-inspect

Quoting "Kyriazis, George" <george.kyriazis@xxxxxxxxx>:

Hello ceph users,

I am in the unfortunate situation of having a status of “1 mds daemon damaged”. Looking at the logs, I see that the daemon died with an assert as follows:

./src/osdc/Journaler.cc: 1368: FAILED ceph_assert(trim_to > trimming_pos)

ceph version 18.2.2 (e9fe820e7fffd1b7cde143a9f77653b73fcec748) reef (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x12a) [0x73a83189d7d9]
2: /usr/lib/ceph/libceph-common.so.2(+0x29d974) [0x73a83189d974]
3: (Journaler::_trim()+0x671) [0x57235caa70b1]
4: (Journaler::_finish_write_head(int, Journaler::Header&, C_OnFinisher*)+0x171) [0x57235caaa8f1]
5: (Context::complete(int)+0x9) [0x57235c716849]
6: (Finisher::finisher_thread_entry()+0x16d) [0x73a83194659d]
7: /lib/x86_64-linux-gnu/libc.so.6(+0x89134) [0x73a8310a8134]
8: /lib/x86_64-linux-gnu/libc.so.6(+0x1097dc) [0x73a8311287dc]

0> 2024-09-23T14:10:26.490-0500 73a822c006c0 -1 *** Caught signal (Aborted) **
in thread 73a822c006c0 thread_name:MR_Finisher

ceph version 18.2.2 (e9fe820e7fffd1b7cde143a9f77653b73fcec748) reef (stable)
1: /lib/x86_64-linux-gnu/libc.so.6(+0x3c050) [0x73a83105b050]
2: /lib/x86_64-linux-gnu/libc.so.6(+0x8ae2c) [0x73a8310a9e2c]
3: gsignal()
4: abort()
5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x185) [0x73a83189d834]
6: /usr/lib/ceph/libceph-common.so.2(+0x29d974) [0x73a83189d974]
7: (Journaler::_trim()+0x671) [0x57235caa70b1]
8: (Journaler::_finish_write_head(int, Journaler::Header&, C_OnFinisher*)+0x171) [0x57235caaa8f1]
9: (Context::complete(int)+0x9) [0x57235c716849]
10: (Finisher::finisher_thread_entry()+0x16d) [0x73a83194659d]
11: /lib/x86_64-linux-gnu/libc.so.6(+0x89134) [0x73a8310a8134]
12: /lib/x86_64-linux-gnu/libc.so.6(+0x1097dc) [0x73a8311287dc]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.


As listed above, I am running 18.2.2 on a Proxmox cluster with a hybrid hdd/ssd setup and 2 cephfs filesystems. The mds responsible for the hdd filesystem is the one that died.

Output of ceph -s follows:

root@vis-mgmt:~/bin# ceph -s
  cluster:
    id:     ec2c9542-dc1b-4af6-9f21-0adbcabb9452
    health: HEALTH_ERR
            1 filesystem is degraded
            1 filesystem is offline
            1 mds daemon damaged
            5 pgs not scrubbed in time
            1 daemons have recently crashed

  services:
    mon: 5 daemons, quorum vis-hsw-01,vis-skx-01,vis-clx-15,vis-clx-04,vis-icx-00 (age 6m)
    mgr: vis-hsw-02(active, since 13d), standbys: vis-skx-02, vis-hsw-04, vis-clx-08, vis-clx-02
    mds: 1/2 daemons up, 5 standby
    osd: 97 osds: 97 up (since 3h), 97 in (since 4d)

  data:
    volumes: 1/2 healthy, 1 recovering; 1 damaged
    pools:   14 pools, 1961 pgs
    objects: 223.70M objects, 304 TiB
    usage:   805 TiB used, 383 TiB / 1.2 PiB avail
    pgs:     1948 active+clean
             9    active+clean+scrubbing+deep
             4    active+clean+scrubbing

  io:
    client:   86 KiB/s rd, 5.5 MiB/s wr, 64 op/s rd, 26 op/s wr



I tried restarting all the mds daemons, but they are all marked as “standby”. I also tried restarting all the mons and then the mds daemons again, but that didn't help.
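By restarting I mean something along these lines (Proxmox runs the MDS as plain systemd services):

systemctl restart ceph-mds@<hostname>.service   # on each MDS node
ceph mds stat                                   # all daemons still show up as standby
ceph fs status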

Much help is appreciated!

Thank you!

George






_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



