mds crashes after up:replay state

Lars Köppel <lars.koeppel@xxxxxxxxxx> · Fri, 5 Jan 2024 15:52:48 +0100

Hello everyone,

We are running a small cluster with 3 nodes and 25 OSDs per node, on Ceph
version 17.2.6.
Recently the active MDS crashed, and since then every newly started MDS
gets stuck in the up:replay state. The output of 'ceph tell mds.cephfs:0
status' shows that the journal is read in completely. As soon as replay
finishes, the MDS crashes and the next one starts reading the journal
from the beginning.
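To tell whether each new MDS really reaches the end of the journal before it crashes, the status output can be polled and turned into a progress figure. The sketch below is a hedged illustration, not a Ceph tool: it ASSUMES the JSON printed by 'ceph tell mds.cephfs:0 status' contains a "replay_status" object with "journal_read_pos", "journal_write_pos" and "journal_expire_pos" while the rank is in up:replay. Those field names may differ between releases, so check them against your own output first. The helper names fetch_mds_status and replay_progress are hypothetical.

```python
import json
import subprocess
from typing import Optional


def fetch_mds_status(target: str = "mds.cephfs:0") -> str:
    # Runs the status command from the post; assumes the ceph CLI is on PATH
    # and that the MDS prints its status as JSON.
    result = subprocess.run(
        ["ceph", "tell", target, "status"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def replay_progress(status_json: str) -> Optional[float]:
    """Fraction of the journal replayed (0.0 to 1.0), or None when the
    assumed replay fields are absent (e.g. the rank is not in up:replay)."""
    status = json.loads(status_json)
    # ASSUMPTION: these field names are not confirmed for every release.
    replay = status.get("replay_status")
    if not isinstance(replay, dict):
        return None
    read_pos = replay.get("journal_read_pos")
    write_pos = replay.get("journal_write_pos")
    if read_pos is None or write_pos is None:
        return None
    # Replay is assumed to start at the expire position, so measure from there.
    start = replay.get("journal_expire_pos", 0)
    span = write_pos - start
    if span <= 0:
        return 1.0
    return min(max((read_pos - start) / span, 0.0), 1.0)


if __name__ == "__main__":
    # Hypothetical sample values, not taken from the cluster in this post.
    sample = json.dumps({
        "state": "up:replay",
        "replay_status": {
            "journal_read_pos": 1500,
            "journal_write_pos": 2000,
            "journal_expire_pos": 1000,
        },
    })
    print(replay_progress(sample))
```

A value of 1.0 right before each crash would match the observation that the journal is fully read in and the failure happens afterwards.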

A journal inspection ('cephfs-journal-tool --rank=cephfs:0 journal
inspect') is currently running.
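Once the inspection finishes, its verdict can be checked programmatically. The Ceph disaster-recovery documentation shows output of the form "Overall journal integrity: DAMAGED" followed by a "Corrupt regions:" list of hex ranges; the sketch below assumes that format. The helper names run_journal_inspect and parse_journal_inspect are hypothetical, and since it is not certain which stream the tool writes the verdict to, stdout and stderr are combined.

```python
import re
import subprocess
from typing import List, Tuple

INTEGRITY_RE = re.compile(r"Overall journal integrity:\s*(\S+)")
# Ranges such as "0x400000-ffffffffffffffff" (0x prefix optional on either side).
REGION_RE = re.compile(r"(?:0x)?([0-9a-fA-F]+)-(?:0x)?([0-9a-fA-F]+)")


def run_journal_inspect(rank: str = "cephfs:0") -> str:
    # Same command as in the post; exit status is not checked because it is
    # not assumed to signal journal damage.
    result = subprocess.run(
        ["cephfs-journal-tool", f"--rank={rank}", "journal", "inspect"],
        capture_output=True, text=True,
    )
    return result.stdout + result.stderr


def parse_journal_inspect(output: str) -> Tuple[bool, List[Tuple[int, int]]]:
    """Return (integrity_ok, corrupt_regions) from the inspect output."""
    ok = None
    regions: List[Tuple[int, int]] = []
    in_regions = False
    for line in output.splitlines():
        stripped = line.strip()
        match = INTEGRITY_RE.match(stripped)
        if match:
            ok = match.group(1) == "OK"
            continue
        if stripped.startswith("Corrupt regions"):
            in_regions = True
            continue
        if in_regions:
            region = REGION_RE.fullmatch(stripped)
            if region:
                regions.append((int(region.group(1), 16), int(region.group(2), 16)))
    if ok is None:
        raise ValueError("no 'Overall journal integrity' line found")
    return ok, regions
```

If the journal reports OK but the MDS still crashes at the end of replay, the problem is more likely in the metadata being applied than in the journal itself.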

Does anyone have any further suggestions on how I can get the cluster
running again as quickly as possible?

Best regards
Lars

Lars Köppel
Developer
Email: lars.koeppel@xxxxxxxxxx
Phone: +49 6221 5993580
ariadne.ai (Germany) GmbH
Häusserstraße 3, 69115 Heidelberg
Amtsgericht Mannheim, HRB 744040
Geschäftsführer: Dr. Fabian Svara
https://ariadne.ai
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx