Hello Patrick,
On 27.11.23 19:05, Patrick Donnelly wrote:
> I would **really** love to see the debug logs from the MDS. Please
> upload them using ceph-post-file [1]. If you can reliably reproduce,
> turn on more debugging:
>
> ceph config set mds debug_mds 20
> ceph config set mds debug_ms 1
>
> [1] https://docs.ceph.com/en/reef/man/8/ceph-post-file/
I uploaded the debug log and the core dump with ceph-post-file; the upload ID is:
02f78445-7136-44c9-a362-410de37a0b7d
Unfortunately, we cannot easily shut down normal access to the cluster
for these tests, so the logs contain quite a bit of unrelated activity.
They show three crashes; the last one occurred with core dumps enabled
(ulimits set to unlimited).
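Since the cluster stays in production use, one way to limit the clutter is to raise the debug levels only for the duration of a reproduction attempt. A minimal sketch, assuming the two settings suggested above are the only ones changed; the helper names are hypothetical, and `ceph config rm` removes the override from the central config database, so the previous levels apply again unless they are set elsewhere:

```sh
# Hypothetical helpers (not part of Ceph): raise MDS debug levels just
# before a reproduction attempt and drop them again right afterwards.
mds_debug_on() {
    ceph config set mds debug_mds 20
    ceph config set mds debug_ms 1
}

mds_debug_off() {
    # Remove the overrides from the central config database; the
    # options fall back to whatever applied before (normally the defaults).
    ceph config rm mds debug_mds
    ceph config rm mds debug_ms
}
```

Usage: run `mds_debug_on`, reproduce the crash, then run `mds_debug_off` as soon as possible.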
A note on reproducibility: reading the contents of the file before
removing it appears to be necessary to trigger the crash. Merely calling
stat on the file and then removing it also yields an Input/output error,
but does not crash the MDS.
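The crashing sequence can be sketched as a small shell function. This is a hypothetical helper, and the path in the usage line is a placeholder for an affected file on a CephFS mount; on a healthy file both steps simply succeed:

```sh
# Hypothetical helper: read a file completely, then try to remove it,
# mirroring the sequence that crashed the MDS. Returns 1 if the read
# fails and 2 if the removal fails (e.g. with Input/output error).
repro_read_then_remove() {
    f="$1"
    # Step 1: read the contents; a plain stat was not enough to crash the MDS
    if ! cat -- "$f" > /dev/null; then
        echo "read failed: $f" >&2
        return 1
    fi
    # Step 2: remove the file; on the damaged files this returned EIO
    if ! rm -- "$f"; then
        echo "remove failed: $f" >&2
        return 2
    fi
    echo "removed: $f"
}
```

Usage: `repro_read_then_remove /mnt/cephfs/path/to/affected-file` (placeholder path). Replacing the `cat` with `stat` gives the non-crashing variant described above.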
Interestingly, the MDS_DAMAGE flag is cleared when the MDS restarts and
only reappears once the affected files are accessed again (a stat call
is sufficient).
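To observe this, a minimal sketch that checks whether the MDS_DAMAGE health check is currently raised by searching the output of `ceph health detail` for that code. The helper name is hypothetical; the individual damage entries recorded by a rank can be listed with `ceph tell mds.<fs_name>:0 damage ls`, where `<fs_name>` is a placeholder:

```sh
# Hypothetical helper: exit status 0 if the MDS_DAMAGE health check
# appears in `ceph health detail`, non-zero otherwise.
mds_damage_present() {
    ceph health detail | grep -q 'MDS_DAMAGE'
}
```

Usage: after restarting the MDS, `mds_damage_present` should fail; after a `stat` on one of the affected files, it should succeed again.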
For now, I'll hold off on running first-damage.py to remove the affected
files/inodes. Ultimately, however, this seems like the most sensible
solution to me, at least with regard to cluster downtime.
Cheers
Sebastian
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx