Re: MDS_DAMAGE in 17.2.7 / Cannot delete affected files

Hello Patrick,

On 27.11.23 19:05, Patrick Donnelly wrote:

I would **really** love to see the debug logs from the MDS. Please
upload them using ceph-post-file [1]. If you can reliably reproduce,
turn on more debugging:

ceph config set mds debug_mds 20
ceph config set mds debug_ms 1

[1] https://docs.ceph.com/en/reef/man/8/ceph-post-file/


I have uploaded the debug log and core dump; see ceph-post-file ID 02f78445-7136-44c9-a362-410de37a0b7d. Unfortunately, we cannot easily shut down normal access to the cluster for these tests, so there is quite a bit of clutter in the logs. They show three crashes; for the last one, core dumping was enabled (ulimit set to unlimited).

A note on reproducibility: to trigger the crash, it seems necessary to read the file's contents before removing it. Simply calling stat on the file and then removing it also yields an Input/output error, but does not crash the MDS.
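For reference, the two variants described above as a minimal sketch (the path is a placeholder for one of the damaged files, not an actual path from our cluster):

```shell
# Placeholder: substitute the path of one of the damaged files.
DAMAGED=/mnt/cephfs/path/to/damaged-file

# Variant 1: stat, then remove -> "Input/output error", MDS stays up.
stat "$DAMAGED"
rm "$DAMAGED"

# Variant 2: read the contents first, then remove -> crashes the MDS.
cat "$DAMAGED" > /dev/null
rm "$DAMAGED"
```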

Interestingly, the MDS_DAMAGE flag is cleared when the MDS restarts and only returns once the files in question are accessed again (a stat call is sufficient).
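This is easy to observe by querying the damage table directly on the MDS (rank 0 assumed here); since the table is held in memory, it is empty after a restart until the affected inodes are touched again:

```shell
# Show current cluster health, including MDS_DAMAGE if it is set.
ceph health detail

# List the damage entries recorded by the active MDS (rank 0 assumed).
ceph tell mds.0 damage ls

# After an MDS restart the list is empty until a damaged inode is
# accessed again (e.g. via stat), at which point the entries reappear.
```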


For now, I'll hold off on running first-damage.py to remove the affected files/inodes. Ultimately, however, that seems like the most sensible solution to me, at least with regard to cluster downtime.

Cheers
Sebastian
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


