Hi Sebastian,

On Wed, Nov 29, 2023 at 3:11 PM Sebastian Knust
<sknust@xxxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> Hello Patrick,
>
> On 27.11.23 19:05, Patrick Donnelly wrote:
> >
> > I would **really** love to see the debug logs from the MDS. Please
> > upload them using ceph-post-file [1]. If you can reliably reproduce,
> > turn on more debugging:
> >
> >> ceph config set mds debug_mds 20
> >> ceph config set mds debug_ms 1
> >
> > [1] https://docs.ceph.com/en/reef/man/8/ceph-post-file/
>
> Uploaded debug log and core dump, see ceph-post-file:
> 02f78445-7136-44c9-a362-410de37a0b7d
>
> Unfortunately, we cannot easily shut down normal access to the cluster
> for these tests, therefore there is quite some clutter in the logs. The
> logs show three crashes, the last one with core dumping enabled
> (ulimits set to unlimited).
>
> A note on reproducibility: to recreate the crash, reading the contents
> of the file prior to removal seems necessary. Simply calling stat on
> the file and then performing the removal also yields an Input/output
> error but does not crash the MDS.
>
> Interestingly, the MDS_DAMAGE flag is reset on restart of the MDS and
> only comes back once the files in question are accessed (a stat call
> is sufficient).

I've not yet fully reviewed the logs, but it seems there is a bug in the
detection logic which causes a spurious abort. This does not appear to
be actually new damage.

Are you using postgres? If you can share details about your snapshot
workflow and general workloads, that would be helpful (privately if
desired).

> For now, I'll hold off on running first-damage.py to try to remove the
> affected files / inodes. Ultimately, however, this seems to be the most
> sensible solution to me, at least with regards to cluster downtime.

Please give me another day to review, then feel free to use
first-damage.py to clean up. If you see new damage, please upload the
logs.

--
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
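
For readers who hit the same MDS_DAMAGE behavior, below is a minimal
sketch of the capture-and-upload workflow discussed in this thread. The
two debug settings are the ones quoted above; the damage-listing call,
the <fs_name> and log-path placeholders, and the description string are
assumptions to adapt to your own deployment, and the "ceph config rm"
calls simply restore the default verbosity afterwards.

    # List what the MDS currently records as damaged metadata
    ceph health detail
    ceph tell mds.<fs_name>:0 damage ls

    # Raise MDS verbosity as suggested above, then reproduce the crash
    ceph config set mds debug_mds 20
    ceph config set mds debug_ms 1

    # Upload the resulting log; path and description are placeholders
    ceph-post-file -d "MDS abort on unlink of damaged file" \
        /var/log/ceph/ceph-mds.<daemon-name>.log

    # Drop the debug overrides again once the logs are captured
    ceph config rm mds debug_mds
    ceph config rm mds debug_ms

These debug levels produce very large logs, so they are best left
enabled only for the window needed to reproduce the crash.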