Adding a few extra details for clarity:

We attempted to reset the snap table during the first graceful recovery using:
- cephfs-table-tool filesystem:0 reset session
There has only ever been one active MDS for this filesystem.

> We've sadly attempted to restore the root, which made the entire filesystem tree stray.
- cephfs-data-scan --filesystem filesystem init
In retrospect, this was a stupid thing to try.

The initial crash was caused by increased load during scrubbing, which brought the MDS to a halt. Issuing ceph mds fail caused the initial journal corruption, which mostly affected atime modifications and some non-critical data. After a scrub and fix, everything seemed fine until the next time the MDS was switched, at which point the previous crash came back and kept happening.

Alex D.
RedXen System & Infrastructure Administration

