This did the trick! THANK YOU!
After starting with mds_wipe_sessions set and removing the mds*_openfiles.0 objects from the metadata pool, the MDS started almost immediately and went active. I verified that the filesystem could mount again, shut down the MDS, removed the wipe-sessions setting, and restarted all four MDS daemons. The cluster is back to healthy again.
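Roughly, the sequence was the sketch below. Treat it as a sketch rather than exact commands: the pool name (cephfs_metadata) and the mds instance ids (a through d) are placeholders for our setup, and mds_wipe_sessions can just as well be set in ceph.conf instead of the config db.

  # enable session wiping (set however you normally manage MDS options)
  ceph config set mds mds_wipe_sessions true

  # remove the open-file hint objects, one per active rank, from the
  # metadata pool (see Yan's note below -- they are only hints)
  rados -p cephfs_metadata rm mds0_openfiles.0
  rados -p cephfs_metadata rm mds1_openfiles.0

  # start one MDS, confirm it reaches active, and test a client mount
  systemctl start ceph-mds@a
  ceph fs status

  # once verified, stop it, drop the wipe-sessions setting, and restart
  # all four MDS daemons
  systemctl stop ceph-mds@a
  ceph config rm mds mds_wipe_sessions
  systemctl restart ceph-mds@a ceph-mds@b ceph-mds@c ceph-mds@d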
I've got more stuff to write up on our end for recovery procedures now, and that's a good thing! Thanks again!
jonathan
On Wed, Aug 15, 2018 at 11:12 PM, Jonathan Woytek <woytek@xxxxxxxxxxx> wrote:
On Wed, Aug 15, 2018 at 11:02 PM Yan, Zheng <ukernel@xxxxxxxxx> wrote:
On Thu, Aug 16, 2018 at 10:55 AM Jonathan Woytek <woytek@xxxxxxxxxxx> wrote:
>
> ceph version 13.2.0 (79a10589f1f80dfe21e8f9794365ed98143071c4) mimic (stable)
>
>
Try deleting mds0_openfiles.0 (mds1_openfiles.0 and so on if you have
multiple active mds) from metadata pool of your filesystem. Records
in these files are open files hints. It's safe to delete them.

I will try that in the morning. I had to bail for the night here (UTC-4). Thank you!

Jonathan

Sent from my Commodore64
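A concrete version of Yan's suggestion, assuming the metadata pool is called cephfs_metadata (check "ceph fs ls" for the real pool name of your filesystem):

  # list the open-file hint objects, one per active rank
  rados -p cephfs_metadata ls | grep openfiles

  # the records are only hints, so removing the objects is safe
  rados -p cephfs_metadata rm mds0_openfiles.0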