Hi all,

Just had a bit of an outage with CephFS around the MDSs. I managed to get everything up and running again after a bit of head scratching and thought I would share here what happened.

Cause

I believe the MDSs, which were running as VMs, suffered when the hypervisor ran out of RAM and started swapping during hypervisor maintenance. I know this is less than ideal and have put steps in place to prevent it happening again.

Symptoms

1. Noticed that both MDSs were down; the log files on both showed that they had crashed.
2. After restarting the MDSs, their status kept flipping between replay and reconnect.
3. Now and again both MDSs would crash again.
4. The log files showed they seemed to keep restarting after trying to reconnect clients.
5. The clients were all kernel clients: one was on 3.19 and the rest on 4.8. I believe the problematic client was one of the ones running kernel 4.8.
6. Ceph is 10.2.2.

Resolution

After some serious head scratching and a little bit of panicking, the fact that the log files showed the restart always happening after trying to reconnect the clients gave me the idea of killing the sessions on the MDS. I first reset all the clients and waited, but this didn't seem to have any effect and I could still see the MDS trying to reconnect to the clients.

I then decided to try killing the sessions from the MDS end, so I shut down the standby MDS (as they kept flipping active roles) and ran:

ceph daemon mds.gp-ceph-mds1 session ls

I then tried to kill the last session in the list:

ceph daemon mds.gp-ceph-mds1 session evict <session id>

I had to keep hammering this command to get it in at the right moment, as the MDS was only responding for a fraction of a second (a scripted version of this retry is sketched at the end of this mail). Suddenly, in my other window where I had a tail of the MDS log, I saw a whizz of new information, ending with the MDS success message. So it seems something the MDS was trying to do whilst reconnecting was upsetting it.

ceph -s updated to show the MDS was now active. Rebooting the other MDS then made it the standby again. Problem solved.

I have uploaded the two MDS logs here if any CephFS devs are interested in taking a closer look:

http://app.sys-pro.co.uk/mds_logs.zip

Nick
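
P.S. For anyone who hits the same reconnect loop: rather than hammering the evict command by hand, a small shell loop like the sketch below should catch the brief window in which the MDS admin socket responds. This is only a sketch, not exactly what I ran; mds.gp-ceph-mds1 is the MDS name from my cluster, and the session id is whatever your own "session ls" shows, so adjust both.

#!/bin/bash
# Retry evicting a stuck CephFS client session until the MDS admin
# socket accepts the command.
# Usage: ./evict-session.sh <session id>
# (get the id from: ceph daemon mds.gp-ceph-mds1 session ls)
MDS=gp-ceph-mds1    # my MDS name - substitute your own
SID=$1

until ceph daemon mds.$MDS session evict "$SID"; do
    # The MDS was only answering for a fraction of a second between
    # restarts, so keep retrying on a short interval.
    sleep 0.2
done
echo "Session $SID evicted."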