Hi all,

It seems to be the season of stuck MDSs. Our Ceph filesystem is degraded too: one MDS has been stuck in replay for about 20 hours now.

We run a Nautilus Ceph cluster with about 300TB of data and many millions of files. We run two active MDSs, with one particularly large directory pinned to one of them. Both MDSs have standby MDSs. We are in the process of migrating to a new Pacific cluster and have been syncing files daily.

Over the weekend something happened and we ended up with slow MDS responses, and some directories became very slow (as we'd expect). We restarted the second MDS. It came back within a minute and the problem disappeared for a little while. Then the slow MDS operations returned, so we restarted the other MDS. This one has been in the replay state since yesterday. The cluster is otherwise healthy.

So we are wondering:

- What is the MDS doing during replay?
- How long might it take?
- Is there anything we can do to speed up the replay phase?

Regards
Magnus

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.