On Tue, May 31, 2022 at 3:42 AM Magnus HAGDORN <Magnus.Hagdorn@xxxxxxxx> wrote:
>
> Hi all,
> it seems to be the time of stuck MDSs. We also have our ceph filesystem
> degraded. The MDS is stuck in replay for about 20 hours now.
>
> We run a nautilus ceph cluster with about 300TB of data and many
> millions of files. We run two MDSs with a particularly large directory
> pinned to one of them. Both MDSs have standby MDSs.
>
> We are in the process of migrating to a new pacific cluster and have
> been syncing files daily. Over the weekend something happened and we
> ended up with slow MDS responses and some directories became very slow
> (as we'd expect). We restarted the second MDS. It came back within a
> minute and the problem disappeared for a little while. The slow MDS
> operations came back and we restarted the other MDS. This one has been
> in replay state since yesterday.
>

Can you temporarily turn up the MDS debug log level (debug_mds) to
check what's happening to this MDS during replay?

    ceph config set mds debug_mds 10

Is the health of the MDS host okay? Is it low on memory?

> The cluster is healthy.
>

Can you share the output of `ceph status`, `ceph fs status` and
`ceph --version`?

> So, we are wondering what it is up to. How long it might take. And is
> there something we can do to speed up the replay phase.
>
> Regards
> magnus
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx

Regards,
Ramana
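
P.S. The checks asked for above can be gathered in one pass. What follows is only a minimal sketch, not a tested procedure: the `run` and `collect_mds_diagnostics` helpers and the `DRY_RUN` variable are hypothetical names introduced for illustration; only the four `ceph` commands come from this thread.

```shell
#!/bin/sh
# Sketch: collect the information requested in this thread.
# Helper names and DRY_RUN are illustrative assumptions, not part of Ceph.

# Echo a command, then execute it unless DRY_RUN=1 is set.
run() {
    echo "+ $*"
    [ "${DRY_RUN:-0}" = "1" ] || "$@"
}

collect_mds_diagnostics() {
    # Raise MDS log verbosity while the replay is being investigated.
    run ceph config set mds debug_mds 10
    # Cluster, file system and version details requested in the reply.
    run ceph status
    run ceph fs status
    run ceph --version
}
```

Source the file on a node with an admin keyring and call `collect_mds_diagnostics`; with `DRY_RUN=1` it only prints the commands. Once the logs have been captured, `ceph config rm mds debug_mds` should drop the override so the log level returns to its default.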