Hi all,

we have hit the problem where a directory tree containing over a million entries was deleted on a snapshotted cephfs. The cluster reports as mostly healthy, apart from some slow MDS responses, but the filesystem has become unusable. The MDS reports:

    ceph daemon mds.`hostname -s` perf dump | grep stray
        "num_strays": 211378,
        "num_strays_delayed": 0,
        "num_strays_enqueuing": 0,
        "strays_created": 2489960,
        "strays_enqueued": 2344793,
        "strays_reintegrated": 64668,
        "strays_migrated": 2562,

We have deleted a bunch of snapshots and the snaptrim has completed. We possibly made matters worse by reducing the number of active MDS from 2 to 1: the second MDS has been in the stopping state since yesterday.

I presume we could just wait and the problem will eventually resolve itself. However:

- Is there a way to speed up the recovery process?
- The cephfs is currently online. Would it help to shut it down?
- Is there some setting that we could temporarily change to deal with the strays?
- Do we need to remove all snapshots?

The cluster is running nautilus. I was aware of this problem but was assured that these large directories would not get deleted. I believe newer versions of cephfs have dealt with this issue. Is that correct?

Suggestions are greatly appreciated.

Cheers
magnus

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
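P.S. In case it helps anyone watch whether the stray count is actually going down, here is a minimal sketch in Python. It assumes the full `perf dump` output is JSON with the counters nested one level down inside named sections (I believe the stray counters sit under `mds_cache`, but the code does not rely on that name). The function names `stray_counters` and `drain_rate` are my own and not part of any Ceph tooling.

```python
import json


def stray_counters(perf_dump_text):
    """Return every numeric counter whose name contains 'stray'.

    perf_dump_text is the raw JSON printed by
    `ceph daemon mds.<name> perf dump` (assumed layout: a top-level
    object of sections, each section an object of counters).
    """
    data = json.loads(perf_dump_text)
    counters = {}
    for section in data.values():
        if not isinstance(section, dict):
            continue  # skip anything that is not a counter section
        for name, value in section.items():
            if "stray" in name and isinstance(value, (int, float)):
                counters[name] = value
    return counters


def drain_rate(before, after, seconds):
    """Strays removed per second between two samples.

    A negative result means num_strays is still growing.
    """
    return (before["num_strays"] - after["num_strays"]) / seconds


# The figures posted above, wrapped in an assumed "mds_cache" section.
SAMPLE = json.dumps({
    "mds_cache": {
        "num_strays": 211378,
        "num_strays_delayed": 0,
        "num_strays_enqueuing": 0,
        "strays_created": 2489960,
        "strays_enqueued": 2344793,
        "strays_reintegrated": 64668,
        "strays_migrated": 2562,
    }
})

if __name__ == "__main__":
    for name, value in sorted(stray_counters(SAMPLE).items()):
        print(f"{name}: {value}")
```

Taking two samples a known number of seconds apart and passing them to `drain_rate` gives a rough idea of how long the remaining `num_strays` would take to clear at the current pace.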