Hi all,

We seem to have recovered from our CephFS misadventure. Having said that, I would like to better understand what went wrong and if/how we can avoid it in future.

We have a Nautilus Ceph cluster that provides CephFS to our school. We keep nightly snapshots for one week. One user has a particularly deep directory structure with lots and lots of tiny files (100s of millions). This user deleted a directory with ~2 million entries in a deep directory structure. We noticed something had gone wrong when we got 'No space left on device' messages when trying to delete a file. Eventually we figured out that we were exceeding the limit on the number of stray files, and once we set mds_bal_fragment_size_max to 300000 we got our service back (after lots of prodding that probably didn't help).

So my understanding is that deleted files end up in a special directory when snapshots are used. This directory is limited to 10^6 files by default.

I guess my main question is - does this problem still occur in more recent versions of Ceph than Nautilus? If the problem does still occur, can we mitigate it somehow? (We are wondering whether we should store the millions of tiny files differently.) Are snapshots the problem? Currently we snapshot at the top level. Does it make sense to have multiple snapshots further down the tree?

Cheers
magnus

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
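
P.S. For anyone who wants to watch for this condition before it bites, below is a rough Python sketch of the arithmetic involved. It assumes the stray count is read from the "num_strays" counter in the "mds_cache" section of the JSON printed by "ceph daemon mds.<name> perf dump", that the MDS spreads strays over 10 internal directories, and that each is capped by mds_bal_fragment_size_max (default assumed to be 100000, which matches the 10^6 figure above). The function name, the constant, and the sample numbers are made up for illustration, and since strays are not necessarily spread evenly the real limit may be hit earlier than this estimate suggests.

```python
import json

# Assumption: the MDS keeps strays in 10 internal directories, each capped
# by mds_bal_fragment_size_max (assumed default 100000).
NUM_STRAY_DIRS = 10


def stray_usage(perf_dump, fragment_size_max=100_000):
    """Return (num_strays, approximate_capacity, fraction_used).

    perf_dump is the parsed JSON from `ceph daemon mds.<name> perf dump`.
    """
    num_strays = perf_dump["mds_cache"]["num_strays"]
    capacity = NUM_STRAY_DIRS * fragment_size_max
    return num_strays, capacity, num_strays / capacity


if __name__ == "__main__":
    # Hypothetical sample; real output contains many more counters.
    sample = json.loads('{"mds_cache": {"num_strays": 950000}}')
    strays, capacity, used = stray_usage(sample)
    print(f"{strays} strays of ~{capacity} ({used:.0%} of the estimated limit)")
```

Passing fragment_size_max=300000 gives the estimate for the raised setting we ended up with. Something like this could be run periodically against each active MDS to raise an alert well before deletes start failing.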