On Tue, Feb 8, 2022 at 7:30 AM Dan van der Ster <dvanders@xxxxxxxxx> wrote:
>
> On Tue, Feb 8, 2022 at 1:04 PM Frank Schilder <frans@xxxxxx> wrote:
> > The reason for this seemingly strange behaviour was an old static
> > snapshot taken in an entirely different directory. Apparently, ceph fs
> > snapshots are not local to an FS directory sub-tree but always global
> > on the entire FS, despite the fact that you can only access the
> > sub-tree in the snapshot, which easily leads to the wrong conclusion
> > that only data below the directory is in the snapshot. As a
> > consequence, the static snapshot was accumulating the garbage from the
> > rotating snapshots even though these sub-trees were completely disjoint.
>
> So are you saying that if I do this I'll have 1M files in stray?

No, happily. What's happening here post-dates my main previous stretch of
work on CephFS and I had forgotten it, but there's a note in the developer
docs: https://docs.ceph.com/en/latest/dev/cephfs-snapshots/#hard-links
(I fortuitously stumbled across this from an entirely different
direction/discussion just after seeing this thread and put the pieces
together!)

Basically, hard links are *the worst* for everything in filesystems. I spent
a lot of time trying to figure out how to handle hard links being renamed
across snapshots[1] and never managed it, and the eventual "solution" was to
give up and do the degenerate thing: if a file has multiple hard links, that
file is a member of *every* snapshot.

Doing anything about this will take a lot of time. There's probably an
opportunity to improve it for users of the subvolumes library, as those
subvolumes do get tagged a bit, so I'll see if we can look into that. But
for generic CephFS, I'm not sure what the solution will look like at all.
Sorry folks. :/
-Greg

[1]: The issue is that, if you have a hard-linked file in two places, you
would expect it to be snapshotted whenever a snapshot covering either
location occurs. But in CephFS the file can only live in one location, and
the other location just holds a reference to it instead. So say you have
inode Y at path A, and then hard link it in at path B. Given how snapshots
work, when you open up Y from A, you would need to check all the snapshots
that apply from both A's and B's trees. But 1) opening up other paths is a
challenge all on its own, and 2) without an inode and its backtrace to
provide a lookup resolve point, it's impossible to maintain a lookup that
scales and can be kept consistent. (Oh, I did just have one idea, but I'm
not sure if it would fix every issue or just the scalable backtrace lookup:
https://tracker.ceph.com/issues/54205)

>
> mkdir /a
> cd /a
> for i in {1..1000000}; do touch $i; done    # create 1M files in /a
> cd ..
> mkdir /b
> mkdir /b/.snap/testsnap    # create a snap in the empty dir /b
> rm -rf /a/
>
>
> Cheers, Dan
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
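
[Editor's note: to make the hard-link behaviour from footnote [1] concrete,
here is a minimal reproduction sketch in the same style as Dan's commands
above. It assumes a CephFS mount at /mnt/cephfs and an MDS daemon named
mds.a; both names are placeholders, and the perf-counter check assumes
shell access to the MDS host's admin socket.]

cd /mnt/cephfs                    # placeholder mount point
mkdir a b
echo data > a/file                # primary link lives under a/
ln a/file b/link                  # hard link under b/
mkdir b/.snap/s1                  # snapshot covering b/ only
rm a/file b/link                  # unlink both names
ls b/.snap/s1/                    # the link is still visible in the snapshot

# Because the file had multiple hard links, it is a member of the snapshot
# taken on b/, so after unlinking its inode should land in the stray
# directories rather than being purged. On the MDS host:
ceph daemon mds.a perf dump | grep num_strays

Removing the snapshot (rmdir b/.snap/s1) should release the reference and
allow the stray to be purged again.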