On Fri, Oct 8, 2021 at 6:44 AM David Prude <david@xxxxxxxxxxxxxxxx> wrote:
>
> Hello,
>
> My apologies if this has been answered previously, but my attempts to
> find an answer have failed. I am trying to determine the canonical
> way to find out how much storage space a cephfs snapshot is
> consuming. It seems that you can determine the size of the referenced
> data by pulling the ceph.dir.rbytes attribute for the snap
> directory; however, there does not seem to be an attribute which
> indicates the storage the snapshot itself is consuming:
>
> getfattr -d -m - daily_2021-10-07_191702
> # file: daily_2021-10-07_191702
> ceph.dir.entries="17"
> ceph.dir.files="0"
> ceph.dir.rbytes="6129426031788"
> ceph.dir.rctime="1633653849.686409000"
> ceph.dir.rentries="132588"
> ceph.dir.rfiles="97679"
> ceph.dir.rsubdirs="34909"
> ceph.dir.subdirs="17"

Yeah. Because all the allocations are handled by OSDs, and the OSDs
and the MDS don't communicate about individual objects, the
per-snapshot size differential is not actually tracked. Doing so is
infeasible: it's known only by the OSD and potentially changes on
every write to the live data, which is far too much communication to
make happen while keeping any of these systems functional.

>
> I have found in the documentation references to the command "ceph fs
> subvolume snapshot info" which should be able to give snapshot size in
> bytes for a snapshot of a subvolume, however we are not using
> subvolumes.

I am reasonably sure this doesn't do what you seem to want, either. I
think it's just plugging in the rbytes value (much of the subvolume
API exists so it can plug in to the OpenStack Manila interfaces).

> If we assume a cephfs volume "volume" with a top-level
> directory "directory" and an associated snapshot "snapshot":
>
> volume/directory/.snap/snapshot
>
> What is the best way to determine the size consumed by snapshot?

If you really, REALLY need this, the only approach I can come up with
is to traverse the snapshot and the live tree, identify changed files,
and use some heuristic to guess how much of the data actually changed
between them.

But the basic problem is that data usage frequently doesn't belong to
a snapshot, it belongs to a SET of snapshots, so even if we did the
data gathering, we couldn't partition it out between them. If for
instance your data flow looks like this:

AAAA
-- snapshot 1
BBBB
-- snapshot 2
-- snapshot 3
-- snapshot 4
CCCC
-- snapshot 5

Then you might say that snapshot 2 is size 4 and snapshots 3 and 4 are
size 0. But if you delete snapshot 2, you can't actually remove BBBB,
because it's required for snapshots 3 and 4.
-Greg

>
> Thank you,
>
> -David
>
> --
> David Prude
> Systems Administrator
> PGP Fingerprint: 1DAA 4418 7F7F B8AA F50C 6FDF C294 B58F A286 F847
> Democracy Now!
> www.democracynow.org
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
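
P.S. For anyone who wants to try the traversal approach mentioned above,
here is a minimal sketch in Python. It is not a Ceph tool or API, only
an illustration: the function name estimate_changed_bytes and the
"size or mtime differs" test are assumptions made for this example, and
it assumes the snapshot directory reports the sizes and mtimes the
files had when the snapshot was taken. It counts a changed file in
full, so partial rewrites are overestimated, and the total still cannot
be attributed to one snapshot when several snapshots share the same old
data.

```python
import os


def estimate_changed_bytes(snap_root, live_root):
    """Guess how many snapshot bytes are no longer shared with the live tree.

    Heuristic (an assumption, not anything Ceph tracks): a file counts as
    changed if it is missing from the live tree, or if its size or mtime
    differs there. The full snapshot-side size of such files is summed.
    Files that exist only in the live tree are ignored.
    """
    changed = 0
    for dirpath, _dirnames, filenames in os.walk(snap_root):
        # Path of this directory relative to the snapshot root.
        rel = os.path.relpath(dirpath, snap_root)
        for name in filenames:
            snap_path = os.path.join(dirpath, name)
            live_path = os.path.normpath(os.path.join(live_root, rel, name))
            snap_stat = os.lstat(snap_path)
            try:
                live_stat = os.lstat(live_path)
            except (FileNotFoundError, NotADirectoryError):
                # Deleted (or replaced by something else) since the snapshot.
                changed += snap_stat.st_size
                continue
            if (snap_stat.st_size != live_stat.st_size
                    or snap_stat.st_mtime_ns != live_stat.st_mtime_ns):
                changed += snap_stat.st_size
    return changed
```

With the layout from the original question, the call would look like
estimate_changed_bytes("volume/directory/.snap/snapshot",
"volume/directory"). Treat the result as a rough guess at best.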