Hi Mathias, have you made any progress on this? Did the capacity become available eventually?

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Kuhring, Mathias <mathias.kuhring@xxxxxxxxxxxxxx>
Sent: Friday, October 27, 2023 3:52 PM
To: ceph-users@xxxxxxx; Frank Schilder
Subject: Re: [ext] CephFS pool not releasing space after data deletion

Dear Ceph users,

We are wondering if this might be the same issue as in this bug report:
https://tracker.ceph.com/issues/52581

Except that we seem to have snapshots dangling on the old pool, while the bug report has snapshots dangling on the new pool. But maybe it's both? I mean, once the global root layout was switched to the new pool, the new pool became responsible for snapshotting at least the new data, right? What about data which is overwritten? Is there a conflict of responsibility?

We do have similar listings of snaps with "ceph osd pool ls detail", I think:

0|0[root@osd-1 ~]# ceph osd pool ls detail | grep -B 1 removed_snaps_queue
pool 1 'cephfs_data' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 115 pgp_num 107 pg_num_target 32 pgp_num_target 32 autoscale_mode on last_change 803558 lfor 0/803250/803248 flags hashpspool,selfmanaged_snaps stripe_width 0 expected_num_objects 1 application cephfs
        removed_snaps_queue [3541~1,36e4~1,379f~2,3862~1,3876~1,387d~1,388b~1,389a~1,38a6~1,38bc~1,3993~1,3999~1,39a0~1,39a7~1,39ae~1,39b5~3,39be~1,39c5~1,39cc~1]
--
pool 3 'hdd_ec' erasure profile hdd_ec size 3 min_size 2 crush_rule 3 object_hash rjenkins pg_num 2048 pgp_num 2048 autoscale_mode off last_change 803558 lfor 0/87229/87229 flags hashpspool,ec_overwrites,selfmanaged_snaps stripe_width 8192 application cephfs
        removed_snaps_queue [3541~1,36e4~1,379f~2,3862~1,3876~1,387d~1,388b~1,389a~1,38a6~1,38bc~1,3993~1,3999~1,39a0~1,39a7~1,39ae~1,39b5~3,39be~1,39c5~1,39cc~1]
--
pool 20 'hdd_ec_8_2_pool' erasure profile hdd_ec_8_2_profile size 10 min_size 9 crush_rule 5 object_hash rjenkins pg_num 8192 pgp_num 8192 autoscale_mode off last_change 803558 lfor 0/0/681917 flags hashpspool,ec_overwrites,selfmanaged_snaps stripe_width 32768 application cephfs
        removed_snaps_queue [3541~1,36e4~1,379f~2,3862~1,3876~1,387d~1,388b~1,389a~1,38a6~1,38bc~1,3993~1,3999~1,39a0~1,39a7~1,39ae~1,39b5~3,39be~1,39c5~1,39cc~1]

Here, pool hdd_ec_8_2_pool is the one we recently assigned to the root layout. Pool hdd_ec is the one which was assigned before and which won't release space (at least as far as I know).

Is this removed_snaps_queue the same as removed_snaps in the bug issue (i.e. was the label just renamed)? And is it normal that all queues list the same info, or should this differ per pool? Might this be related to the pools now sharing responsibility for some snaps due to the layout changes?

And for the big question: How can I actually trigger or speed up the removal of those snaps? I found removed_snaps/removed_snaps_queue mentioned a few times on the user list, but never with a conclusive answer on how to deal with them. And the only mentions in the docs are in the changelogs.

I also looked into and started CephFS stray scrubbing:
https://docs.ceph.com/en/latest/cephfs/scrub/#evaluate-strays-using-recursive-scrub
But according to the status output, no scrubbing is actually active.

I would appreciate any further ideas. Thanks a lot.
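For completeness, checking for leftover snaptrim activity and starting the stray scrub looked roughly like the following. The FS name is a placeholder, and the pg state check is just one way of counting snaptrimming PGs, so please treat this as a sketch rather than the exact commands:

# count PGs that are still in a snaptrim or snaptrim_wait state
ceph pg dump pgs_brief 2>/dev/null | grep -c snaptrim

# evaluate strays via a recursive scrub of the MDS-private directory, as per the docs linked above
ceph tell mds.<fs_name>:0 scrub start ~mdsdir recursive
ceph tell mds.<fs_name>:0 scrub status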
Best Wishes,
Mathias

On 10/23/2023 12:42 PM, Kuhring, Mathias wrote:
> Dear Ceph users,
>
> Our CephFS is not releasing/freeing up space after deleting hundreds of
> terabytes of data.
> By now, this drives us into a "nearfull" OSD/pool situation and thus
> throttles IO.
>
> We are on ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5)
> quincy (stable).
>
> Recently, we moved a bunch of data to a new pool with better EC.
> This was done by adding a new EC pool to the FS,
> then assigning the FS root to the new EC pool via the directory layout xattr
> (so all new data is written to the new pool),
> and finally copying the old data to new folders.
>
> I swapped the data as follows to retain the old directory structure.
> I also made snapshots for validation purposes.
>
> So basically:
> cp -r mymount/mydata/ mymount/new/  # this creates the copy on the new pool
> mkdir mymount/mydata/.snap/tovalidate
> mkdir mymount/new/mydata/.snap/tovalidate
> mv mymount/mydata/ mymount/old/
> mv mymount/new/mydata mymount/
>
> I could see the increase of data in the new pool as expected (ceph df).
> I compared the snapshots with hashdeep to make sure the new data is intact.
>
> Then I went ahead deleting the old data, basically:
> rmdir mymount/old/mydata/.snap/*  # this also included a bunch of other
> older snapshots
> rm -r mymount/old/mydata
>
> At first we had a bunch of PGs with snaptrim/snaptrim_wait,
> but they have been done for quite some time now.
> And now, two weeks later, the size of the old pool still hasn't
> really decreased.
> I'm still waiting for around 500 TB to be released (and much more is
> planned).
>
> I honestly have no clue where to go from here.
> From my point of view (i.e. the CephFS mount), the data is gone.
> I also never hard- or soft-linked it anywhere.
>
> This doesn't seem to be a common issue.
> At least I couldn't find anything related or resolved in the docs or
> on the user list yet.
> If anybody has an idea how to resolve this, I would highly appreciate it.
>
> Best Wishes,
> Mathias
>
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx

--
Mathias Kuhring

Dr. rer. nat.
Bioinformatician
HPC & Core Unit Bioinformatics
Berlin Institute of Health at Charité (BIH)

E-Mail: mathias.kuhring@xxxxxxxxxxxxxx
Mobile: +49 172 3475576
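For context on the layout step described above: pointing the FS root at a new data pool is typically done by setting the directory layout xattr on the mounted root, roughly as sketched here. The mount point mirrors the "mymount" placeholder from the thread and the pool name matches the hdd_ec_8_2_pool mentioned earlier, but treat this as an illustration under those assumptions, not as the commands actually run:

setfattr -n ceph.dir.layout.pool -v hdd_ec_8_2_pool mymount/
getfattr -n ceph.dir.layout mymount/  # verify which pool new files will be written to

Changing the layout only affects newly written files; existing file data stays in the pool it was originally written to, which is why the data had to be copied and the old copies deleted.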