Hi Mathias/Frank,

(Sorry for the late reply - this didn't get much attention, including the
tracker report, and eventually got parked.) Will have this looked into -
expect an update in a day or two.

On Sat, Dec 2, 2023 at 5:46 PM Frank Schilder <frans@xxxxxx> wrote:
>
> Hi Mathias,
>
> have you made any progress on this? Did the capacity become available eventually?
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Kuhring, Mathias <mathias.kuhring@xxxxxxxxxxxxxx>
> Sent: Friday, October 27, 2023 3:52 PM
> To: ceph-users@xxxxxxx; Frank Schilder
> Subject: Re: [ext] CephFS pool not releasing space after data deletion
>
> Dear ceph users,
>
> We are wondering if this might be the same issue as in this bug report:
> https://tracker.ceph.com/issues/52581
>
> Except that we seem to have snapshots dangling on the old pool,
> while the bug report has snapshots dangling on the new pool.
> But maybe it's both?
>
> I mean, once the global root layout was pointed at a new pool,
> the new pool became in charge of snapshotting, at least for new data, right?
> What about data which is overwritten? Is there a conflict of responsibility?
>
> We do have similar listings of snaps with "ceph osd pool ls detail", I think:
>
> 0|0[root@osd-1 ~]# ceph osd pool ls detail | grep -B 1 removed_snaps_queue
> pool 1 'cephfs_data' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 115 pgp_num 107 pg_num_target 32 pgp_num_target 32 autoscale_mode on last_change 803558 lfor 0/803250/803248 flags hashpspool,selfmanaged_snaps stripe_width 0 expected_num_objects 1 application cephfs
>         removed_snaps_queue [3541~1,36e4~1,379f~2,3862~1,3876~1,387d~1,388b~1,389a~1,38a6~1,38bc~1,3993~1,3999~1,39a0~1,39a7~1,39ae~1,39b5~3,39be~1,39c5~1,39cc~1]
> --
> pool 3 'hdd_ec' erasure profile hdd_ec size 3 min_size 2 crush_rule 3 object_hash rjenkins pg_num 2048 pgp_num 2048 autoscale_mode off last_change 803558 lfor 0/87229/87229 flags hashpspool,ec_overwrites,selfmanaged_snaps stripe_width 8192 application cephfs
>         removed_snaps_queue [3541~1,36e4~1,379f~2,3862~1,3876~1,387d~1,388b~1,389a~1,38a6~1,38bc~1,3993~1,3999~1,39a0~1,39a7~1,39ae~1,39b5~3,39be~1,39c5~1,39cc~1]
> --
> pool 20 'hdd_ec_8_2_pool' erasure profile hdd_ec_8_2_profile size 10 min_size 9 crush_rule 5 object_hash rjenkins pg_num 8192 pgp_num 8192 autoscale_mode off last_change 803558 lfor 0/0/681917 flags hashpspool,ec_overwrites,selfmanaged_snaps stripe_width 32768 application cephfs
>         removed_snaps_queue [3541~1,36e4~1,379f~2,3862~1,3876~1,387d~1,388b~1,389a~1,38a6~1,38bc~1,3993~1,3999~1,39a0~1,39a7~1,39ae~1,39b5~3,39be~1,39c5~1,39cc~1]
>
> Here, pool hdd_ec_8_2_pool is the one we recently assigned to the root layout.
> Pool hdd_ec is the one which was assigned before and which won't release
> space (at least as far as I know).
>
> Is this removed_snaps_queue the same as removed_snaps in the bug issue
> (i.e. the label was renamed)?
> And is it normal that all queues list the same info, or should this be
> different per pool?
> Might this be related to pools now sharing responsibility over some
> snaps due to the layout changes?
>
> And for the big question:
> How can I actually trigger/speed up the removal of those snaps?
> I find removed_snaps/removed_snaps_queue mentioned a few times in the user list,
> but never with a conclusive answer on how to deal with them.
> And the only mentions in the docs are just change logs.
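
In the meantime, a rough sketch of what can be checked on your side to see
whether the trim backlog is draining at all (the tuning values at the end
are illustrative examples only, not recommendations for your cluster):

ceph pg ls snaptrim        # PGs currently trimming snapshots
ceph pg ls snaptrim_wait   # PGs queued for trimming
ceph osd pool ls detail | grep -B 1 removed_snaps_queue   # queue should shrink over time

# Trimming can be made more aggressive, at the cost of client I/O, e.g.:
ceph config set osd osd_snap_trim_sleep 0.1
ceph config set osd osd_pg_max_concurrent_snap_trims 4

If the removed_snaps_queue entries never shrink even though no PGs are in
snaptrim, trimming looks stuck rather than slow, which would point back at
the tracker issue above rather than at tuning.
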
>
> I also looked into and started cephfs stray scrubbing:
> https://docs.ceph.com/en/latest/cephfs/scrub/#evaluate-strays-using-recursive-scrub
> But according to the status output, no scrubbing is actually active.
>
> I would appreciate any further ideas. Thanks a lot.
>
> Best Wishes,
> Mathias
>
> On 10/23/2023 12:42 PM, Kuhring, Mathias wrote:
> > Dear Ceph users,
> >
> > Our CephFS is not releasing/freeing up space after deleting hundreds of
> > terabytes of data.
> > By now, this drives us into a "nearfull" OSD/pool situation and thus
> > throttles IO.
> >
> > We are on ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5)
> > quincy (stable).
> >
> > Recently, we moved a bunch of data to a new pool with better EC.
> > This was done by adding a new EC pool to the FS,
> > then assigning the FS root to the new EC pool via the directory layout xattr
> > (so all new data is written to the new pool),
> > and finally copying the old data to new folders.
> >
> > I swapped the data as follows to retain the old directory structures.
> > I also made snapshots for validation purposes.
> >
> > So basically:
> > cp -r mymount/mydata/ mymount/new/   # this creates the copy on the new pool
> > mkdir mymount/mydata/.snap/tovalidate
> > mkdir mymount/new/mydata/.snap/tovalidate
> > mv mymount/mydata/ mymount/old/
> > mv mymount/new/mydata mymount/
> >
> > I could see the increase of data in the new pool as expected (ceph df).
> > I compared the snapshots with hashdeep to make sure the new data is alright.
> >
> > Then I went ahead deleting the old data, basically:
> > rmdir mymount/old/mydata/.snap/*   # this also included a bunch of other, older snapshots
> > rm -r mymount/old/mydata
> >
> > At first we had a bunch of PGs in snaptrim/snaptrim_wait.
> > But they have been done for quite some time now.
> > And now, two weeks later, the size of the old pool still hasn't really decreased.
> > I'm still waiting for around 500 TB to be released (and much more is planned).
> >
> > I honestly have no clue where to go from here.
> > From my point of view (i.e. the CephFS mount), the data is gone.
> > I also never hard- or soft-linked it anywhere.
> >
> > This doesn't seem to be a regular issue.
> > At least I couldn't find anything related or resolved in the docs or
> > the user list yet.
> > If anybody has an idea how to resolve this, I would highly appreciate it.
> >
> > Best Wishes,
> > Mathias
> >
> >
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
> --
> Mathias Kuhring
>
> Dr. rer. nat.
> Bioinformatician
> HPC & Core Unit Bioinformatics
> Berlin Institute of Health at Charité (BIH)
>
> E-Mail: mathias.kuhring@xxxxxxxxxxxxxx
> Mobile: +49 172 3475576
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx

--
Cheers,
Venky

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx