Re: [ext] CephFS pool not releasing space after data deletion

Hi Mathias/Frank,

(Sorry for the late reply - this didn't get much attention, including
the tracker report, and eventually got parked.)

I'll have this looked into - expect an update in a day or two.

On Sat, Dec 2, 2023 at 5:46 PM Frank Schilder <frans@xxxxxx> wrote:
>
> Hi Mathias,
>
> have you made any progress on this? Did the capacity become available eventually?
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Kuhring, Mathias <mathias.kuhring@xxxxxxxxxxxxxx>
> Sent: Friday, October 27, 2023 3:52 PM
> To: ceph-users@xxxxxxx; Frank Schilder
> Subject: Re: [ext]  CephFS pool not releasing space after data deletion
>
> Dear ceph users,
>
> We are wondering if this might be the same issue as this bug:
> https://tracker.ceph.com/issues/52581
>
> Except that we seem to have snapshots dangling on the old pool,
> whereas the bug report describes snapshots dangling on the new pool.
> But maybe it's both?
>
> I mean, once the root layout was switched to a new pool, the new pool
> became responsible for snapshots of at least the new data, right?
> What about data that is overwritten? Is there a conflict of responsibility?
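>
> For what it's worth, the pool a given directory or file actually points
> to can be checked via the virtual layout xattrs that CephFS exposes,
> e.g. (paths here are only examples):
>
> getfattr -n ceph.dir.layout.pool /mnt/cephfs
> getfattr -n ceph.file.layout.pool /mnt/cephfs/mydata/somefile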
>
> We do have similar listings of snaps with "ceph osd pool ls detail", I
> think:
>
> 0|0[root@osd-1 ~]# ceph osd pool ls detail | grep -B 1 removed_snaps_queue
> pool 1 'cephfs_data' replicated size 3 min_size 2 crush_rule 1
> object_hash rjenkins pg_num 115 pgp_num 107 pg_num_target 32
> pgp_num_target 32 autoscale_mode on last_change 803558 lfor
> 0/803250/803248 flags hashpspool,selfmanaged_snaps stripe_width 0
> expected_num_objects 1 application cephfs
>          removed_snaps_queue
> [3541~1,36e4~1,379f~2,3862~1,3876~1,387d~1,388b~1,389a~1,38a6~1,38bc~1,3993~1,3999~1,39a0~1,39a7~1,39ae~1,39b5~3,39be~1,39c5~1,39cc~1]
> --
> pool 3 'hdd_ec' erasure profile hdd_ec size 3 min_size 2 crush_rule 3
> object_hash rjenkins pg_num 2048 pgp_num 2048 autoscale_mode off
> last_change 803558 lfor 0/87229/87229 flags
> hashpspool,ec_overwrites,selfmanaged_snaps stripe_width 8192 application
> cephfs
>          removed_snaps_queue
> [3541~1,36e4~1,379f~2,3862~1,3876~1,387d~1,388b~1,389a~1,38a6~1,38bc~1,3993~1,3999~1,39a0~1,39a7~1,39ae~1,39b5~3,39be~1,39c5~1,39cc~1]
> --
> pool 20 'hdd_ec_8_2_pool' erasure profile hdd_ec_8_2_profile size 10
> min_size 9 crush_rule 5 object_hash rjenkins pg_num 8192 pgp_num 8192
> autoscale_mode off last_change 803558 lfor 0/0/681917 flags
> hashpspool,ec_overwrites,selfmanaged_snaps stripe_width 32768
> application cephfs
>          removed_snaps_queue
> [3541~1,36e4~1,379f~2,3862~1,3876~1,387d~1,388b~1,389a~1,38a6~1,38bc~1,3993~1,3999~1,39a0~1,39a7~1,39ae~1,39b5~3,39be~1,39c5~1,39cc~1]
>
>
> Here, pool hdd_ec_8_2_pool is the one we recently assigned to the root
> layout.
> Pool hdd_ec is the one that was assigned before and which won't release
> space (at least as far as I know).
>
> Is this removed_snaps_queue the same as removed_snaps in the bug issue
> (i.e. was the label just renamed)?
> And is it normal that all queues list the same info, or should this be
> different per pool?
> Might this be related to the pools now sharing responsibility over some
> snaps due to the layout change?
>
> And for the big question:
> How can I actually trigger/speed up the removal of those snaps?
> I find removed_snaps/removed_snaps_queue mentioned a few times in
> the user list,
> but never with a conclusive answer on how to deal with them.
> And the only mentions in the docs are just changelogs.
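>
> The only related checks and knobs I am aware of so far are along these
> lines (option names as I understand them for quincy, corrections very
> welcome):
>
> ceph pg dump pgs_brief | grep -c snaptrim             # any PGs still (waiting for) trimming?
> ceph config get osd osd_snap_trim_sleep_hdd           # per-device-class trim throttling
> ceph config get osd osd_pg_max_concurrent_snap_trims  # parallel trims per OSD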
>
> I also looked into and started cephfs stray scrubbing:
> https://docs.ceph.com/en/latest/cephfs/scrub/#evaluate-strays-using-recursive-scrub
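>
> Roughly what I ran, following that page (the FS name is a placeholder):
>
> ceph tell mds.<fsname>:0 scrub start ~mdsdir recursive
> ceph tell mds.<fsname>:0 scrub status
>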
> But according to the status output, no scrubbing is actually active.
>
> I would appreciate any further ideas. Thanks a lot.
>
> Best Wishes,
> Mathias
>
> On 10/23/2023 12:42 PM, Kuhring, Mathias wrote:
> > Dear Ceph users,
> >
> > Our CephFS is not releasing/freeing up space after deleting hundreds of
> > terabytes of data.
> > By now, this has driven us into a "nearfull" osd/pool situation and is
> > thus throttling IO.
> >
> > We are on ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5)
> > quincy (stable).
> >
> > Recently, we moved a bunch of data to a new pool with a better EC profile.
> > This was done by adding a new EC pool to the FS,
> > then assigning the FS root to the new EC pool via the directory layout xattr
> > (so all new data is written to the new pool),
> > and finally copying the old data to new folders.
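> >
> > Roughly, the layout switch itself was just setting the pool attribute
> > on the FS root (the mount point here is only an example):
> >
> > setfattr -n ceph.dir.layout.pool -v hdd_ec_8_2_pool /mnt/cephfs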
> >
> > I swapped the data as follows to retain the old directory structure.
> > I also made snapshots for validation purposes.
> >
> > So basically:
> > cp -r mymount/mydata/ mymount/new/         # creates the copy on the new pool
> > mkdir mymount/mydata/.snap/tovalidate      # snapshot of the old copy
> > mkdir mymount/new/mydata/.snap/tovalidate  # snapshot of the new copy
> > mv mymount/mydata/ mymount/old/            # park the old copy
> > mv mymount/new/mydata mymount/             # move the new copy into place
> >
> > I could see the increase of data in the new pool as expected (ceph df).
> > I compared the snapshots with hashdeep to make sure the new data is alright.
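> >
> > A sketch of such a hashdeep check, in case it is useful to others
> > (paths are only examples, and audit mode with a known-hashes file is
> > just one way to do it):
> >
> > cd mymount/mydata/.snap/tovalidate
> > hashdeep -r -l . > /tmp/old.hashes              # record hashes of the old copy
> > cd mymount/new/mydata/.snap/tovalidate
> > hashdeep -r -l -a -k /tmp/old.hashes .          # audit the new copy against them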
> >
> > Then I went ahead and deleted the old data, basically:
> > rmdir mymount/old/mydata/.snap/*  # this also included a bunch of
> >                                   # other, older snapshots
> > rm -r mymount/old/mydata
> >
> > At first we had a bunch of PGs in snaptrim/snaptrim_wait.
> > But they have been done for quite some time now.
> > And now, two weeks later, the size of the old pool still hasn't
> > really decreased.
> > I'm still waiting for around 500 TB to be released (and much more is
> > planned).
> >
> > I honestly have no clue where to go from here.
> > From my point of view (i.e. the CephFS mount), the data is gone.
> > I also never hard- or soft-linked it anywhere.
> >
> > This doesn't seem to be a common issue.
> > At least I couldn't find anything related or resolved in the docs or
> > the user list yet.
> > If anybody has an idea how to resolve this, I would highly appreciate it.
> >
> > Best Wishes,
> > Mathias
> >
> >
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
> --
> Mathias Kuhring
>
> Dr. rer. nat.
> Bioinformatician
> HPC & Core Unit Bioinformatics
> Berlin Institute of Health at Charité (BIH)
>
> E-Mail: mathias.kuhring@xxxxxxxxxxxxxx
> Mobile: +49 172 3475576
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>


-- 
Cheers,
Venky
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



