I came up with a new theory for how to delete a large pool sanely and without impacting the cluster heavily. I haven't tested this yet, but it just occurred to me as I was planning to remove a large pool of my own, again.
First, you need to stop all IO to the pool to be deleted. Next, you stop an OSD and remove that pool's PGs from it: if the OSD is filestore you delete the PG folders, and if it's bluestore you use ceph-objectstore-tool to do it. Start the OSD and move on to the next one (or do a full host at a time, just some sane method to go through all of your OSDs). Before long (probably fairly immediately) the cluster is freaking out about inconsistent PGs and lost data... PERFECT, we're deleting a pool, we want lost data. As long as no traffic is going to the pool, you shouldn't see any blocked requests in the cluster due to this. When you're done manually deleting the PGs for the pool from the OSDs, you mark all of the PGs lost to the cluster and delete the now-empty pool, which should happen instantly.
I intend to test this out in our staging environment and I'll update here. I expect to have to do some things at the end to get the pool to delete properly, possibly forcibly recreating the PGs or something. All in all, though, I think this should work nicely... if tediously. Does anyone see any gotchas that I haven't thought about here? My biggest question is why Ceph doesn't do something similar under the hood when deleting a pool. It took almost a month the last time I deleted a large pool.
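The per-OSD step could be scripted roughly like this. It is an untested sketch that only prints commands (a dry run) and assumes a few things not stated above: that the pool's numeric id is known (for example from `ceph osd lspools`), that PG ids start with `<pool-id>.`, that the OSD data lives under `/var/lib/ceph/osd/ceph-<id>`, and that the installed ceph-objectstore-tool accepts `--op remove --pgid ... --force` (check `--help` for your release; filestore OSDs may also need `--journal-path`). The helper names are made up for illustration.

```python
import shlex


def pgs_in_pool(pgids, pool_id):
    """Keep only the PG ids that belong to pool_id.

    PG ids are assumed to look like '<pool-id>.<hex>', so the match is on
    the '<pool-id>.' prefix (pool 5 must not match pool 50).
    """
    prefix = f"{pool_id}."
    return [pg for pg in pgids if pg.startswith(prefix)]


def removal_commands(osd_id, pgids, pool_id):
    """Build (but do not run) one removal command per PG of the pool.

    The OSD must be stopped before any of these are run by hand.
    The data path layout and the flags are assumptions; verify them
    against your release before using anything like this.
    """
    data_path = f"/var/lib/ceph/osd/ceph-{osd_id}"
    return [
        "ceph-objectstore-tool"
        f" --data-path {shlex.quote(data_path)}"
        f" --op remove --pgid {shlex.quote(pg)} --force"
        for pg in pgs_in_pool(pgids, pool_id)
    ]


# Dry run with made-up PG ids, as if taken from
# `ceph-objectstore-tool --data-path ... --op list-pgs` on a stopped OSD.
for cmd in removal_commands(12, ["5.0", "5.1a", "50.3", "7.2"], 5):
    print(cmd)
```

Printing the commands instead of running them keeps a human in the loop for each OSD, which seems prudent for something this destructive.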
On Fri, May 25, 2018 at 7:04 AM Paul Emmerich <paul.emmerich@xxxxxxxx> wrote:
Also, upgrade to Luminous and migrate your OSDs to Bluestore before using erasure coding. Luminous + Bluestore performs so much better for erasure coding than any of the old configurations.
Also, I've found that deleting a large number of objects is far less stressful on a Bluestore OSD than on a Filestore OSD.
Paul
2018-05-22 19:28 GMT+02:00 David Turner <drakonstein@xxxxxxxxx>:
From my experience, that would cause you some trouble, as it would throw the entire pool into the deletion queue to be processed as it cleans up the disks and everything. I would suggest taking a pool listing from `rados -p .rgw.buckets ls` and iterating on that using some scripts around the `rados -p .rgw.buckets rm <obj-name>` command that you could stop, restart at a faster pace, slow down, etc. Once the objects in the pool are gone, you can delete the empty pool without any problems. I like this option because it makes it simple to stop it if you're impacting your VM traffic.
On Tue, May 22, 2018 at 11:05 AM Simon Ironside <sironside@xxxxxxxxxxxxx> wrote:
Hi Everyone,
I have an older cluster (Hammer 0.94.7) with a broken radosgw service
that I'd just like to blow away before upgrading to Jewel after which
I'll start again with EC pools.
I don't need the data but I'm worried that deleting the .rgw.buckets
pool will cause performance degradation for the production RBD pool used
by VMs. .rgw.buckets is a replicated pool (size=3) with ~14TB data in
5.3M objects. A little over half the data in the whole cluster.
Is deleting this pool simply using ceph osd pool delete likely to cause
me a performance problem? If so, is there a way I can do it better?
Thanks,
Simon.
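The throttled deletion suggested in the reply quoted above (list the pool with `rados -p .rgw.buckets ls`, then remove objects one at a time with `rados -p .rgw.buckets rm <obj-name>`) might look something like the sketch below. It is only an illustration: the function name, the `delay_s` knob, and the idea of saving the listing to a file first are assumptions, not something from the thread. The `run` and `sleep` parameters exist only so the loop can be exercised without a cluster.

```python
import subprocess
import time


def delete_objects(pool, names, delay_s=0.05,
                   run=subprocess.run, sleep=time.sleep):
    """Remove the named objects from pool one at a time, pausing between them.

    names is any iterable of object names, e.g. an open file holding the
    output of `rados -p <pool> ls`. Raising delay_s slows the deletion
    down; Ctrl-C stops it, and a fresh listing lets you resume later.
    Returns the number of objects a removal was issued for.
    """
    deleted = 0
    for name in names:
        name = name.rstrip("\n")
        if not name:
            continue  # skip blank lines in the listing
        # check=True makes a failed `rados rm` raise instead of passing silently.
        run(["rados", "-p", pool, "rm", name], check=True)
        deleted += 1
        sleep(delay_s)
    return deleted


# Hypothetical usage, after `rados -p .rgw.buckets ls > objects.txt`:
#
#   with open("objects.txt") as listing:
#       delete_objects(".rgw.buckets", listing, delay_s=0.1)
```

Spawning one `rados` process per object is slow, but here that is arguably a feature: the point is to keep the deletion gentle enough that the RBD pool does not notice.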
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
--
Paul Emmerich
Looking for help with your Ceph cluster? Contact us at https://croit.io
croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90