Re: Speeding up garbage collection in RGW

I had the exact same error when using --bypass-gc.  We too decided to destroy this realm and start it fresh.  For us, 95% of the data in this realm is backups for other systems, and they're fine with rebuilding it.  So our plan is to migrate the other 5% of the data to a temporary S3 location and then rebuild this realm with brand-new pools, a fresh GC, and new settings.  I can also offer this realm as a test environment for figuring out options.  It's running Jewel 10.2.7.

On Fri, Oct 27, 2017 at 11:26 AM Bryan Stillwell <bstillwell@xxxxxxxxxxx> wrote:
On Wed, Oct 25, 2017 at 4:02 PM, Yehuda Sadeh-Weinraub <yehuda@xxxxxxxxxx> wrote:
>
> On Wed, Oct 25, 2017 at 2:32 PM, Bryan Stillwell <bstillwell@xxxxxxxxxxx> wrote:
> > That helps a little bit, but overall the process would take years at this
> > rate:
> >
> > # for i in {1..3600}; do ceph df -f json-pretty |grep -A7 '".rgw.buckets"' |grep objects; sleep 60; done
> >                  "objects": 1660775838
> >                  "objects": 1660775733
> >                  "objects": 1660775548
> >                  "objects": 1660774825
> >                  "objects": 1660774790
> >                  "objects": 1660774735
> >
> > That works out to roughly 1,100 objects every five minutes, so at this
> > rate it would take about 14 years to delete 1.66 billion objects.  This
> > is on a Hammer cluster.  Would upgrading to Jewel or Luminous speed up
> > this process at all?
>
> I'm not sure it's going to help much, although the omap performance
> might improve there. The big problem is that the omaps are just too
> big, so that every operation on them takes considerable time. I think
> the best way forward there is to take a list of all the rados objects
> that need to be removed from the gc omaps, and then get rid of the gc
> objects themselves (newer ones will be created, this time using the
> new configurable). Then remove the objects manually (and concurrently)
> using the rados command line tool.
> The one problem I see here is that even just removal of objects with
> large omaps can affect the availability of the osds that hold these
> objects. I just discussed that with Josh, and we think the best way to
> deal with that is not to remove the gc objects immediately, but to
> rename the gc pool and create a new one (with an appropriate number of
> pgs). This way new gc entries will go into the new gc pool (with a
> higher number of gc shards), and you don't need to remove the old gc
> objects (thus no osd availability problem). Then you can start
> trimming the old gc objects (on the old renamed pool) by using the
> rados command. It'll take a very very long time, but the process
> should pick up speed slowly, as the objects shrink.
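
If I'm following the first part of that (take the list of rados objects
out of the gc entries, then delete them concurrently), it would look
something like the following.  The jq filter, the parallelism, and the
data pool name are guesses for our setup, and I'm assuming the gc list
output parses this way on this version:

# radosgw-admin gc list --include-all > gc_list.json
# jq -r '.[].objs[].oid' gc_list.json > objects_to_remove.txt
# xargs -P 16 -n 1 -a objects_to_remove.txt rados -p .rgw.buckets rm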

That's fine for us.  We'll be tearing down this cluster in a few weeks
and adding the nodes to the new cluster we created.  I just wanted to
explore other options now that we can use it as a test cluster.

The solution you described with renaming the .rgw.gc pool and creating a
new one is pretty interesting.  If I understand it right, it would go
roughly like this (the pg count and shard count below are guesses for
our cluster):
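
# ceph osd pool rename .rgw.gc .rgw.gc.old
# ceph osd pool create .rgw.gc 64 64

Then bump rgw_gc_max_objs in ceph.conf and restart the radosgws so new
gc entries spread across more shards, and later drain the old pool
slowly with something like:

# for obj in $(rados -p .rgw.gc.old ls); do rados -p .rgw.gc.old rm $obj; done

I'll have to give that a try, but until then I've been trying to remove
some of the other buckets with the --bypass-gc option, and it keeps
dying with output like this: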

# radosgw-admin bucket rm --bucket=sg2pl5000 --purge-objects --bypass-gc
2017-10-27 08:00:00.865993 7f2b387228c0  0 RGWObjManifest::operator++(): result: ofs=1488744 stripe_ofs=1488744 part_ofs=0 rule->part_size=0
2017-10-27 08:00:04.385875 7f2b387228c0  0 RGWObjManifest::operator++(): result: ofs=673900 stripe_ofs=673900 part_ofs=0 rule->part_size=0
2017-10-27 08:00:04.517241 7f2b387228c0  0 RGWObjManifest::operator++(): result: ofs=1179224 stripe_ofs=1179224 part_ofs=0 rule->part_size=0
2017-10-27 08:00:05.791876 7f2b387228c0  0 RGWObjManifest::operator++(): result: ofs=566620 stripe_ofs=566620 part_ofs=0 rule->part_size=0
2017-10-27 08:00:26.815081 7f2b387228c0  0 RGWObjManifest::operator++(): result: ofs=1090645 stripe_ofs=1090645 part_ofs=0 rule->part_size=0
2017-10-27 08:00:46.757556 7f2b387228c0  0 RGWObjManifest::operator++(): result: ofs=1488744 stripe_ofs=1488744 part_ofs=0 rule->part_size=0
2017-10-27 08:00:47.093813 7f2b387228c0 -1 ERROR: could not drain handles as aio completion returned with -2


I can typically make further progress by running it again:

# radosgw-admin bucket rm --bucket=sg2pl5000 --purge-objects --bypass-gc
2017-10-27 08:20:57.310859 7fae9c3d48c0  0 RGWObjManifest::operator++(): result: ofs=673900 stripe_ofs=673900 part_ofs=0 rule->part_size=0
2017-10-27 08:20:57.406684 7fae9c3d48c0  0 RGWObjManifest::operator++(): result: ofs=1179224 stripe_ofs=1179224 part_ofs=0 rule->part_size=0
2017-10-27 08:20:57.808050 7fae9c3d48c0 -1 ERROR: could not drain handles as aio completion returned with -2


and again:

# radosgw-admin bucket rm --bucket=sg2pl5000 --purge-objects --bypass-gc
2017-10-27 08:22:04.992578 7ff8071038c0  0 RGWObjManifest::operator++(): result: ofs=566620 stripe_ofs=566620 part_ofs=0 rule->part_size=0
2017-10-27 08:22:05.726485 7ff8071038c0 -1 ERROR: could not drain handles as aio completion returned with -2
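
Since each run makes a bit more progress, I could probably just loop it
until it exits cleanly (assuming it actually returns non-zero on that
error):

# until radosgw-admin bucket rm --bucket=sg2pl5000 --purge-objects --bypass-gc; do sleep 10; done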


What does this error mean, and is there any way to keep it from dying
like this?  This cluster is running Hammer (0.94.10), but I can upgrade it
to Jewel pretty easily if you would like.

Thanks,
Bryan


