Re: Speeding up garbage collection in RGW

I had the exact same error when using --bypass-gc.  We too decided to destroy this realm and start it fresh.  For us, 95% of the data in this realm is backups for other systems, and they're fine with rebuilding it.  So our plan is to migrate the other 5% of the data to a temporary S3 location and then rebuild this realm with brand-new pools, a fresh GC, and new settings.  I can also offer this realm as a test environment for figuring out options.  It's running Jewel 10.2.7.

On Fri, Oct 27, 2017 at 11:26 AM Bryan Stillwell <bstillwell@xxxxxxxxxxx> wrote:
On Wed, Oct 25, 2017 at 4:02 PM, Yehuda Sadeh-Weinraub <yehuda@xxxxxxxxxx> wrote:
>
> On Wed, Oct 25, 2017 at 2:32 PM, Bryan Stillwell <bstillwell@xxxxxxxxxxx> wrote:
> > That helps a little bit, but overall the process would take years at this
> > rate:
> >
> > # for i in {1..3600}; do ceph df -f json-pretty |grep -A7 '".rgw.buckets"' |grep objects; sleep 60; done
> >                  "objects": 1660775838
> >                  "objects": 1660775733
> >                  "objects": 1660775548
> >                  "objects": 1660774825
> >                  "objects": 1660774790
> >                  "objects": 1660774735
> >
> > That works out to roughly 1,100 objects every five minutes, so at this
> > rate it would take about 14 years to delete 1.66 billion objects.  This
> > is on a Hammer cluster.  Would upgrading to Jewel or Luminous speed up
> > this process at all?
>
> I'm not sure it's going to help much, although the omap performance
> might improve there. The big problem is that the omaps are just too
> big, so that every operation on them takes considerable time. I think
> the best way forward there is to take a list of all the rados objects
> that need to be removed from the gc omaps, and then get rid of the gc
> objects themselves (newer ones will be created, this time using the
> new configurable). Then remove the objects manually (and concurrently)
> using the rados command line tool.
> The one problem I see here is that even just removal of objects with
> large omaps can affect the availability of the osds that hold these
> objects. I just discussed that with Josh, and we think the best way to
> deal with that is not to remove the gc objects immediately, but to
> rename the gc pool and create a new one (with an appropriate number of
> pgs). This way new gc entries will go into the new gc pool (with a
> higher number of gc shards), and you don't need to remove the old gc
> objects (thus no osd availability problem). Then you can start
> trimming the old gc objects (on the old renamed pool) by using the
> rados command. It'll take a very very long time, but the process
> should pick up speed slowly, as the objects shrink.
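
If I'm following the first part of that (take the list of rados objects
out of the gc entries, then delete them concurrently), it would look
something like the following.  The jq filter, the parallelism, and the
data pool name are guesses for our setup, and I'm assuming the gc list
output parses this way on this version:

# radosgw-admin gc list --include-all > gc_list.json
# jq -r '.[].objs[].oid' gc_list.json > objects_to_remove.txt
# xargs -P 16 -n 1 -a objects_to_remove.txt rados -p .rgw.buckets rm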

That's fine for us.  We'll be tearing down this cluster in a few weeks
and adding the nodes to the new cluster we created.  I just wanted to
explore other options now that we can use it as a test cluster.

The solution you described with renaming the .rgw.gc pool and creating a
new one is pretty interesting.  If I understand it right, it would go
roughly like this (the pg count and shard count below are guesses for
our cluster):
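
# ceph osd pool rename .rgw.gc .rgw.gc.old
# ceph osd pool create .rgw.gc 64 64

Then bump rgw_gc_max_objs in ceph.conf and restart the radosgws so new
gc entries spread across more shards, and later drain the old pool
slowly with something like:

# for obj in $(rados -p .rgw.gc.old ls); do rados -p .rgw.gc.old rm $obj; done

I'll have to give that a try, but until then I've been trying to remove
some of the other buckets with the --bypass-gc option, and it keeps
dying with output like this: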

# radosgw-admin bucket rm --bucket=sg2pl5000 --purge-objects --bypass-gc
2017-10-27 08:00:00.865993 7f2b387228c0  0 RGWObjManifest::operator++(): result: ofs=1488744 stripe_ofs=1488744 part_ofs=0 rule->part_size=0
2017-10-27 08:00:04.385875 7f2b387228c0  0 RGWObjManifest::operator++(): result: ofs=673900 stripe_ofs=673900 part_ofs=0 rule->part_size=0
2017-10-27 08:00:04.517241 7f2b387228c0  0 RGWObjManifest::operator++(): result: ofs=1179224 stripe_ofs=1179224 part_ofs=0 rule->part_size=0
2017-10-27 08:00:05.791876 7f2b387228c0  0 RGWObjManifest::operator++(): result: ofs=566620 stripe_ofs=566620 part_ofs=0 rule->part_size=0
2017-10-27 08:00:26.815081 7f2b387228c0  0 RGWObjManifest::operator++(): result: ofs=1090645 stripe_ofs=1090645 part_ofs=0 rule->part_size=0
2017-10-27 08:00:46.757556 7f2b387228c0  0 RGWObjManifest::operator++(): result: ofs=1488744 stripe_ofs=1488744 part_ofs=0 rule->part_size=0
2017-10-27 08:00:47.093813 7f2b387228c0 -1 ERROR: could not drain handles as aio completion returned with -2


I can typically make further progress by running it again:

# radosgw-admin bucket rm --bucket=sg2pl5000 --purge-objects --bypass-gc
2017-10-27 08:20:57.310859 7fae9c3d48c0  0 RGWObjManifest::operator++(): result: ofs=673900 stripe_ofs=673900 part_ofs=0 rule->part_size=0
2017-10-27 08:20:57.406684 7fae9c3d48c0  0 RGWObjManifest::operator++(): result: ofs=1179224 stripe_ofs=1179224 part_ofs=0 rule->part_size=0
2017-10-27 08:20:57.808050 7fae9c3d48c0 -1 ERROR: could not drain handles as aio completion returned with -2


and again:

# radosgw-admin bucket rm --bucket=sg2pl5000 --purge-objects --bypass-gc
2017-10-27 08:22:04.992578 7ff8071038c0  0 RGWObjManifest::operator++(): result: ofs=566620 stripe_ofs=566620 part_ofs=0 rule->part_size=0
2017-10-27 08:22:05.726485 7ff8071038c0 -1 ERROR: could not drain handles as aio completion returned with -2
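
Since each run makes a bit more progress, I could probably just loop it
until it exits cleanly (assuming it actually returns non-zero on that
error):

# until radosgw-admin bucket rm --bucket=sg2pl5000 --purge-objects --bypass-gc; do sleep 10; done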


What does this error mean, and is there any way to keep it from dying
like this?  This cluster is running Hammer (0.94.10), but I can upgrade it
to Jewel pretty easily if you would like.

Thanks,
Bryan


