Re: rgw leaking data, orphan search loop

Yehuda Sadeh-Weinraub <yehuda@xxxxxxxxxx> · Fri, 24 Feb 2017 09:12:58 -0800

Hi,

We wanted to have more confidence in the orphans search tool before
providing functionality that actually removes the objects. One thing
that you can do is create a new pool, copy these objects to the new
pool (as a backup, rados -p <source-pool> --target-pool=<target-pool>
cp <oid> <oid>), and remove these objects (rados -p <pool> rm <oid>).
Then when you're confident enough that this didn't break existing
objects, you can remove the backup pool.
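
The backup-then-remove steps above could be scripted roughly as follows.
This is only a sketch: it assumes you already have a list of object names
you consider orphans, the pool names are placeholders, and the helper
names (backup_cmd, remove_cmd, backup_then_remove) are made up for
illustration. It only wraps the two rados commands shown above.

```python
import subprocess


def backup_cmd(source_pool, backup_pool, oid):
    """Build the `rados cp` command that copies one object to the backup pool."""
    return ["rados", "-p", source_pool, "--target-pool=" + backup_pool,
            "cp", oid, oid]


def remove_cmd(source_pool, oid):
    """Build the `rados rm` command that deletes one object from the source pool."""
    return ["rados", "-p", source_pool, "rm", oid]


def backup_then_remove(source_pool, backup_pool, oids, run=subprocess.check_call):
    """Copy each object to the backup pool, then remove it from the source pool.

    `run` defaults to subprocess.check_call, which raises on a non-zero exit
    status, so an object is not removed if its copy failed.
    """
    for oid in oids:
        run(backup_cmd(source_pool, backup_pool, oid))
        run(remove_cmd(source_pool, oid))
```

Passing run=print instead of the default gives a dry run that only shows
the commands, which is probably worth doing before touching a production
pool.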

Yehuda

On Fri, Feb 24, 2017 at 8:23 AM, George Mihaiescu <lmihaiescu@xxxxxxxxx> wrote:
> Hi,
>
> I updated http://tracker.ceph.com/issues/18331 with my own issue, and I am
> hoping Orit or Yehuda could give their opinion on what to do next.
> What was the purpose of the "orphan find" tool, and how do we actually
> clean up these files?
>
> Thank you,
> George
>
>
> On Fri, Jan 13, 2017 at 2:22 PM, Wido den Hollander <wido@xxxxxxxx> wrote:
>>
>>
>> > On 24 December 2016 at 13:47, Wido den Hollander <wido@xxxxxxxx> wrote:
>> >
>> >
>> >
>> > > On 23 December 2016 at 16:05, Wido den Hollander
>> > > <wido@xxxxxxxx> wrote:
>> > >
>> > >
>> > >
>> > > > On 22 December 2016 at 19:00, Orit Wasserman
>> > > > <owasserm@xxxxxxxxxx> wrote:
>> > > >
>> > > >
>> > > > Hi Marius,
>> > > >
>> > > > On Thu, Dec 22, 2016 at 12:00 PM, Marius Vaitiekunas
>> > > > <mariusvaitiekunas@xxxxxxxxx> wrote:
>> > > > > On Thu, Dec 22, 2016 at 11:58 AM, Marius Vaitiekunas
>> > > > > <mariusvaitiekunas@xxxxxxxxx> wrote:
>> > > > >>
>> > > > >> Hi,
>> > > > >>
>> > > > >> 1) I've written to the mailing list about this before, but here it is
>> > > > >> once more. We have had big issues recently with rgw on jewel because of
>> > > > >> leaked data; the rate is about 50GB/hour.
>> > > > >>
>> > > > >> We've hit these bugs:
>> > > > >> rgw: fix put_acls for objects starting and ending with underscore
>> > > > >> (issue#17625, pr#11669, Orit Wasserman)
>> > > > >>
>> > > > >> Upgraded to jewel 10.2.5 - no luck.
>> > > > >>
>> > > > >> We've also hit this one:
>> > > > >> rgw: RGW loses realm/period/zonegroup/zone data: period overwritten if
>> > > > >> somewhere in the cluster is still running Hammer (issue#17371,
>> > > > >> pr#11519, Orit Wasserman)
>> > > > >>
>> > > > >> Fixed zonemaps - also no luck.
>> > > > >>
>> > > > >> We do not use multisite - only default realm, zonegroup, zone.
>> > > > >>
>> > > > >> We have no more ideas about how this data leak could happen. gc is
>> > > > >> working; we can see it in the rgw logs.
>> > > > >>
>> > > > >> Maybe someone could give a hint about this? Where should we look?
>> > > > >>
>> > > > >>
>> > > > >> 2) Another issue is removing all the leaked/orphan objects.
>> > > > >> radosgw-admin orphans find enters a loop at the stage where it starts
>> > > > >> linking objects.
>> > > > >>
>> > > > >> We've tried changing the number of shards to 16, 64 (default) and 512.
>> > > > >> At the moment it's running with 1 shard.
>> > > > >>
>> > > > >> Again, any ideas on how to get the orphan search to complete?
>> > > > >>
>> > > > >>
>> > > > >> I can provide any logs, configs, etc. if someone is willing to help
>> > > > >> with this case.
>> > > > >>
>> > > > >>
>> > > >
>> > > > How many buckets do you have? How many objects are in each?
>> > > > Can you provide the output of rados ls -p .rgw.buckets?
>> > >
>> > > Marius asked me to look into this for him, so I did.
>> > >
>> > > What I found is that at *least* three buckets have way more RADOS
>> > > objects than they should.
>> > >
>> > > The .rgw.buckets pool has 35.651.590 objects totaling 76880G.
>> > >
>> > > I listed all objects in the .rgw.buckets pool and summed them per
>> > > bucket, the top 5:
>> > >
>> > >  783844 default.25918901.102486
>> > >  876013 default.25918901.3
>> > > 3325825 default.24201682.7
>> > > 6324217 default.84795862.29891
>> > > 7805208 default.25933378.233873
>> > >
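
A per-bucket tally like the one above could be produced along these lines.
This is only a sketch, not the script that was actually used: it assumes
the pool listing comes from `rados ls -p .rgw.buckets` with one object name
per line, and that each RADOS object name starts with the bucket marker
followed by an underscore. The function names are made up for illustration.

```python
from collections import Counter


def bucket_marker(oid):
    """Return the part of a RADOS object name before the first underscore.

    Assumption: RGW data objects are named <bucket marker>_<rest>.
    """
    return oid.split("_", 1)[0]


def count_per_bucket(oids):
    """Count RADOS objects per bucket marker, skipping blank lines."""
    return Counter(bucket_marker(oid.strip()) for oid in oids if oid.strip())


def top_buckets(oids, n=5):
    """Return the n bucket markers with the most RADOS objects, largest first."""
    return count_per_bucket(oids).most_common(n)
```

Feeding it an open file that holds the saved listing (or sys.stdin when
piping from rados ls) avoids keeping all object names in memory as a list.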
>> > > So I started to rados_stat() (using Python) all the objects in the
>> > > last three buckets. These stat() calls are still running, but I have
>> > > statted about 30% of the objects and their total size is already
>> > > 17511GB (17TB).
>> > >
>> > > size_kb_actual for buckets default.24201682.7,
>> > > default.84795862.29891 and default.25933378.233873 sums up to 12TB.
>> > >
>> > > So I'm only 30% through statting the objects and I'm already 5TB
>> > > over the total size of these buckets.
>> > >
>> >
>> > The stat calls have finished. The grand total is 65TB.
>> >
>> > So while the buckets should consume only 12TB, they seem to occupy 65TB
>> > of storage.
>> >
>> > > What I noticed is that it's mainly *shadow* objects which are all 4MB
>> > > in size.
>> > >
>> > > I know that 'radosgw-admin orphans find --pool=.rgw.buckets
>> > > --job-id=xyz' should also do this for me, but as mentioned, this keeps
>> > > looping and hangs.
>> > >
>> >
>> > I started this tool about 20 hours ago:
>> >
>> > # radosgw-admin orphans find --pool=.rgw.buckets --job-id=wido1
>> > --debug-rados=10 2>&1|gzip > orphans.find.wido1.log.gz
>> >
>> > It now shows me this in the logs while it is still running:
>> >
>> > 2016-12-24 13:41:00.989876 7ff6844d29c0 10 librados: omap-set-vals
>> > oid=orphan.scan.wido1.linked.27 nspace=
>> > 2016-12-24 13:41:00.993271 7ff6844d29c0 10 librados: Objecter returned
>> > from omap-set-vals r=0
>> > storing 2 entries at orphan.scan.wido1.linked.28
>> > 2016-12-24 13:41:00.993311 7ff6844d29c0 10 librados: omap-set-vals
>> > oid=orphan.scan.wido1.linked.28 nspace=
>> > storing 1 entries at orphan.scan.wido1.linked.31
>> > 2016-12-24 13:41:00.995698 7ff6844d29c0 10 librados: Objecter returned
>> > from omap-set-vals r=0
>> > 2016-12-24 13:41:00.995787 7ff6844d29c0 10 librados: omap-set-vals
>> > oid=orphan.scan.wido1.linked.31 nspace=
>> > storing 1 entries at orphan.scan.wido1.linked.33
>> > 2016-12-24 13:41:00.997730 7ff6844d29c0 10 librados: Objecter returned
>> > from omap-set-vals r=0
>> > 2016-12-24 13:41:00.997776 7ff6844d29c0 10 librados: omap-set-vals
>> > oid=orphan.scan.wido1.linked.33 nspace=
>> > 2016-12-24 13:41:01.000161 7ff6844d29c0 10 librados: Objecter returned
>> > from omap-set-vals r=0
>> > storing 1 entries at orphan.scan.wido1.linked.35
>> > 2016-12-24 13:41:01.000225 7ff6844d29c0 10 librados: omap-set-vals
>> > oid=orphan.scan.wido1.linked.35 nspace=
>> > 2016-12-24 13:41:01.002102 7ff6844d29c0 10 librados: Objecter returned
>> > from omap-set-vals r=0
>> > storing 1 entries at orphan.scan.wido1.linked.36
>> > 2016-12-24 13:41:01.002167 7ff6844d29c0 10 librados: omap-set-vals
>> > oid=orphan.scan.wido1.linked.36 nspace=
>> > storing 1 entries at orphan.scan.wido1.linked.39
>> > 2016-12-24 13:41:01.004397 7ff6844d29c0 10 librados: Objecter returned
>> > from omap-set-vals r=0
>> >
>> > It seems to still be doing something, is that correct?
>> >
>>
>> Giving this thread a gentle bump.
>>
>> There is an issue in the tracker for this:
>> http://tracker.ceph.com/issues/18331
>>
>> In addition there is the issue that the orphan search stays in an endless
>> loop: http://tracker.ceph.com/issues/18258
>>
>> This has been discussed multiple times on the ML but I never saw it
>> getting resolved.
>>
>> Any ideas?
>>
>> Wido
>>
>> > Wido
>> >
>> > > So for now I'll probably resort to figuring out which RADOS objects
>> > > are obsolete by matching against the bucket's index, but that's a lot of
>> > > manual work.
>> > >
>> > > I'd rather fix the orphans find, so I will probably run that with high
>> > > logging enabled so we can have some interesting information.
>> > >
>> > > In the meantime, any hints or suggestions?
>> > >
>> > > The cluster is running v10.2.5 btw.
>> > >
>> > > >
>> > > > Orit
>> > > >
>> > > > >
>> > > > > Sorry, I forgot to mention that we've registered two issues on
>> > > > > the tracker:
>> > > > > http://tracker.ceph.com/issues/18331
>> > > > > http://tracker.ceph.com/issues/18258
>> > > > >
>> > > > > --
>> > > > > Marius Vaitiekūnas
>> > > > >
>> > > > > _______________________________________________
>> > > > > ceph-users mailing list
>> > > > > ceph-users@xxxxxxxxxxxxxx
>> > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> > > > >