Re: rgw leaking data, orphan search loop

Yehuda Sadeh-Weinraub <yehuda@xxxxxxxxxx> · Fri, 24 Feb 2017 10:22:56 -0800

"oid" is the object id. The orphan find command generates a list of
objects that need to be removed at the end of the run (if it finishes
successfully). If you didn't catch that output, you should still be able
to run the same scan again (using the same scan id) and retrieve that
info.

Yehuda

On Fri, Feb 24, 2017 at 9:48 AM, George Mihaiescu <lmihaiescu@xxxxxxxxx> wrote:
> Hi Yehuda,
>
> Thank you for the quick reply.
>
> What is the <oid> you're referring to that I should back up and then delete?
> I extracted the files from the ".log" pool where the "orphan find" tool
> stored the results, but they are all zero-byte files.
>
>
> -rw-r--r-- 1 root root 0 Feb 24 12:45 orphan.scan.orphans.rados.52
> -rw-r--r-- 1 root root 0 Feb 24 12:45 orphan.scan.orphans.rados.58
> -rw-r--r-- 1 root root 0 Feb 24 12:45 obj_delete_at_hint.0000000122
> -rw-r--r-- 1 root root 0 Feb 24 12:45 obj_delete_at_hint.0000000057
> -rw-r--r-- 1 root root 0 Feb 24 12:45 orphan.scan.bck1.rados.53
> -rw-r--r-- 1 root root 0 Feb 24 12:45 orphan.scan.orphans.buckets.20
> -rw-r--r-- 1 root root 0 Feb 24 12:45 orphan.scan.orphans.buckets.25
> -rw-r--r-- 1 root root 0 Feb 24 12:45 orphan.scan.bck1.rados.0
> -rw-r--r-- 1 root root 0 Feb 24 12:45 orphan.scan.orphans.rados.2
> -rw-r--r-- 1 root root 0 Feb 24 12:45 orphan.scan.orphans.linked.19
> -rw-r--r-- 1 root root 0 Feb 24 12:45 orphan.scan.orphans.rados.38
> -rw-r--r-- 1 root root 0 Feb 24 12:45 obj_delete_at_hint.0000000018
> -rw-r--r-- 1 root root 0 Feb 24 12:45 obj_delete_at_hint.0000000092
> -rw-r--r-- 1 root root 0 Feb 24 12:45 obj_delete_at_hint.0000000108
> -rw-r--r-- 1 root root 0 Feb 24 12:45 orphan.scan.bck1.rados.13
> -rw-r--r-- 1 root root 0 Feb 24 12:45 orphan.scan.orphans.linked.20
> -rw-r--r-- 1 root root 0 Feb 24 12:45 orphan.scan.orphans.rados.18
> -rw-r--r-- 1 root root 0 Feb 24 12:45 orphan.scan.bck1.rados.11
> -rw-r--r-- 1 root root 0 Feb 24 12:45 orphan.scan.orphans.rados.50
> -rw-r--r-- 1 root root 0 Feb 24 12:45 orphan.scan.orphans.buckets.33
>
>
> George
>
>
>
> On Fri, Feb 24, 2017 at 12:12 PM, Yehuda Sadeh-Weinraub <yehuda@xxxxxxxxxx>
> wrote:
>>
>> Hi,
>>
>> We wanted to have more confidence in the orphans search tool before
>> providing functionality that actually removes the objects. One thing
>> that you can do is create a new pool, copy these objects to the new
>> pool (as a backup, rados -p <source-pool> --target-pool=<target-pool>
>> cp <oid> <oid>), and remove these objects (rados -p <pool> rm <oid>).
>> Then, when you're confident enough that this didn't break existing
>> objects, you can remove the backup pool.
>>
>> Yehuda
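
The backup-then-remove procedure described above can be sketched as a small
dry-run helper. This is only a sketch: it builds the two rados command lines
quoted in the message for each oid and does not execute anything. The
function name, the backup pool name and the example oid are hypothetical,
and the backup pool is assumed to exist already.

```python
def backup_and_remove_commands(source_pool, backup_pool, oids):
    """Yield (copy_cmd, remove_cmd) argument lists for each orphan oid.

    The commands mirror the ones quoted in the thread:
      rados -p <source-pool> --target-pool=<target-pool> cp <oid> <oid>
      rados -p <pool> rm <oid>
    Nothing is executed here; the caller decides what to run.
    """
    for oid in oids:
        copy_cmd = ["rados", "-p", source_pool,
                    "--target-pool=" + backup_pool, "cp", oid, oid]
        remove_cmd = ["rados", "-p", source_pool, "rm", oid]
        yield copy_cmd, remove_cmd


# Dry run: print the commands for review (pool and oid names are made up).
for copy_cmd, remove_cmd in backup_and_remove_commands(
        ".rgw.buckets", "rgw-orphans-backup", ["example_oid"]):
    print(" ".join(copy_cmd))
    print(" ".join(remove_cmd))
```

Printing the commands first makes it possible to review the list before
anything is deleted. If they are later executed, the remove should only be
run for an oid after its copy has succeeded.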
>>
>> On Fri, Feb 24, 2017 at 8:23 AM, George Mihaiescu <lmihaiescu@xxxxxxxxx>
>> wrote:
>> > Hi,
>> >
>> > I updated http://tracker.ceph.com/issues/18331 with my own issue, and I
>> > am hoping Orit or Yehuda could give their opinion on what to do next.
>> > What is the purpose of the "orphan find" tool, and how do we actually
>> > clean up these files?
>> >
>> > Thank you,
>> > George
>> >
>> >
>> > On Fri, Jan 13, 2017 at 2:22 PM, Wido den Hollander <wido@xxxxxxxx>
>> > wrote:
>> >>
>> >>
>> >> > On 24 December 2016 at 13:47, Wido den Hollander
>> >> > <wido@xxxxxxxx> wrote:
>> >> >
>> >> >
>> >> >
>> >> > > On 23 December 2016 at 16:05, Wido den Hollander
>> >> > > <wido@xxxxxxxx> wrote:
>> >> > >
>> >> > >
>> >> > >
>> >> > > > On 22 December 2016 at 19:00, Orit Wasserman
>> >> > > > <owasserm@xxxxxxxxxx> wrote:
>> >> > > >
>> >> > > >
>> >> > > > Hi Marius,
>> >> > > >
>> >> > > > On Thu, Dec 22, 2016 at 12:00 PM, Marius Vaitiekunas
>> >> > > > <mariusvaitiekunas@xxxxxxxxx> wrote:
>> >> > > > > On Thu, Dec 22, 2016 at 11:58 AM, Marius Vaitiekunas
>> >> > > > > <mariusvaitiekunas@xxxxxxxxx> wrote:
>> >> > > > >>
>> >> > > > >> Hi,
>> >> > > > >>
>> >> > > > >> 1) I've written to the mailing list before, but one more time:
>> >> > > > >> we have had big issues recently with rgw on jewel because of
>> >> > > > >> leaked data - the rate is about 50GB/hour.
>> >> > > > >>
>> >> > > > >> We've hit this bug:
>> >> > > > >> rgw: fix put_acls for objects starting and ending with
>> >> > > > >> underscore (issue#17625, pr#11669, Orit Wasserman)
>> >> > > > >>
>> >> > > > >> Upgraded to jewel 10.2.5 - no luck.
>> >> > > > >>
>> >> > > > >> We've also hit this one:
>> >> > > > >> rgw: RGW loses realm/period/zonegroup/zone data: period
>> >> > > > >> overwritten if somewhere in the cluster is still running
>> >> > > > >> Hammer (issue#17371, pr#11519, Orit Wasserman)
>> >> > > > >>
>> >> > > > >> Fixed zonemaps - also no luck.
>> >> > > > >>
>> >> > > > >> We do not use multisite - only default realm, zonegroup, zone.
>> >> > > > >>
>> >> > > > >> We have no more ideas how this data leak could happen. gc is
>> >> > > > >> working - we can see it in the rgw logs.
>> >> > > > >>
>> >> > > > >> Could someone give us a hint about this? Where should we look?
>> >> > > > >>
>> >> > > > >>
>> >> > > > >> 2) Another story is about removing all the leaked/orphan
>> >> > > > >> objects. radosgw-admin orphans find enters a loop at the
>> >> > > > >> stage when it starts linking objects.
>> >> > > > >>
>> >> > > > >> We've tried changing the number of shards to 16, 64 (default),
>> >> > > > >> and 512. At the moment it's running with 1 shard.
>> >> > > > >>
>> >> > > > >> Again, any ideas how to make the orphan search complete?
>> >> > > > >>
>> >> > > > >>
>> >> > > > >> I can provide any logs, configs, etc. if someone is ready to
>> >> > > > >> help with this case.
>> >> > > > >>
>> >> > > > >>
>> >> > > >
>> >> > > > How many buckets do you have? How many objects are in each?
>> >> > > > Can you provide the output of rados ls -p .rgw.buckets?
>> >> > >
>> >> > > Marius asked me to look into this for him, so I did.
>> >> > >
>> >> > > What I found is that at *least* three buckets have way more RADOS
>> >> > > objects than they should.
>> >> > >
>> >> > > The .rgw.buckets pool has 35.651.590 objects totaling 76880G.
>> >> > >
>> >> > > I listed all objects in the .rgw.buckets pool and summed them per
>> >> > > bucket, the top 5:
>> >> > >
>> >> > >  783844 default.25918901.102486
>> >> > >  876013 default.25918901.3
>> >> > > 3325825 default.24201682.7
>> >> > > 6324217 default.84795862.29891
>> >> > > 7805208 default.25933378.233873
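
A per-bucket tally like the one above could be produced along these lines.
This is a minimal sketch, not the script that was actually used: it assumes
each line of the pool listing is a RADOS object name that starts with the
bucket marker followed by an underscore (for example
"default.25918901.3_<name>" or "default.25918901.3__shadow_..."), and that
the marker itself contains no underscore. The function name is hypothetical.

```python
from collections import Counter


def count_objects_per_bucket(object_names):
    """Count RADOS objects per bucket marker.

    Assumes each name looks like '<marker>_<rest>', so everything before
    the first underscore is taken as the bucket marker.
    """
    counts = Counter()
    for name in object_names:
        name = name.strip()
        if not name:
            continue  # skip blank lines in the listing
        marker = name.split("_", 1)[0]
        counts[marker] += 1
    return counts
```

Feeding it the lines of a saved pool listing and calling
counts.most_common(5) would give the five largest buckets by object count.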
>> >> > >
>> >> > > So I started to rados_stat() (using Python) all the objects in the
>> >> > > last three buckets. These stat() calls are still running, but I have
>> >> > > statted about 30% of the objects and their total size is already
>> >> > > 17511GB (about 17TB).
>> >> > >
>> >> > > size_kb_actual for buckets default.24201682.7,
>> >> > > default.84795862.29891 and default.25933378.233873 sums up to 12TB.
>> >> > >
>> >> > > So I'm currently at 30% of statting the objects and I'm already 5TB
>> >> > > over the total size of these buckets.
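
The size check described above can be sketched as follows. This is an
illustration under assumptions, not the script that was run: the stat
function is passed in so the summing logic can be tried without a cluster.
With the python-rados bindings one would presumably pass the stat method of
an open I/O context, which returns a (size, mtime) tuple, and
rados.ObjectNotFound as the not-found exception. The function and parameter
names are hypothetical.

```python
def sum_object_sizes(oids, stat, not_found=KeyError):
    """Sum the sizes of the given objects.

    stat(oid) must return a (size_in_bytes, mtime) tuple and raise
    `not_found` for objects that no longer exist. Returns
    (total_bytes, missing_count).
    """
    total = 0
    missing = 0
    for oid in oids:
        try:
            size, _mtime = stat(oid)
        except not_found:
            # Object vanished between listing and stat (e.g. removed by gc).
            missing += 1
            continue
        total += size
    return total, missing
```

Comparing the returned total with the summed size_kb_actual of the same
buckets gives the kind of discrepancy reported in this thread.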
>> >> > >
>> >> >
>> >> > The stat calls have finished. The grand total is 65TB.
>> >> >
>> >> > So while the buckets should consume only 12TB, they seem to occupy
>> >> > 65TB of storage.
>> >> >
>> >> > > What I noticed is that it's mainly *shadow* objects, which are all
>> >> > > 4MB in size.
>> >> > >
>> >> > > I know that 'radosgw-admin orphans find --pool=.rgw.buckets
>> >> > > --job-id=xyz' should also do this for me, but as mentioned, this
>> >> > > keeps looping and hangs.
>> >> > >
>> >> >
>> >> > I started this tool about 20 hours ago:
>> >> >
>> >> > # radosgw-admin orphans find --pool=.rgw.buckets --job-id=wido1 --debug-rados=10 2>&1|gzip > orphans.find.wido1.log.gz
>> >> >
>> >> > It now shows me this in the logs while it is still running:
>> >> >
>> >> > 2016-12-24 13:41:00.989876 7ff6844d29c0 10 librados: omap-set-vals oid=orphan.scan.wido1.linked.27 nspace=
>> >> > 2016-12-24 13:41:00.993271 7ff6844d29c0 10 librados: Objecter returned from omap-set-vals r=0
>> >> > storing 2 entries at orphan.scan.wido1.linked.28
>> >> > 2016-12-24 13:41:00.993311 7ff6844d29c0 10 librados: omap-set-vals oid=orphan.scan.wido1.linked.28 nspace=
>> >> > storing 1 entries at orphan.scan.wido1.linked.31
>> >> > 2016-12-24 13:41:00.995698 7ff6844d29c0 10 librados: Objecter returned from omap-set-vals r=0
>> >> > 2016-12-24 13:41:00.995787 7ff6844d29c0 10 librados: omap-set-vals oid=orphan.scan.wido1.linked.31 nspace=
>> >> > storing 1 entries at orphan.scan.wido1.linked.33
>> >> > 2016-12-24 13:41:00.997730 7ff6844d29c0 10 librados: Objecter returned from omap-set-vals r=0
>> >> > 2016-12-24 13:41:00.997776 7ff6844d29c0 10 librados: omap-set-vals oid=orphan.scan.wido1.linked.33 nspace=
>> >> > 2016-12-24 13:41:01.000161 7ff6844d29c0 10 librados: Objecter returned from omap-set-vals r=0
>> >> > storing 1 entries at orphan.scan.wido1.linked.35
>> >> > 2016-12-24 13:41:01.000225 7ff6844d29c0 10 librados: omap-set-vals oid=orphan.scan.wido1.linked.35 nspace=
>> >> > 2016-12-24 13:41:01.002102 7ff6844d29c0 10 librados: Objecter returned from omap-set-vals r=0
>> >> > storing 1 entries at orphan.scan.wido1.linked.36
>> >> > 2016-12-24 13:41:01.002167 7ff6844d29c0 10 librados: omap-set-vals oid=orphan.scan.wido1.linked.36 nspace=
>> >> > storing 1 entries at orphan.scan.wido1.linked.39
>> >> > 2016-12-24 13:41:01.004397 7ff6844d29c0 10 librados: Objecter returned from omap-set-vals r=0
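
One way to get a feel for whether a log like the one above shows progress
or a loop is to tally the "storing N entries at <shard object>" lines per
shard object. The sketch below does only that. It relies on the line format
visible in the excerpt and nothing else, and the function name is
hypothetical.

```python
import re
from collections import Counter

# Matches lines such as: "storing 2 entries at orphan.scan.wido1.linked.28"
STORING_RE = re.compile(r"storing (\d+) entries at (\S+)")


def entries_per_shard(log_lines):
    """Return a Counter mapping shard object name -> total entries stored."""
    totals = Counter()
    for line in log_lines:
        match = STORING_RE.search(line)
        if match:
            totals[match.group(2)] += int(match.group(1))
    return totals
```

Running it over successive slices of the (decompressed) log and comparing
the tallies may hint at whether the tool keeps rewriting the same shard
objects, though on its own that does not prove a loop.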
>> >> >
>> >> > It seems to still be doing something, is that correct?
>> >> >
>> >>
>> >> Giving this thread a gentle bump.
>> >>
>> >> There is an issue in the tracker for this:
>> >> http://tracker.ceph.com/issues/18331
>> >>
>> >> In addition, there is the issue that the orphan search stays in an
>> >> endless loop: http://tracker.ceph.com/issues/18258
>> >>
>> >> This has been discussed multiple times on the ML, but I never saw it
>> >> getting resolved.
>> >>
>> >> Any ideas?
>> >>
>> >> Wido
>> >>
>> >> > Wido
>> >> >
>> >> > > So for now I'll probably resort to figuring out which RADOS objects
>> >> > > are obsolete by matching against the bucket's index, but that's a
>> >> > > lot of manual work.
>> >> > >
>> >> > > I'd rather fix the orphans find, so I will probably run that with
>> >> > > high logging enabled so we can get some interesting information.
>> >> > >
>> >> > > In the meantime, any hints or suggestions?
>> >> > >
>> >> > > The cluster is running v10.2.5 btw.
>> >> > >
>> >> > > >
>> >> > > > Orit
>> >> > > >
>> >> > > > >
>> >> > > > > Sorry, I forgot to mention that we've registered two issues on
>> >> > > > > the tracker:
>> >> > > > > http://tracker.ceph.com/issues/18331
>> >> > > > > http://tracker.ceph.com/issues/18258
>> >> > > > >
>> >> > > > > --
>> >> > > > > Marius Vaitiekūnas
>> >> > > > >
>> >> > > > > _______________________________________________
>> >> > > > > ceph-users mailing list
>> >> > > > > ceph-users@xxxxxxxxxxxxxx
>> >> > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >> > > > >
>> >
>> >
>> >
>> >
>
>