Re: rgw leaking data, orphan search loop


 



> On 22 December 2016 at 19:00, Orit Wasserman <owasserm@xxxxxxxxxx> wrote:
> 
> 
> Hi Marius,
> 
> On Thu, Dec 22, 2016 at 12:00 PM, Marius Vaitiekunas
> <mariusvaitiekunas@xxxxxxxxx> wrote:
> > On Thu, Dec 22, 2016 at 11:58 AM, Marius Vaitiekunas
> > <mariusvaitiekunas@xxxxxxxxx> wrote:
> >>
> >> Hi,
> >>
> >> 1) I've written to the mailing list before, but one more time: we have
> >> big issues with rgw on jewel recently because of leaked data - the rate
> >> is about 50GB/hour.
> >>
> >> We've hit these bugs:
> >> rgw: fix put_acls for objects starting and ending with underscore
> >> (issue#17625, pr#11669, Orit Wasserman)
> >>
> >> Upgraded to jewel 10.2.5 - no luck.
> >>
> >> We've also hit this one:
> >> rgw: RGW loses realm/period/zonegroup/zone data: period overwritten if
> >> somewhere in the cluster is still running Hammer (issue#17371, pr#11519,
> >> Orit Wasserman)
> >>
> >> Fixed zonemaps - also no luck.
> >>
> >> We do not use multisite - only default realm, zonegroup, zone.
> >>
> >> We have no more ideas about how this data leak could happen. gc is
> >> working - we can see it in the rgw logs.
> >>
> >> Maybe someone could give a hint about this? Where should we look?
> >>
> >>
> >> 2) Another story is about removing all the leaked/orphan objects.
> >> radosgw-admin orphans find enters a loop at the stage when it starts
> >> linking objects.
> >>
> >> We've tried changing the number of shards to 16, 64 (the default) and
> >> 512. At the moment it's running with a shard count of 1.
> >>
> >> Again, any ideas on how to make the orphan search complete?
> >>
> >>
> >> I can provide logs, configs, etc. if someone is ready to help with
> >> this case.
> >>
> >>
> 
> How many buckets do you have? How many objects in each?
> Can you provide the output of rados ls -p .rgw.buckets ?

Marius asked me to look into this for him, so I did.

What I found is that at *least* three buckets have way more RADOS objects than they should.

The .rgw.buckets pool has 35,651,590 objects totaling 76,880G.

I listed all objects in the .rgw.buckets pool and summed them per bucket, the top 5:

 783844 default.25918901.102486
 876013 default.25918901.3
3325825 default.24201682.7
6324217 default.84795862.29891
7805208 default.25933378.233873
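The tally above was essentially this (a sketch, not the exact script I ran; it assumes the raw object name layout `<bucket_marker>_<object_name>`, where the marker itself contains no underscore, e.g. "default.24201682.7_mykey"):

```python
from collections import Counter

def bucket_marker(raw_name):
    """Bucket marker prefix of a raw .rgw.buckets object name,
    e.g. 'default.24201682.7' for 'default.24201682.7_mykey'."""
    return raw_name.split('_', 1)[0]

def count_per_bucket(raw_names):
    """Tally raw object names (one `rados ls` line each) per bucket marker."""
    return Counter(bucket_marker(n.strip()) for n in raw_names if n.strip())
```

Feeding it the saved output of 'rados ls -p .rgw.buckets' and printing most_common(5) gives a top-5 like the one above.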

So I started to rados_stat() (using Python) all the objects in the last three buckets. These stat() calls are still running, but with about 30% of the objects statted, their total size is already 17,511GB (roughly 17TB).

size_kb_actual summed over buckets default.24201682.7, default.84795862.29891 and default.25933378.233873 comes to 12TB.

So at only 30% of the stat run I'm already 5TB over the total size of these buckets.
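The stat pass itself is roughly the following (a sketch using the python-rados bindings; the helper takes the stat callable as an argument so it doesn't depend on a live cluster, and the pool/conffile values are just what I'm using here):

```python
def total_bytes(stat, names):
    """Sum object sizes via a stat(name) -> (size, mtime) callable,
    e.g. the bound Ioctx.stat method from python-rados."""
    total = 0
    for name in names:
        size, _mtime = stat(name)
        total += size
    return total

def stat_pool(pool, names, conffile='/etc/ceph/ceph.conf'):
    """Connect to the cluster and stat all names in the given pool.
    Requires librados and the python-rados bindings."""
    import rados  # imported lazily so total_bytes() works without it
    cluster = rados.Rados(conffile=conffile)
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx(pool)
        try:
            return total_bytes(ioctx.stat, names)
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()
```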

What I noticed is that it's mainly *shadow* objects, which are all 4MB in size.
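Those pieces are easy to spot by name; a one-line filter (assuming, from what I see in this pool, that shadow pieces carry a "__shadow_" tag in their raw name):

```python
def is_shadow(raw_name):
    """True for multipart/shadow pieces, which carry '__shadow_' in
    their raw RADOS name (based on what I observe in this pool)."""
    return '__shadow_' in raw_name
```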

I know that 'radosgw-admin orphans find --pool=.rgw.buckets --job-id=xyz' should do this for me, but as mentioned, it keeps looping and hangs.

So for now I'll probably resort to figuring out which RADOS objects are obsolete by matching them against the bucket's index, but that's a lot of manual work.
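The matching step would look roughly like this: compare the head objects under a bucket's marker against the keys listed in the bucket index. This is only a sketch; it deliberately skips the shadow pieces, since mapping those back to index entries needs the object manifests:

```python
def orphan_head_candidates(raw_names, index_keys, marker):
    """Head objects under `marker` whose key is absent from the bucket
    index; shadow pieces are skipped (they need manifest inspection)."""
    indexed = set(index_keys)
    prefix = marker + '_'
    candidates = []
    for raw in raw_names:
        # shadow names like '<marker>__shadow_...' also start with the
        # prefix, so filter them out explicitly
        if not raw.startswith(prefix) or '__shadow_' in raw:
            continue
        if raw[len(prefix):] not in indexed:
            candidates.append(raw)
    return candidates
```

The index keys would come from something like 'radosgw-admin bucket list --bucket=<name>', and anything this flags would still need manual verification before deletion.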

I'd rather fix 'orphans find' itself, so I will probably run it with high logging enabled so we can gather some useful information.

In the meantime, any hints or suggestions?

The cluster is running v10.2.5 btw.

> 
> Orit
> 
> >
> > Sorry. I forgot to mention, that we've registered two issues on tracker:
> > http://tracker.ceph.com/issues/18331
> > http://tracker.ceph.com/issues/18258
> >
> > --
> > Marius Vaitiekūnas
> >
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@xxxxxxxxxxxxxx
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >



