On Sat, Dec 24, 2016 at 2:47 PM, Wido den Hollander <wido@xxxxxxxx> wrote:
> On 23 December 2016 at 16:05, Wido den Hollander <wido@xxxxxxxx> wrote:
The stat calls have finished. The grand total is 65TB.
>
>
> > On 22 December 2016 at 19:00, Orit Wasserman <owasserm@xxxxxxxxxx> wrote:
> >
> >
> > Hi Marius,
> >
> > On Thu, Dec 22, 2016 at 12:00 PM, Marius Vaitiekunas
> > <mariusvaitiekunas@xxxxxxxxx> wrote:
> > > On Thu, Dec 22, 2016 at 11:58 AM, Marius Vaitiekunas
> > > <mariusvaitiekunas@xxxxxxxxx> wrote:
> > >>
> > >> Hi,
> > >>
> > >> 1) I've written to the mailing list before, but one more time. Recently we
> > >> have had big issues with rgw on jewel because of leaked data - the rate is
> > >> about 50GB/hour.
> > >>
> > >> We've hit these bugs:
> > >> rgw: fix put_acls for objects starting and ending with underscore
> > >> (issue#17625, pr#11669, Orit Wasserman)
> > >>
> > >> Upgraded to jewel 10.2.5 - no luck.
> > >>
> > >> We've also hit this one:
> > >> rgw: RGW loses realm/period/zonegroup/zone data: period overwritten if
> > >> somewhere in the cluster is still running Hammer (issue#17371, pr#11519,
> > >> Orit Wasserman)
> > >>
> > >> We fixed the zonemaps - also no luck.
> > >>
> > >> We do not use multisite - only default realm, zonegroup, zone.
> > >>
> > >> We have no more ideas how this data leak could happen. GC is working -
> > >> we can see it in the rgw logs.
> > >>
> > >> Maybe someone could give us a hint about this? Where should we look?
> > >>
> > >>
> > >> 2) Another story is about removing all the leaked/orphan objects.
> > >> radosgw-admin orphans find enters a loop at the stage when it starts
> > >> linking objects.
> > >>
> > >> We've tried changing the number of shards to 16, 64 (the default) and 512.
> > >> At the moment it's running with a single shard.
> > >>
> > >> Again, any ideas on how to make the orphan search complete?
> > >>
> > >>
> > >> I can provide any logs, configs, etc. if someone is ready to help with
> > >> this case.
> > >>
> > >>
> >
> > How many buckets do you have? How many objects are in each?
> > Can you provide the output of 'rados ls -p .rgw.buckets'?
>
> Marius asked me to look into this for him, so I did.
>
> What I found is that at *least* three buckets have way more RADOS objects than they should.
>
> The .rgw.buckets pool has 35,651,590 objects totaling 76880G.
>
> I listed all objects in the .rgw.buckets pool and counted them per bucket; the top 5 (object count, bucket marker):
>
> 783844 default.25918901.102486
> 876013 default.25918901.3
> 3325825 default.24201682.7
> 6324217 default.84795862.29891
> 7805208 default.25933378.233873
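(For reference, the per-bucket counting is nothing more than a small script along these lines - a rough sketch, not the exact script used; rados-ls.txt is just an example dump of the 'rados ls' output, and it assumes bucket markers contain no underscores.)

# count_per_bucket.py - rough sketch.
# Input: rados ls -p .rgw.buckets > rados-ls.txt
# A RADOS object name in .rgw.buckets starts with the bucket marker
# ("default.<id>.<n>") followed by '_' and the rest of the name, so splitting
# on the first '_' yields the marker (assuming markers contain no '_').
from collections import Counter

counts = Counter()
with open('rados-ls.txt') as f:
    for line in f:
        counts[line.rstrip('\n').split('_', 1)[0]] += 1

for marker, count in counts.most_common(5):
    print('%d %s' % (count, marker))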
>
> So I started to rados_stat() (using Python) all the objects in the last three buckets. These stat() calls are still running; I have statted about 30% of the objects and their total size is already 17511GB (~17TB).
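(The stat loop itself is roughly the sketch below, using the python-rados bindings; objects.txt is just an example file with one object name per line for those three buckets, and the conffile path may differ per setup.)

# stat_objects.py - rough sketch of the rados_stat() loop.
import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('.rgw.buckets')

total = 0
with open('objects.txt') as f:
    for line in f:
        name = line.rstrip('\n')
        try:
            size, mtime = ioctx.stat(name)  # size in bytes
            total += size
        except rados.ObjectNotFound:
            pass  # object may have been removed (e.g. by gc) since the listing

print('statted total: %.1f GB' % (total / 1024.0 ** 3))

ioctx.close()
cluster.shutdown()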
>
> size_kb_actual summed up for buckets default.24201682.7, default.84795862.29891 and default.25933378.233873 adds up to 12TB.
>
> So I'm currently at 30% of statting the objects and I'm already 5TB over the total size of these buckets.
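(The size_kb_actual numbers come from 'radosgw-admin bucket stats'; summing them for the three suspect markers can be scripted roughly like this - the JSON field names are as seen on our jewel output, so treat them as an assumption.)

# sum_size_kb_actual.py - rough sketch.
import json
import subprocess

MARKERS = {'default.24201682.7', 'default.84795862.29891',
           'default.25933378.233873'}

# Without --bucket, 'radosgw-admin bucket stats' prints stats for all buckets
# as a JSON array; each entry carries the bucket marker and a usage section.
stats = json.loads(subprocess.check_output(['radosgw-admin', 'bucket', 'stats']))

total_kb = 0
for b in stats:
    if b.get('marker') in MARKERS:
        total_kb += b.get('usage', {}).get('rgw.main', {}).get('size_kb_actual', 0)

print('size_kb_actual of the suspect buckets: %.1f TB' % (total_kb / 1024.0 ** 3))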
>
So while the buckets should consume only 12TB, they seem to occupy 65TB of storage.
All these leaking buckets have one thing in common - the Hadoop S3A client (https://wiki.apache.org/hadoop/AmazonS3) is used. And some of the objects have long names with many underscores. For example:
dt=20160814-060014-911/_temporary/0/_temporary/attempt_201608140600_0001_m_000003_339/part-00003.gz
dt=20160814-083014-948/_temporary/0/_temporary/attempt_201608140830_0001_m_000006_294/part-00006.gz
> What I noticed is that it's mainly *shadow* objects which are all 4MB in size.
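(A quick way to see that from the listing is to filter for tail objects - on this cluster their RADOS names contain "__shadow_", which is an observation, not a guarantee; rados-ls.txt as above.)

# count_shadow.py - rough sketch.
with open('rados-ls.txt') as f:
    names = [line.rstrip('\n') for line in f]

shadow = [n for n in names if '__shadow_' in n]
print('%d of %d objects look like shadow/tail objects' % (len(shadow), len(names)))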
>
> I know that 'radosgw-admin orphans find --pool=.rgw.buckets --job-id=xyz' should also do this for me, but as mentioned, this keeps looping and hangs.
>
I started this tool about 20 hours ago:
# radosgw-admin orphans find --pool=.rgw.buckets --job-id=wido1 --debug-rados=10 2>&1|gzip > orphans.find.wido1.log.gz
It now shows me this in the logs while it is still running:
2016-12-24 13:41:00.989876 7ff6844d29c0 10 librados: omap-set-vals oid=orphan.scan.wido1.linked.27 nspace=
2016-12-24 13:41:00.993271 7ff6844d29c0 10 librados: Objecter returned from omap-set-vals r=0
storing 2 entries at orphan.scan.wido1.linked.28
2016-12-24 13:41:00.993311 7ff6844d29c0 10 librados: omap-set-vals oid=orphan.scan.wido1.linked.28 nspace=
storing 1 entries at orphan.scan.wido1.linked.31
2016-12-24 13:41:00.995698 7ff6844d29c0 10 librados: Objecter returned from omap-set-vals r=0
2016-12-24 13:41:00.995787 7ff6844d29c0 10 librados: omap-set-vals oid=orphan.scan.wido1.linked.31 nspace=
storing 1 entries at orphan.scan.wido1.linked.33
2016-12-24 13:41:00.997730 7ff6844d29c0 10 librados: Objecter returned from omap-set-vals r=0
2016-12-24 13:41:00.997776 7ff6844d29c0 10 librados: omap-set-vals oid=orphan.scan.wido1.linked.33 nspace=
2016-12-24 13:41:01.000161 7ff6844d29c0 10 librados: Objecter returned from omap-set-vals r=0
storing 1 entries at orphan.scan.wido1.linked.35
2016-12-24 13:41:01.000225 7ff6844d29c0 10 librados: omap-set-vals oid=orphan.scan.wido1.linked.35 nspace=
2016-12-24 13:41:01.002102 7ff6844d29c0 10 librados: Objecter returned from omap-set-vals r=0
storing 1 entries at orphan.scan.wido1.linked.36
2016-12-24 13:41:01.002167 7ff6844d29c0 10 librados: omap-set-vals oid=orphan.scan.wido1.linked.36 nspace=
storing 1 entries at orphan.scan.wido1.linked.39
2016-12-24 13:41:01.004397 7ff6844d29c0 10 librados: Objecter returned from omap-set-vals r=0
It seems to still be doing something, is that correct?
Wido
> So for now I'll probably resort to figuring out which RADOS objects are obsolete by matching against the bucket's index, but that's a lot of manual work.
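(Roughly what I have in mind is the sketch below: it only matches head objects by name against the bucket index and skips tail objects, which would need the object manifests to be resolved; the bucket name, marker and file name are just examples.)

# stale_heads.py - rough sketch, one bucket at a time.
import json
import subprocess

BUCKET = 'mybucket'              # example; the real name maps to the marker
MARKER = 'default.24201682.7'    # as shown by 'radosgw-admin bucket stats'

# The bucket index entries, as JSON; --max-entries may be needed to get them all.
out = subprocess.check_output(['radosgw-admin', 'bucket', 'list',
                               '--bucket=' + BUCKET, '--max-entries=100000000'])
indexed = set(entry['name'] for entry in json.loads(out))

with open('rados-ls.txt') as f:
    for line in f:
        name = line.rstrip('\n')
        if not name.startswith(MARKER + '_'):
            continue
        if '__shadow_' in name or '__multipart_' in name:
            continue  # tail/part objects cannot be matched by name alone
        key = name[len(MARKER) + 1:]
        if key not in indexed:
            print('not in bucket index: %s' % name)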
>
> I'd rather fix 'orphans find', so I will probably run it with high logging enabled so we have some useful information to work with.
>
> In the meantime, any hints or suggestions?
>
> The cluster is running v10.2.5 btw.
>
> >
> > Orit
> >
> > >
> > > Sorry, I forgot to mention that we've registered two issues in the tracker:
> > > http://tracker.ceph.com/issues/18331
> > > http://tracker.ceph.com/issues/18258
> > >
> > > --
> > > Marius Vaitiekūnas
> > >
> > > _______________________________________________
> > > ceph-users mailing list
> > > ceph-users@xxxxxxxxxxxxxx
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > >
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@xxxxxxxxxxxxxx
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com