Re: rgw leaking data, orphan search loop

Marius Vaitiekunas <mariusvaitiekunas@xxxxxxxxx> · Thu, 22 Dec 2016 12:00:48 +0200

On Thu, Dec 22, 2016 at 11:58 AM, Marius Vaitiekunas <mariusvaitiekunas@xxxxxxxxx> wrote:
Hi,
1) I have written to the mailing list about this before, but here it is once more. We have recently had big issues with rgw on jewel because of leaked data; the leak rate is about 50GB/hour.

We've hit this bug:
rgw: fix put_acls for objects starting and ending with underscore (issue#17625, pr#11669, Orit Wasserman)

We upgraded to jewel 10.2.5, but no luck.

We've also hit this one:
rgw: RGW loses realm/period/zonegroup/zone data: period overwritten if somewhere in the cluster is still running Hammer (issue#17371, pr#11519, Orit Wasserman)

We fixed the zonemaps, also with no luck.

We do not use multisite, only the default realm, zonegroup and zone.

We have no more ideas about how this data leak could happen. gc is working; we can see it in the rgw logs.

Could someone give us a hint about this? Where should we look?
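
To put a number on a leak like the one described above, one option is to compare what the data pool holds against what the buckets account for, sampled at two points in time. The following is only a minimal Python sketch under stated assumptions: it assumes `bucket_stats` is the parsed JSON list from `radosgw-admin bucket stats`, with a per-bucket `usage` map whose categories carry a `size_kb_actual` field, and that the pool byte counts are collected separately (for example from `rados df`). The function names are hypothetical and not part of any Ceph tool.

```python
GIB = 1024 ** 3


def bucket_usage_bytes(bucket_stats):
    """Sum size_kb_actual over every bucket and usage category, in bytes.

    Assumes the layout {"usage": {"rgw.main": {"size_kb_actual": N}}}
    per bucket; missing fields are treated as zero.
    """
    total_kb = 0
    for bucket in bucket_stats:
        for category in bucket.get("usage", {}).values():
            total_kb += category.get("size_kb_actual", 0)
    return total_kb * 1024


def leak_rate_gib_per_hour(t0_s, pool0, buckets0, t1_s, pool1, buckets1):
    """Growth of (pool bytes - bucket-accounted bytes) in GiB per hour.

    t0_s and t1_s are sample times in seconds; the other arguments are
    byte counts taken at those times.
    """
    if t1_s <= t0_s:
        raise ValueError("second sample must be later than the first")
    unaccounted_growth = (pool1 - buckets1) - (pool0 - buckets0)
    hours = (t1_s - t0_s) / 3600.0
    return unaccounted_growth / GIB / hours
```

A steadily positive result across several samples would suggest that the pool is growing faster than the buckets can explain, although pending gc entries and incomplete multipart uploads can also contribute to the difference.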

2) The other problem is removing all the leaked/orphan objects.
radosgw-admin orphans find gets stuck in a loop at the stage where it starts linking objects.

We've tried changing the number of shards to 16, 64 (the default) and 512. At the moment it is running with a single shard.

Again, any ideas on how to get the orphan search to complete?
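
For anyone trying to reproduce the shard experiments above, here is a small Python sketch of how the runs could be scripted. It is illustrative only: the pool name `default.rgw.buckets.data` and the job id `orphans1` are assumptions, not values from our cluster, and the helper names are hypothetical. The `--pool`, `--job-id` and `--num-shards` options are the ones `radosgw-admin orphans find` accepts.

```python
import subprocess


def orphans_find_cmd(pool, job_id, num_shards=None):
    """Build the argument list for `radosgw-admin orphans find`.

    num_shards is optional; when omitted, radosgw-admin uses its default.
    """
    cmd = [
        "radosgw-admin", "orphans", "find",
        f"--pool={pool}",
        f"--job-id={job_id}",
    ]
    if num_shards is not None:
        cmd.append(f"--num-shards={num_shards}")
    return cmd


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return True if it finished within timeout_s seconds.

    Returns False when the timeout expires, which is one way to detect
    a search that never leaves the linking stage.
    """
    try:
        subprocess.run(cmd, check=True, timeout=timeout_s)
        return True
    except subprocess.TimeoutExpired:
        return False


# Example (not executed here):
# run_with_timeout(orphans_find_cmd("default.rgw.buckets.data", "orphans1", 1), 6 * 3600)
```

An interrupted job leaves its state behind in the log pool, so it would presumably need to be cleaned up with `radosgw-admin orphans finish --job-id=<job_id>` before a retry with a different shard count.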

I can provide any logs, configs, etc. if someone is willing to help with this case.

Sorry, I forgot to mention that we've registered two issues on the tracker:
http://tracker.ceph.com/issues/18331
http://tracker.ceph.com/issues/18258

-- 
Marius Vaitiekūnas

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com