Re: Resharding issues / How long does it take?

Martin Emrich <martin.emrich@xxxxxxxxxxx> · Tue, 12 Dec 2017 13:56:07 +0100

Hi!

(By the way, a second bucket now has this problem. It apparently occurs
when automatic resharding starts while data is being written to
the bucket.)

On 12.12.17 at 09:53, Orit Wasserman wrote:
On Mon, Dec 11, 2017 at 11:45 AM, Martin Emrich
<martin.emrich@xxxxxxxxxxx> wrote:

This is after resharding the bucket?

Yes.

Which logs would be helpful?

The rgw logs. If you can increase the debug level to debug_rgw=20 and
debug_ms=1, that would be great.
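
A minimal sketch of raising those levels at runtime through the admin
socket, assuming the gateway's daemon name is something like
client.rgw.gw1 (a hypothetical name; substitute your own, or the path to
its .asok file). The helper name rgw_debug_on is made up for illustration.
Setting "debug rgw = 20" and "debug ms = 1" in the gateway's section of
ceph.conf and restarting it should have the same effect.

```shell
# Hypothetical helper: raise rgw and messenger debug levels on one gateway.
# $1 is the daemon name (or admin socket path) of the radosgw instance.
rgw_debug_on() {
    daemon=$1
    ceph daemon "$daemon" config set debug_rgw 20 &&
    ceph daemon "$daemon" config set debug_ms 1
}

# Example (run on the host where that gateway runs):
#   rgw_debug_on client.rgw.gw1
```

With three gateways behind a load balancer, this would have to be repeated
on each of them, since any one may serve the failing request.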

As now a second bucket went down, I suspect I can reproduce it.
When I can, I'll collect log files.

When resharding completes, it prints the old bucket instance id; you
will need to remove that instance.
I think this is the warning at the start of resharding; I believe in
your case resharding hasn't completed.
You can get the old bucket instance id from the resharding log or the
bucket info.
Then you will need to delete it using the rados command.
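
One possible way to find leftover instance ids, as a sketch only: compare
the instances listed by "radosgw-admin metadata list bucket.instance" with
the current id reported by "radosgw-admin bucket stats". The helper name
stale_instances and the bucket name are hypothetical, and the entry format
"bucket:instance_id" is an assumption about the listing output.

```shell
# Hypothetical helper: print instance ids of BUCKET other than CURRENT_ID.
# Reads the JSON array from `radosgw-admin metadata list bucket.instance`
# on stdin; entries are assumed to look like "bucket:instance_id".
stale_instances() {
    bucket=$1
    current_id=$2
    # Strip JSON punctuation, then split each entry at the colon.
    tr -d ' \t",[]' |
        awk -F: -v b="$bucket" -v cur="$current_id" \
            '$1 == b && $2 != cur { print $2 }'
}

# Example (current id taken from the "id" field of bucket stats):
#   radosgw-admin metadata list bucket.instance |
#       stale_instances mybucket "$current_id"
```

The index objects of an instance are expected to be named
.dir.<instance_id>, with a .<shard> suffix when the index is sharded, in
the index pool. It seems wise to confirm that with "rados -p <index pool> ls"
before removing anything with "rados -p <index pool> rm <object>".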

Indeed. But in the .data pool, there are objects from all buckets, and I 
have no idea how to identify the objects belonging to the faulty bucket.
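
A possible way to narrow this down, as a sketch: rgw names the data
objects of a bucket with the bucket's marker followed by an underscore, so
the pool listing can be filtered on that prefix. The marker is shown in
the "marker" field of "radosgw-admin bucket stats --bucket=<name>". The
helper name bucket_objects is hypothetical, and the pool name in the
example is a placeholder.

```shell
# Hypothetical helper: keep only object names that start with "<marker>_".
# Reads object names (one per line) on stdin, e.g. from `rados -p <pool> ls`.
bucket_objects() {
    marker=$1
    # index() == 1 means the prefix matches at the very start of the name,
    # so marker "x.1" does not match objects of marker "x.10".
    awk -v p="${marker}_" 'index($0, p) == 1'
}

# Example:
#   rados -p <data pool> ls | bucket_objects "$marker"
```

Listing a large pool this way can take a long time, and the result is
only as reliable as the marker used, so it is probably best treated as a
read-only check before any deletion.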

My primary goal now is to completely remove the damaged bucket without a trace, but I'd also love to find out what went wrong in the first place.
Could I have a "multisite setup" without knowing it? I did not knowingly set up anything in this regard; it's just one cluster, with three identically configured radosgw instances behind a load balancer...

Probably not; you need to set up multisite ...

Good to know ;)

As for removing the damaged bucket: if the bucket index is not consistent,
you will need to remove it manually.
First unlink the bucket from the user: radosgw-admin bucket unlink

Then you will need to manually remove the bucket:
1. If this is your only bucket, I would go for deleting the bucket pools
and using new ones.
2. Try to fix the bucket using bucket check --fix.
3. Try this procedure:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-August/020012.html
4. Another option is to remove the deleted objects' entries from the
bucket index and then try to delete it.
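
The unlink step and option 2 above could look roughly like this. It is a
sketch under assumptions: the bucket and user names are placeholders, the
helper names are hypothetical, and the RGW_ADMIN variable exists only so
the commands can be previewed (for example with RGW_ADMIN=echo) before
they are run for real.

```shell
# Command to run; override with RGW_ADMIN=echo to preview without executing.
RGW_ADMIN=${RGW_ADMIN:-radosgw-admin}

# Hypothetical helper: detach BUCKET from user UID.
unlink_bucket() {
    "$RGW_ADMIN" bucket unlink --bucket="$1" --uid="$2"
}

# Hypothetical helper: check BUCKET's index and repair what can be repaired.
check_bucket_index() {
    "$RGW_ADMIN" bucket check --fix --bucket="$1"
}

# Example:
#   unlink_bucket mybucket someuser
#   check_bucket_index mybucket
```

Running "bucket check" without --fix first shows what it would report
without changing the index.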

As a second bucket has failed, I am now in the process of evacuating all
buckets to another store; then I will feel more confident trying this.

Thanks,

Martin
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com