Re: Resharding issues / How long does it take?

Hi,

On Mon, Dec 11, 2017 at 11:45 AM, Martin Emrich
<martin.emrich@xxxxxxxxxxx> wrote:
> Hi!
>
> On 10.12.17, 11:54, "Orit Wasserman" <owasserm@xxxxxxxxxx> wrote:
>
>     Hi Martin,
>
>     On Thu, Dec 7, 2017 at 5:05 PM, Martin Emrich <martin.emrich@xxxxxxxxxxx> wrote:
>
>     It could be this issue: http://tracker.ceph.com/issues/21619
>     The workaround is to run radosgw-admin bucket check --fix; it
>     will reset the resharding flag.
>     If you can update the tracker with your logs, that would be very helpful.
>
> I already tried that twice, and after working hard for a few minutes it hangs. Each time it seems to have created a new "set" of objects in the pool (the bucket had ca. 110,000 objects; "radosgw-admin bucket limit check" now reports ca. 330,000 "num_objects").
>

Is this after resharding the bucket?

> Which logs would be helpful?
>

The rgw logs. If you can increase the debug level to debug_rgw=20 and
debug_ms=1, that would be great.
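
For example, a minimal sketch of raising the log levels on a running
gateway via the admin socket (the daemon name client.rgw.<name> and the
socket path are assumptions, adjust them for your deployment):

    # Raise rgw and messenger log verbosity on the running radosgw:
    ceph daemon /var/run/ceph/ceph-client.rgw.<name>.asok config set debug_rgw 20
    ceph daemon /var/run/ceph/ceph-client.rgw.<name>.asok config set debug_ms 1

You can also set "debug rgw = 20" and "debug ms = 1" in the
[client.rgw.<name>] section of ceph.conf and restart the gateway.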

>     > I have a feeling that the bucket index is still damaged/incomplete/inconsistent. What does the message
>     >
>     > *** NOTICE: operation will not remove old bucket index objects ***
>     > ***         these will need to be removed manually             ***
>     >
>     > mean? How can I clean up manually?
>     >
>
>     Resharding creates a new bucket index with the new number of shards.
>     It doesn't remove the old bucket index; you will need to do that manually.
>
> How do I do that? Does it just involve identifying the right objects in the RADOS pools to delete? Or is there more to it?
>

When resharding completes it prints the old bucket instance id; you
will need to remove it manually.
That warning is printed at the start of resharding, and in your case I
believe resharding hasn't completed.
You can get the old bucket instance id from the resharding log or from
the bucket info.
Then you will need to delete it using the rados command.
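
A sketch of what that cleanup could look like (the index pool name
default.rgw.buckets.index is an assumption, and <old-instance-id> is a
placeholder for the id you obtained above):

    # The bucket info shows the current instance id; the old one is in
    # the reshard log:
    radosgw-admin metadata get bucket:<bucket-name>

    # Sharded index objects are named .dir.<instance-id>.<shard-number>;
    # list the ones belonging to the old instance:
    rados -p default.rgw.buckets.index ls | grep '<old-instance-id>'

    # Remove each old index object:
    rados -p default.rgw.buckets.index rm .dir.<old-instance-id>.0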

> My primary goal is now to completely remove the damaged bucket without a trace, but I'd also love to find out what went wrong in the first place.
> Could I have a "multisite setup" without knowing it? I did not knowingly set up anything in this regard; it's just one cluster, with three identically configured radosgw instances behind a load balancer...
>
Probably not; multisite has to be set up explicitly.

As for removing the damaged bucket: if the bucket index is not
consistent you will need to remove it manually.
First unlink the bucket from the user: radosgw-admin bucket unlink

Then you will need to remove the bucket itself (see the sketch after
this list):
1. If this is your only bucket, I would go for deleting the bucket
pools and starting with fresh ones.
2. Try to fix the bucket using bucket check --fix.
3. Try this procedure:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-August/020012.html
4. Another option is to remove the deleted objects' entries from the
bucket index and then try to delete the bucket.
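
A minimal sketch of the unlink-and-remove attempt (the bucket and user
names are placeholders; --purge-objects assumes you also want the
object data removed):

    # Unlink the bucket from its owning user:
    radosgw-admin bucket unlink --bucket=<bucket-name> --uid=<user-id>

    # Then try to remove the bucket along with its objects:
    radosgw-admin bucket rm --bucket=<bucket-name> --purge-objects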

Good Luck,
Orit

> Thanks,
>
> Martin
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



