Re: Troubleshooting rgw bucket list

Sam Wouters <sam@xxxxxxxxx> · Tue, 1 Sep 2015 10:16:59 +0200

Hi, I started the 'radosgw-admin bucket check --fix' on Friday evening and it's
still running. 'ceph -s' shows the cluster health as OK; is there anything
else I could check? Is there a way of finding out whether it's actually
doing something?
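One rough way to answer that is to capture a few seconds of strace output (e.g. 'strace -f -p <pid> 2> trace.txt') and tally which syscalls appear. The Python sketch below is a hypothetical helper, not part of Ceph. It assumes the '[pid N] name(' line format from the strace output further down in this thread (a bare 'N name(' prefix is accepted too). It treats a trace made up only of futex calls as a hint that the process is blocked rather than working. As Greg notes, sleeping locks on their own are normal, so this is a heuristic only.

```python
import re
from collections import Counter

# Matches the start of a syscall line, with an optional "[pid N] " or "N " prefix.
# "<... futex resumed>" continuation lines start with "<" and do not match.
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?([a-z_0-9]+)\(")
# Pulls the futex operation (e.g. FUTEX_WAIT_PRIVATE) out of a futex call.
FUTEX_OP_RE = re.compile(r"futex\([^,]+,\s*(FUTEX_[A-Z_]+)")


def summarize_strace(lines):
    """Tally syscalls seen in strace -f output.

    Returns (syscalls, futex_ops), two Counters. Resumed/continuation
    lines are skipped because they belong to a call already counted.
    """
    syscalls = Counter()
    futex_ops = Counter()
    for line in lines:
        m = SYSCALL_RE.match(line.strip())
        if not m:
            continue  # "<... resumed>", "<unfinished ...>", signals, exit status
        name = m.group(1)
        syscalls[name] += 1
        if name == "futex":
            op = FUTEX_OP_RE.search(line)
            if op:
                futex_ops[op.group(1)] += 1
    return syscalls, futex_ops


def looks_idle(syscalls):
    """Heuristic: True if every call observed was a futex call."""
    return bool(syscalls) and set(syscalls) == {"futex"}
```

If the tally over a longer window shows nothing but FUTEX_WAIT/FUTEX_WAKE, the tool is most likely waiting on something instead of making progress; any network or file I/O calls showing up would suggest it is still doing work.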

We only have this issue on the one bucket with versioning enabled, and I
can't shake the feeling that it has something to do with that. The
"underscore bug" is also still present on that bucket
(http://tracker.ceph.com/issues/12819). I'm not sure whether that's related
in any way.
Are there any alternatives, for example copying all the objects into a
new bucket without versioning? The simple way would be to list the objects
and copy them to a new bucket, but bucket listing is not working, so...
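If the object names are known from somewhere other than a bucket listing (for example the application's own records; that source is an assumption here), the copy itself doesn't need a listing, since GETs on specific objects still work. A minimal sketch, assuming boto3 rather than the older python-boto used in this thread. 'client' is assumed to be a boto3 S3 client pointed at the radosgw endpoint, and copy_keys is a hypothetical helper name. Only the current version of each object is copied; older versions stay in the source bucket.

```python
from typing import Iterable, List


def copy_keys(client, src_bucket: str, dst_bucket: str, keys: Iterable[str]) -> List[str]:
    """Server-side copy of each named key from src_bucket to dst_bucket.

    client: assumed to be a boto3 S3 client (uses copy_object with a
    CopySource dict). Returns the keys that failed, so they can be retried.
    """
    failed = []
    for key in keys:
        try:
            client.copy_object(
                Bucket=dst_bucket,
                Key=key,
                CopySource={"Bucket": src_bucket, "Key": key},
            )
        except Exception:  # keep going; report every failure at the end
            failed.append(key)
    return failed
```

It would be called as e.g. copy_keys(boto3.client("s3", endpoint_url=...), "old-bucket", "new-bucket", keys), with "old-bucket" and "new-bucket" standing in for the real bucket names.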

-Sam

On 31-08-15 10:47, Gregory Farnum wrote:
> This generally shouldn't be a problem at your bucket sizes. Have you
> checked that the cluster is actually in a healthy state? The sleeping
> locks are normal but should be getting woken up; if they aren't it
> means the object access isn't working for some reason. A down PG or
> something would be the simplest explanation.
> -Greg
>
> On Fri, Aug 28, 2015 at 6:52 PM, Sam Wouters <sam@xxxxxxxxx> wrote:
>> Ok, maybe I'm too impatient. It would be great if the radosgw-admin tool
>> had some verbose or progress logging.
>> I will start a check and let it run over the weekend.
>>
>> tnx,
>> Sam
>>
>> On 28-08-15 18:16, Sam Wouters wrote:
>>> Hi,
>>>
>>> This bucket only has 13389 objects, so the index size shouldn't be a
>>> problem. Also, on the same cluster we have another bucket with 1200543
>>> objects (but no versioning configured), which has no issues.
>>>
>>> When we run radosgw-admin bucket check (--fix), nothing seems to
>>> happen. Running strace on the process shows a lot of lines like these:
>>> [pid 99372] futex(0x2d730d4, FUTEX_WAIT_PRIVATE, 156619, NULL <unfinished ...>
>>> [pid 99385] futex(0x2da9410, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
>>> [pid 99371] futex(0x2da9410, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
>>> [pid 99385] <... futex resumed> )       = -1 EAGAIN (Resource temporarily unavailable)
>>> [pid 99371] <... futex resumed> )       = 0
>>>
>>> but there are no errors in the Ceph logs and no health warnings.
>>>
>>> r,
>>> Sam
>>>
>>> On 28-08-15 17:49, Ben Hines wrote:
>>>> How many objects in the bucket?
>>>>
>>>> RGW has problems with index size once the number of objects reaches the
>>>> 900000+ level. The buckets need to be recreated with 'sharded bucket
>>>> indexes' enabled:
>>>>
>>>> rgw override bucket index max shards = 23
>>>>
>>>> You could also try repairing the index with:
>>>>
>>>>  radosgw-admin bucket check --fix --bucket=<bucketname>
>>>>
>>>> -Ben
>>>>
>>>> On Fri, Aug 28, 2015 at 8:38 AM, Sam Wouters <sam@xxxxxxxxx> wrote:
>>>>> Hi,
>>>>>
>>>>> We have an rgw bucket (with versioning) where PUT and GET operations for
>>>>> specific objects succeed, but retrieving an object list fails.
>>>>> With python-boto we just get a 500 Internal Server Error after a timeout;
>>>>> radosgw-admin just hangs.
>>>>> Also a radosgw-admin bucket check just seems to hang...
>>>>>
>>>>> The Ceph version is 0.94.3, but this was also happening with 0.94.2; we
>>>>> quietly hoped upgrading would fix it, but it didn't...
>>>>>
>>>>> r,
>>>>> Sam
>>>>> _______________________________________________
>>>>> ceph-users mailing list
>>>>> ceph-users@xxxxxxxxxxxxxx
>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com