Re: Troubleshooting rgw bucket list

Gregory Farnum <gfarnum@xxxxxxxxxx> · Mon, 31 Aug 2015 09:47:12 +0100



This generally shouldn't be a problem at your bucket sizes. Have you
checked that the cluster is actually in a healthy state? The sleeping
locks are normal but should be getting woken up; if they aren't it
means the object access isn't working for some reason. A down PG or
something would be the simplest explanation.
-Greg
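
A minimal sketch of the health check suggested above, assuming the `ceph`
CLI is on the PATH and that `ceph health detail` prints a status word on
its first line followed by per-PG detail lines starting with "pg ". The
exact detail format differs between releases, so the keyword matching
here is an assumption, and the helper names are hypothetical:

```python
import subprocess

# PG states that usually mean object access is blocked (assumed keyword list).
BLOCKING_STATES = ("down", "incomplete", "peering", "stale", "inactive")


def parse_health(text):
    """Split `ceph health detail` output into (status, problem_pg_lines)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    # The first token of the first line is expected to be HEALTH_OK/WARN/ERR.
    status = lines[0].split()[0] if lines else "UNKNOWN"
    # Keep only per-PG lines that mention one of the blocking states.
    bad_pgs = [
        ln for ln in lines[1:]
        if ln.startswith("pg ") and any(s in ln for s in BLOCKING_STATES)
    ]
    return status, bad_pgs


def ceph_health_detail():
    """Run `ceph health detail` and return its stdout as text."""
    return subprocess.check_output(["ceph", "health", "detail"]).decode()
```

Calling `status, bad = parse_health(ceph_health_detail())` on a cluster
node would then show whether any PG is reported in a blocking state. An
empty list does not prove the cluster is healthy; it only means none of
the assumed keywords matched.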

On Fri, Aug 28, 2015 at 6:52 PM, Sam Wouters <sam@xxxxxxxxx> wrote:
> Ok, maybe I'm too impatient. It would be great if there were some verbose
> or progress logging of the radosgw-admin tool.
> I will start a check and let it run over the weekend.
>
> tnx,
> Sam
>
> On 28-08-15 18:16, Sam Wouters wrote:
>> Hi,
>>
>> this bucket only has 13389 objects, so the index size shouldn't be a
>> problem. Also, on the same cluster we have another bucket with 1200543
>> objects (but no versioning configured), which has no issues.
>>
>> When we run radosgw-admin bucket check (--fix), nothing seems to be
>> happening. Running strace on the process shows a lot of lines like these:
>> [pid 99372] futex(0x2d730d4, FUTEX_WAIT_PRIVATE, 156619, NULL <unfinished ...>
>> [pid 99385] futex(0x2da9410, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
>> [pid 99371] futex(0x2da9410, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
>> [pid 99385] <... futex resumed> )       = -1 EAGAIN (Resource temporarily unavailable)
>> [pid 99371] <... futex resumed> )       = 0
>>
>> but no errors in the ceph logs or health warnings.
>>
>> r,
>> Sam
>>
>> On 28-08-15 17:49, Ben Hines wrote:
>>> How many objects in the bucket?
>>>
>>> RGW has problems with index size once the number of objects gets into the
>>> 900000+ range. The buckets need to be recreated with 'sharded bucket
>>> indexes' on:
>>>
>>> rgw override bucket index max shards = 23
>>>
>>> You could also try repairing the index with:
>>>
>>>  radosgw-admin bucket check --fix --bucket=<bucketname>
>>>
>>> -Ben
>>>
>>> On Fri, Aug 28, 2015 at 8:38 AM, Sam Wouters <sam@xxxxxxxxx> wrote:
>>>> Hi,
>>>>
>>>> we have an rgw bucket (with versioning) where PUT and GET operations for
>>>> specific objects succeed, but retrieving an object list fails.
>>>> With python-boto, we just get a 500 internal error after a timeout;
>>>> radosgw-admin just hangs.
>>>> A radosgw-admin bucket check also just seems to hang...
>>>>
>>>> ceph version is 0.94.3, but this was also happening with 0.94.2; we
>>>> quietly hoped upgrading would fix it, but it didn't...
>>>>
>>>> r,
>>>> Sam
>>>> _______________________________________________
>>>> ceph-users mailing list
>>>> ceph-users@xxxxxxxxxxxxxx
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com