It looks like it; this is what shows up in the logs after bumping the debug levels and requesting a bucket listing. The same cls_bucket_list call just repeats with an identical start marker:

2015-09-01 17:14:53.008620 7fccb17ca700 10 cls_bucket_list aws-cmis-prod(@{i=.be-east.rgw.buckets.index}.be-east.rgw.buckets[be-east.5436.1]) start abc_econtract/data/6shflrwbwwcm6dsemrpjit2li3v913iad1EZQ3.S6Prb-NXLvfQRlaWC5nBYp5[] num_entries 1
2015-09-01 17:14:53.008629 7fccb17ca700 20 reading from .be-east.rgw:.bucket.meta.aws-cmis-prod:be-east.5436.1
2015-09-01 17:14:53.008636 7fccb17ca700 20 get_obj_state: rctx=0x7fccb17c84d0 obj=.be-east.rgw:.bucket.meta.aws-cmis-prod:be-east.5436.1 state=0x7fcde01a4060 s->prefetch_data=0
2015-09-01 17:14:53.008640 7fccb17ca700 10 cache get: name=.be-east.rgw+.bucket.meta.aws-cmis-prod:be-east.5436.1 : hit
2015-09-01 17:14:53.008645 7fccb17ca700 20 get_obj_state: s->obj_tag was set empty
2015-09-01 17:14:53.008647 7fccb17ca700 10 cache get: name=.be-east.rgw+.bucket.meta.aws-cmis-prod:be-east.5436.1 : hit
2015-09-01 17:14:53.008675 7fccb17ca700 1 -- 10.11.4.105:0/1109243 --> 10.11.4.105:6801/39085 -- osd_op(client.55506.0:435874 .dir.be-east.5436.1 [call rgw.bucket_list] 26.7d78fc84 ack+read+known_if_redirected e255) v5 -- ?+0 0x7fcde01a0540 con 0x3a2d870
2015-09-01 17:14:53.009136 7fccb17ca700 10 cls_bucket_list aws-cmis-prod(@{i=.be-east.rgw.buckets.index}.be-east.rgw.buckets[be-east.5436.1]) start abc_econtract/data/6shflrwbwwcm6dsemrpjit2li3v913iad1EZQ3.S6Prb-NXLvfQRlaWC5nBYp5[] num_entries 1
2015-09-01 17:14:53.009146 7fccb17ca700 20 reading from .be-east.rgw:.bucket.meta.aws-cmis-prod:be-east.5436.1
2015-09-01 17:14:53.009153 7fccb17ca700 20 get_obj_state: rctx=0x7fccb17c84d0 obj=.be-east.rgw:.bucket.meta.aws-cmis-prod:be-east.5436.1 state=0x7fcde01a4060 s->prefetch_data=0
2015-09-01 17:14:53.009158 7fccb17ca700 10 cache get: name=.be-east.rgw+.bucket.meta.aws-cmis-prod:be-east.5436.1 : hit
2015-09-01 17:14:53.009163 7fccb17ca700 20 get_obj_state: s->obj_tag was set empty
2015-09-01 17:14:53.009165 7fccb17ca700 10 cache get: name=.be-east.rgw+.bucket.meta.aws-cmis-prod:be-east.5436.1 : hit
2015-09-01 17:14:53.009189 7fccb17ca700 1 -- 10.11.4.105:0/1109243 --> 10.11.4.105:6801/39085 -- osd_op(client.55506.0:435876 .dir.be-east.5436.1 [call rgw.bucket_list] 26.7d78fc84 ack+read+known_if_redirected e255) v5 -- ?+0 0x7fcde01a0540 con 0x3a2d870
2015-09-01 17:14:53.009629 7fccb17ca700 10 cls_bucket_list aws-cmis-prod(@{i=.be-east.rgw.buckets.index}.be-east.rgw.buckets[be-east.5436.1]) start abc_econtract/data/6shflrwbwwcm6dsemrpjit2li3v913iad1EZQ3.S6Prb-NXLvfQRlaWC5nBYp5[] num_entries 1
2015-09-01 17:14:53.009638 7fccb17ca700 20 reading from .be-east.rgw:.bucket.meta.aws-cmis-prod:be-east.5436.1
2015-09-01 17:14:53.009645 7fccb17ca700 20 get_obj_state: rctx=0x7fccb17c84d0 obj=.be-east.rgw:.bucket.meta.aws-cmis-prod:be-east.5436.1 state=0x7fcde01a4060 s->prefetch_data=0
2015-09-01 17:14:53.009651 7fccb17ca700 10 cache get: name=.be-east.rgw+.bucket.meta.aws-cmis-prod:be-east.5436.1 : hit
2015-09-01 17:14:53.009655 7fccb17ca700 20 get_obj_state: s->obj_tag was set empty
2015-09-01 17:14:53.009657 7fccb17ca700 10 cache get: name=.be-east.rgw+.bucket.meta.aws-cmis-prod:be-east.5436.1 : hit
2015-09-01 17:14:53.009681 7fccb17ca700 1 -- 10.11.4.105:0/1109243 --> 10.11.4.105:6801/39085 -- osd_op(client.55506.0:435878 .dir.be-east.5436.1 [call rgw.bucket_list] 26.7d78fc84 ack+read+known_if_redirected e255) v5 -- ?+0 0x7fcde01a0540 con 0x3a2d870
2015-09-01 17:14:53.010139 7fccb17ca700 10 cls_bucket_list aws-cmis-prod(@{i=.be-east.rgw.buckets.index}.be-east.rgw.buckets[be-east.5436.1]) start abc_econtract/data/6shflrwbwwcm6dsemrpjit2li3v913iad1EZQ3.S6Prb-NXLvfQRlaWC5nBYp5[] num_entries 1
2015-09-01 17:14:53.010149 7fccb17ca700 20 reading from .be-east.rgw:.bucket.meta.aws-cmis-prod:be-east.5436.1
2015-09-01 17:14:53.010156 7fccb17ca700 20 get_obj_state: rctx=0x7fccb17c84d0 obj=.be-east.rgw:.bucket.meta.aws-cmis-prod:be-east.5436.1 state=0x7fcde01a4060 s->prefetch_data=0
2015-09-01 17:14:53.010161 7fccb17ca700 10 cache get: name=.be-east.rgw+.bucket.meta.aws-cmis-prod:be-east.5436.1 : hit
2015-09-01 17:14:53.010166 7fccb17ca700 20 get_obj_state: s->obj_tag was set empty
2015-09-01 17:14:53.010168 7fccb17ca700 10 cache get: name=.be-east.rgw+.bucket.meta.aws-cmis-prod:be-east.5436.1 : hit
2015-09-01 17:14:53.010192 7fccb17ca700 1 -- 10.11.4.105:0/1109243 --> 10.11.4.105:6801/39085 -- osd_op(client.55506.0:435880 .dir.be-east.5436.1 [call rgw.bucket_list] 26.7d78fc84 ack+read+known_if_redirected e255) v5 -- ?+0 0x7fcde01a0540 con 0x3a2d870
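Each pass issues the same rgw.bucket_list op with an identical start marker; only the client op tid advances (435874, 435876, 435878, 435880), so the listing loops instead of progressing. To check a longer capture for the same pattern, it is enough to count how often each start marker occurs. A minimal python sketch; the log path and the marker regex are assumptions, adjust them for your own rgw instance:

    # Count repeated cls_bucket_list start markers in an RGW debug log.
    # A healthy listing visits each marker once; a stuck one repeats it.
    # LOG is a hypothetical path; point it at your radosgw log file.
    import re
    from collections import Counter

    LOG = '/var/log/ceph/client.radosgw.log'

    # Matches e.g. "cls_bucket_list <bucket>(...) start <marker> num_entries 1"
    pat = re.compile(r'cls_bucket_list .* start (\S+) num_entries')

    markers = Counter()
    with open(LOG) as f:
        for line in f:
            m = pat.search(line)
            if m:
                markers[m.group(1)] += 1

    for marker, count in markers.most_common(5):
        print('%8d %s' % (count, marker))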
On 01-09-15 17:11, Yehuda Sadeh-Weinraub wrote:
> Can you bump up debug (debug rgw = 20, debug ms = 1), and see if the
> operations (bucket listing and bucket check) go into some kind of
> infinite loop?
>
> Yehuda
>
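(Aside, for anyone hitting the same thing: the debug levels Yehuda mentions can be set in ceph.conf and the gateway restarted, or changed at runtime through the admin socket with "ceph --admin-daemon <asok> config set debug_rgw 20". A minimal ceph.conf sketch; the section name is an assumption and should match your own rgw instance:)

    # the section name below is hypothetical; use your rgw instance name
    [client.radosgw.gateway]
        debug rgw = 20
        debug ms = 1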
> On Tue, Sep 1, 2015 at 1:16 AM, Sam Wouters <sam@xxxxxxxxx> wrote:
>> Hi, I've started the 'bucket check --fix' on Friday evening and it's
>> still running. 'ceph -s' shows the cluster health as OK; I don't know
>> if there is anything else I could check. Is there a way of finding out
>> whether it's actually doing something?
>>
>> We only have this issue on the one bucket with versioning enabled; I
>> can't get rid of the feeling it has something to do with that. The
>> "underscore bug" is also still present on that bucket
>> (http://tracker.ceph.com/issues/12819). Not sure if that's related in
>> any way.
>> Are there any alternatives, such as copying all the objects into a new
>> bucket without versioning? The simple way would be to list the objects
>> and copy them to a new bucket, but bucket listing is not working, so...
>> [a boto sketch for copying without a listing follows at the end of the
>> thread]
>>
>> -Sam
>>
>>
>> On 31-08-15 10:47, Gregory Farnum wrote:
>>> This generally shouldn't be a problem at your bucket sizes. Have you
>>> checked that the cluster is actually in a healthy state? The sleeping
>>> locks are normal but should be getting woken up; if they aren't, it
>>> means the object access isn't working for some reason. A down PG or
>>> something would be the simplest explanation.
>>> -Greg
>>>
>>> On Fri, Aug 28, 2015 at 6:52 PM, Sam Wouters <sam@xxxxxxxxx> wrote:
>>>> OK, maybe I'm too impatient. It would be great if there were some
>>>> verbose or progress logging in the radosgw-admin tool.
>>>> I will start a check and let it run over the weekend.
>>>>
>>>> tnx,
>>>> Sam
>>>>
>>>> On 28-08-15 18:16, Sam Wouters wrote:
>>>>> Hi,
>>>>>
>>>>> this bucket only has 13389 objects, so the index size shouldn't be
>>>>> a problem. Also, on the same cluster we have another bucket with
>>>>> 1200543 objects (but no versioning configured), which has no issues.
>>>>>
>>>>> When we run radosgw-admin bucket check (--fix), nothing seems to be
>>>>> happening. Putting an strace on the process shows a lot of lines
>>>>> like these:
>>>>> [pid 99372] futex(0x2d730d4, FUTEX_WAIT_PRIVATE, 156619, NULL <unfinished ...>
>>>>> [pid 99385] futex(0x2da9410, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
>>>>> [pid 99371] futex(0x2da9410, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
>>>>> [pid 99385] <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable)
>>>>> [pid 99371] <... futex resumed> ) = 0
>>>>>
>>>>> but no errors in the ceph logs or health warnings.
>>>>>
>>>>> r,
>>>>> Sam
>>>>>
>>>>> On 28-08-15 17:49, Ben Hines wrote:
>>>>>> How many objects in the bucket?
>>>>>>
>>>>>> RGW has problems with index size once the number of objects gets
>>>>>> into the 900000+ range. The buckets need to be recreated with
>>>>>> 'sharded bucket indexes' on:
>>>>>>
>>>>>> rgw override bucket index max shards = 23
>>>>>>
>>>>>> You could also try repairing the index with:
>>>>>>
>>>>>> radosgw-admin bucket check --fix --bucket=<bucketname>
>>>>>>
>>>>>> -Ben
>>>>>>
>>>>>> On Fri, Aug 28, 2015 at 8:38 AM, Sam Wouters <sam@xxxxxxxxx> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> we have an rgw bucket (with versioning) where PUT and GET
>>>>>>> operations for specific objects succeed, but retrieving an object
>>>>>>> list fails.
>>>>>>> Using python-boto, we just get a 500 internal error after a
>>>>>>> timeout; radosgw-admin just hangs.
>>>>>>> A radosgw-admin bucket check also just seems to hang...
>>>>>>>
>>>>>>> ceph version is 0.94.3, but this was also happening with 0.94.2;
>>>>>>> we quietly hoped upgrading would fix it, but it didn't...
>>>>>>>
>>>>>>> r,
>>>>>>> Sam

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
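Regarding the copy-to-a-new-bucket alternative raised above: since the listing itself is what hangs, the key names have to come from an external source (an application database, access logs, a backup manifest). A minimal python-boto sketch under that assumption; the endpoint, credentials, destination bucket name and key file are placeholders, not values from this thread:

    # Server-side copy of known keys into a fresh, non-versioned bucket,
    # without ever listing the broken one. All names are placeholders.
    import boto
    import boto.s3.connection

    conn = boto.connect_s3(
        aws_access_key_id='ACCESS_KEY',
        aws_secret_access_key='SECRET_KEY',
        host='rgw.example.com',                     # hypothetical endpoint
        calling_format=boto.s3.connection.OrdinaryCallingFormat(),
    )

    # validate=False skips the implicit listing that get_bucket() would
    # otherwise perform; that listing is exactly the operation that hangs.
    src = conn.get_bucket('aws-cmis-prod', validate=False)
    dst = conn.create_bucket('aws-cmis-prod-copy')  # fresh, unversioned

    with open('known_keys.txt') as f:               # one key name per line
        for name in f:
            name = name.strip()
            if name:
                # copy_key() is a server-side S3 copy; without a versionId
                # it copies only the current version of each object.
                dst.copy_key(name, src.name, name)
                print('copied %s' % name)

Note that this carries over only the current version of each object, which is presumably the point of moving to an unversioned bucket; older versions would stay behind in the source.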