Re: RGW Replication

On Wed, Feb 5, 2014 at 2:21 PM, Josh Durgin <josh.durgin@xxxxxxxxxxx> wrote:
> On 02/05/2014 01:23 PM, Craig Lewis wrote:
>>
>>
>> On 2/4/14 20:02, Josh Durgin wrote:
>>>
>>>
>>> From the log it looks like you're hitting the default maximum number of
>>> entries to be processed at once per shard. This was intended to prevent
>>> one really busy shard from blocking progress on syncing other shards,
>>> since the remainder will be synced the next time the shard is processed.
>>> Perhaps the default is too low though, or the idea should be scrapped
>>> altogether since you can sync other shards in parallel.
>>>
>>> For your particular usage, since you're updating the same few buckets,
>>> the max entries limit is hit constantly. You can increase it with
>>> max-entries: 1000000 in the config file or --max-entries 10000000 on
>>> the command line.
>>>
>>> Josh
>>
>>
>> This doesn't appear to have any effect:
>> root@ceph1c:/etc/init.d# grep max-entries /etc/ceph/radosgw-agent.conf
>> max-entries: 1000000
>> root@ceph1c:/etc/init.d# egrep 'has [0-9]+ entries after'
>> /var/log/ceph/radosgw-agent.log  | tail -1
>> 2014-02-05T13:11:03.915 2743:INFO:radosgw_agent.worker:bucket instance
>> "live-2:us-west-1.35026898.2" has 1000 entries after
>> "00000206789.410535.3"
>>
>> Neither does --max-entries 10000000:
>> root@ceph1c:/etc/init.d# ps auxww | grep radosgw-agent | grep max-entries
>> root     19710  6.0  0.0  74492 18708 pts/3    S    13:22 0:00
>> /usr/bin/python /usr/bin/radosgw-agent --incremental-sync-delay=10
>> --max-entries 10000000 -c /etc/ceph/radosgw-agent.conf
>> root@ceph1c:/etc/init.d# egrep 'has [0-9]+ entries after'
>> /var/log/ceph/radosgw-agent.us-west-1.us-central-1.log  | tail -1
>> 2014-02-05T13:22:58.577 21626:INFO:radosgw_agent.worker:bucket instance
>> "live-2:us-west-1.35026898.2" has 1000 entries after
>> "00000207788.411542.2"
>>
>>
>> I guess I'll look into that too, since I'll be in that area of the code.
>
>
> It seems to be a hardcoded limit on the server side to prevent a single
> osd operation from taking too long (see cls_log_list() in ceph.git
> src/cls/cls_log.cc).

Right, that part is intentional. Otherwise a single OSD operation might take too long.

>
> This should probably be fixed in radosgw, but you could work around it
> with a loop in radosgw-agent.
>

No need to change the radosgw side; the limit is intentional. The agent
should assume responses are paged and behave accordingly.
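
For illustration, the paging the agent needs might look roughly like the
sketch below. The client object and its list_log_entries() call are
hypothetical stand-ins for whatever the agent uses to read the log shards,
not the actual radosgw-agent API:

def sync_shard(client, shard, process_entry, max_entries=1000):
    """Drain one shard's log by paging until the server reports
    no more entries.

    cls_log_list() on the OSD caps how many entries a single call
    returns, so no one response can be assumed to be complete.
    """
    marker = None
    while True:
        # Each call returns at most one server-capped page, plus a
        # marker to resume from and a 'truncated' flag.
        entries, marker, truncated = client.list_log_entries(
            shard, marker=marker, max_entries=max_entries)
        for entry in entries:
            process_entry(entry)
        if not truncated:
            break

Handled this way, the server-side cap stays in place (so one busy shard
still can't stall an OSD), and the agent simply keeps requesting pages
until a shard's log is drained.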

Yehuda
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



