On Wed, Feb 5, 2014 at 2:21 PM, Josh Durgin <josh.durgin@xxxxxxxxxxx> wrote:
> On 02/05/2014 01:23 PM, Craig Lewis wrote:
>> On 2/4/14 20:02, Josh Durgin wrote:
>>> From the log it looks like you're hitting the default maximum number
>>> of entries to be processed at once per shard. This was intended to
>>> prevent one really busy shard from blocking progress on syncing other
>>> shards, since the remainder will be synced the next time the shard is
>>> processed. Perhaps the default is too low, though, or the idea should
>>> be scrapped altogether, since you can sync other shards in parallel.
>>>
>>> For your particular usage, since you're updating the same few buckets,
>>> the max entries limit is hit constantly. You can increase it with
>>> max-entries: 1000000 in the config file or --max-entries 10000000 on
>>> the command line.
>>>
>>> Josh
>>
>> This doesn't appear to have any effect:
>>
>> root@ceph1c:/etc/init.d# grep max-entries /etc/ceph/radosgw-agent.conf
>> max-entries: 1000000
>> root@ceph1c:/etc/init.d# egrep 'has [0-9]+ entries after' /var/log/ceph/radosgw-agent.log | tail -1
>> 2014-02-05T13:11:03.915 2743:INFO:radosgw_agent.worker:bucket instance "live-2:us-west-1.35026898.2" has 1000 entries after "00000206789.410535.3"
>>
>> Neither does --max-entries 10000000:
>>
>> root@ceph1c:/etc/init.d# ps auxww | grep radosgw-agent | grep max-entries
>> root 19710 6.0 0.0 74492 18708 pts/3 S 13:22 0:00 /usr/bin/python /usr/bin/radosgw-agent --incremental-sync-delay=10 --max-entries 10000000 -c /etc/ceph/radosgw-agent.conf
>> root@ceph1c:/etc/init.d# egrep 'has [0-9]+ entries after' /var/log/ceph/radosgw-agent.us-west-1.us-central-1.log | tail -1
>> 2014-02-05T13:22:58.577 21626:INFO:radosgw_agent.worker:bucket instance "live-2:us-west-1.35026898.2" has 1000 entries after "00000207788.411542.2"
>>
>> I guess I'll look into that too, since I'll be in that area of the code.
>
> It seems to be a hardcoded limit on the server side to prevent a single
> osd operation from taking too long (see cls_log_list() in ceph.git
> src/cls/cls_log.cc).

Right, that part is intentional. Otherwise an osd operation might take
too long.

> This should probably be fixed in radosgw, but you could work around it
> with a loop in radosgw-agent.

No need to change the radosgw side; it's intentional. The agent should
assume requests are paged and behave accordingly.

Yehuda

For the record, I can't lower the value either:

root@ceph1c:/etc/init.d# ps auxww | grep radosgw-agent | grep max-entries
root 16151 1.6 0.0 222384 18980 pts/3 Sl 15:22 0:01 /usr/bin/python /usr/bin/radosgw-agent --incremental-sync-delay=10 --max-entries 888 -c /etc/ceph/radosgw.replicate.us-west-1-to-us-central-1.conf
root 17417 0.9 0.0 225056 19928 pts/3 Sl 15:22 0:00 /usr/bin/python /usr/bin/radosgw-agent --incremental-sync-delay=10 --max-entries 888 -c /etc/ceph/radosgw.replicate.us-west-1-to-us-central-1.conf
root@ceph1c:/etc/init.d# egrep 'has [0-9]+ entries after' /var/log/ceph/radosgw-agent.us-west-1.us-central-1.log | tail -1
2014-02-05T15:23:00.802 17417:INFO:radosgw_agent.worker:bucket instance "live-2:us-west-1.35026898.2" has 1000 entries after "00000217778.421759.2"

I'll take a stab at it.
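To make Yehuda's suggestion concrete, here is a minimal sketch of the
paged-fetch loop the agent would need. The fetch_log_page callable and
the shape of its return value are hypothetical stand-ins for the agent's
REST log-listing call, not the actual radosgw-agent API; the only facts
taken from the thread are that the server caps each response (1000
entries via cls_log_list()) regardless of the client's max-entries, and
that the client must resume from a marker.

def fetch_all_log_entries(fetch_log_page, shard, page_size=1000):
    """Drain one log shard by following markers until the server
    says there is nothing left, rather than assuming one request
    returns everything."""
    marker = None
    while True:
        # Each response may be truncated: cls_log_list() bounds how
        # many entries a single osd operation returns, no matter what
        # max-entries the client asked for.
        entries, truncated = fetch_log_page(shard, marker=marker,
                                            max_entries=page_size)
        for entry in entries:
            yield entry
        if not entries or not truncated:
            return
        # Resume the next request after the last entry we received.
        marker = entries[-1]['id']

With a loop like this, the 1000-entry cap stops mattering for
correctness: the agent simply issues as many requests as it takes to
drain the shard, and the server-side limit only bounds the cost of each
individual osd operation.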
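For illustration only, an in-memory fake shows the contract the loop
expects (again, the field names and pager shape are assumptions, not the
real API):

def fake_page(shard, marker=None, max_entries=1000):
    # Pretend the shard holds 2500 entries and the server pages them.
    log = [{'id': str(i)} for i in range(2500)]
    start = 0 if marker is None else int(marker) + 1
    page = log[start:start + max_entries]
    return page, (start + max_entries) < len(log)

# Three requests of at most 1000 entries drain all 2500.
assert sum(1 for _ in fetch_all_log_entries(fake_page, shard=0)) == 2500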