On Wed, Feb 5, 2014 at 2:21 PM, Josh Durgin <josh.durgin@xxxxxxxxxxx> wrote:
> On 02/05/2014 01:23 PM, Craig Lewis wrote:
>> On 2/4/14 20:02, Josh Durgin wrote:
>>> From the log it looks like you're hitting the default maximum number
>>> of entries to be processed at once per shard. This was intended to
>>> prevent one really busy shard from blocking progress on syncing other
>>> shards, since the remainder will be synced the next time the shard is
>>> processed. Perhaps the default is too low, though, or the idea should
>>> be scrapped altogether, since you can sync other shards in parallel.
>>>
>>> For your particular usage, since you're updating the same few buckets,
>>> the max entries limit is hit constantly. You can increase it with
>>> max-entries: 1000000 in the config file or --max-entries 10000000 on
>>> the command line.
>>>
>>> Josh
>>
>> This doesn't appear to have any effect:
>>
>> root@ceph1c:/etc/init.d# grep max-entries /etc/ceph/radosgw-agent.conf
>> max-entries: 1000000
>> root@ceph1c:/etc/init.d# egrep 'has [0-9]+ entries after' /var/log/ceph/radosgw-agent.log | tail -1
>> 2014-02-05T13:11:03.915 2743:INFO:radosgw_agent.worker:bucket instance "live-2:us-west-1.35026898.2" has 1000 entries after "00000206789.410535.3"
>>
>> Neither does --max-entries 10000000:
>>
>> root@ceph1c:/etc/init.d# ps auxww | grep radosgw-agent | grep max-entries
>> root 19710 6.0 0.0 74492 18708 pts/3 S 13:22 0:00 /usr/bin/python /usr/bin/radosgw-agent --incremental-sync-delay=10 --max-entries 10000000 -c /etc/ceph/radosgw-agent.conf
>> root@ceph1c:/etc/init.d# egrep 'has [0-9]+ entries after' /var/log/ceph/radosgw-agent.us-west-1.us-central-1.log | tail -1
>> 2014-02-05T13:22:58.577 21626:INFO:radosgw_agent.worker:bucket instance "live-2:us-west-1.35026898.2" has 1000 entries after "00000207788.411542.2"
>>
>> I guess I'll look into that too, since I'll be in that area of the code.
>
> It seems to be a hardcoded limit on the server side to prevent a single
> osd operation from taking too long (see cls_log_list() in ceph.git
> src/cls/cls_log.cc).

Right, that part is intentional. Otherwise an osd operation might take
too long.

> This should probably be fixed in radosgw, but you could work around it
> with a loop in radosgw-agent.

No need to change the radosgw side; it's intentional. The agent should
assume requests are paged and behave accordingly.

Yehuda

For the record, I can't lower the value either:

root@ceph1c:/etc/init.d# ps auxww | grep radosgw-agent | grep max-entries
root 16151 1.6 0.0 222384 18980 pts/3 Sl 15:22 0:01 /usr/bin/python /usr/bin/radosgw-agent --incremental-sync-delay=10 --max-entries 888 -c /etc/ceph/radosgw.replicate.us-west-1-to-us-central-1.conf
root 17417 0.9 0.0 225056 19928 pts/3 Sl 15:22 0:00 /usr/bin/python /usr/bin/radosgw-agent --incremental-sync-delay=10 --max-entries 888 -c /etc/ceph/radosgw.replicate.us-west-1-to-us-central-1.conf
root@ceph1c:/etc/init.d# egrep 'has [0-9]+ entries after' /var/log/ceph/radosgw-agent.us-west-1.us-central-1.log | tail -1
2014-02-05T15:23:00.802 17417:INFO:radosgw_agent.worker:bucket instance "live-2:us-west-1.35026898.2" has 1000 entries after "00000217778.421759.2"

I'll take a stab at it.
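To make Yehuda's suggestion concrete, here is a minimal sketch of the
paged-fetch loop the agent would need. The fetch_log_page callable and
the shape of its return value are hypothetical stand-ins for the agent's
REST log-listing call, not the actual radosgw-agent API; the only facts
taken from the thread are that the server caps each response (1000
entries via cls_log_list()) regardless of the client's max-entries, and
that the client must resume from a marker.

def fetch_all_log_entries(fetch_log_page, shard, page_size=1000):
    """Drain one log shard by following markers until the server
    says there is nothing left, rather than assuming one request
    returns everything."""
    marker = None
    while True:
        # Each response may be truncated: cls_log_list() bounds how
        # many entries a single osd operation returns, no matter what
        # max-entries the client asked for.
        entries, truncated = fetch_log_page(shard, marker=marker,
                                            max_entries=page_size)
        for entry in entries:
            yield entry
        if not entries or not truncated:
            return
        # Resume the next request after the last entry we received.
        marker = entries[-1]['id']

With a loop like this, the 1000-entry cap stops mattering for
correctness: the agent simply issues as many requests as it takes to
drain the shard, and the server-side limit only bounds the cost of each
individual osd operation.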
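For illustration only, an in-memory fake shows the contract the loop
expects (again, the field names and pager shape are assumptions, not the
real API):

def fake_page(shard, marker=None, max_entries=1000):
    # Pretend the shard holds 2500 entries and the server pages them.
    log = [{'id': str(i)} for i in range(2500)]
    start = 0 if marker is None else int(marker) + 1
    page = log[start:start + max_entries]
    return page, (start + max_entries) < len(log)

# Three requests of at most 1000 entries drain all 2500.
assert sum(1 for _ in fetch_all_log_entries(fake_page, shard=0)) == 2500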