I also had problems getting geo-replication working correctly and eventually gave up on it due to project time constraints.
What version of gluster?
What is the topology of x, xx, and xxx/xxy/xxz?
I tried a 2x2 stripe-replica with geo-replication to a 2x1 stripe using 3.7.4. Starting replication against 32 GB of small files never completed; it failed several times. Starting replication with an empty volume and then filling it with a rate limit of 2000k/s kept the sync current until the fill completed, but it could not keep up with the rate of change under normal usage.
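For reference, a rate limit like that can be pushed into geo-replication's rsync invocation via the session's rsync-options config knob (volume and slave names below are placeholders, and you should verify the option exists on your build; rsync's --bwlimit is in KBytes/s):

```shell
# A sketch, assuming the geo-rep "rsync-options" config option is
# available in your version; caps the sync transfer rate at ~2000 KB/s.
gluster volume geo-replication mastervol slavehost::slavevol \
    config rsync-options "--bwlimit=2000"
```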
On 5/11/2015 3:30 AM, Brian Ericson wrote:
tl;dr -- geo-replication of ~200,000 CHANGELOG files is killing me... Help!
I have about 125G spread over just shy of 5000 files that I'm replicating with geo-replication to nodes around the world. The content is fairly stable and probably hasn't changed at all since I initially established the GlusterFS nodes/network, which looks as follows:

x -> xx -> [xxx, xxy] (x geo-replicates to xx, xx geo-replicates to xxx/xxy)

Latency & throughput are markedly different (x -> xx is the fastest, xx -> xxx the slowest (at about 1G/hour)). That said, all nodes were synced within 5 days of setting up the network.
I have since added another node, xxz, which is also geo-replicated from xx (xx -> xxz). Its latency/throughput is clearly better than xx -> xxx's, but over 5 days later, I'm still replicating CHANGELOGs and haven't gotten to any real content (the replicated volumes' mounted filesystems are empty).
Starting with x, you can see I have a "reasonable" number of CHANGELOGs:
x # find /bricks/*/.glusterfs/changelogs -name CHANGELOG\* | wc -l
186
However, xxz's source is xx, and I've got a real problem with xx:
xx # find /bricks/*/.glusterfs/changelogs -name CHANGELOG\* | wc -l
193450
5+ days into this, and I've hardly managed to dent this on xxz:
xxz # find /bricks/*/.glusterfs/changelogs -name CHANGELOG\* | wc -l
43211
On top of that, xx is generating new CHANGELOGs at a rate of ~6/minute (two volumes at ~3/minute each), so chasing CHANGELOGs is a (quickly) moving target.
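For what it's worth, the CHANGELOG count tracks the changelog rollover interval rather than the amount of changed data: the changelog translator starts a new CHANGELOG file every rollover period (15 seconds by default, if memory serves), even on a mostly idle volume, which is consistent with the ~3/minute per volume seen here. If that default holds on this version, raising the interval would slow the growth (a sketch; verify the option name and default on your build first):

```shell
# changelog.rollover-time is in seconds; 300 would cut new CHANGELOG
# creation from ~4/minute to ~1 per 5 minutes per brick.
# (Option name/default assumed from 3.x docs -- check with your version.)
gluster volume set mastervol changelog.rollover-time 300
```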
And these files are small! The "I'm alive" file is 92 bytes long, and I've seen the others average about 4k. Demonstrating latency/throughput, you can see that small files (for me) are a real killer:
### x -> xx (fastest route)
# for i in 1 10 100 1000; do file="$( dd if=/dev/urandom bs=1024 count=$((4000/i)) 2> /dev/null )"; echo "$i ($(( $( echo -n "$file" | wc -c )/1024 ))k): $( ( time for i in $( seq 1 $i ); do echo -n "$file" | ssh xx 'cat > /dev/null'; done ) |& awk '/^real/{ print $2 }' )"; done
1 (3984k): 0m4.777s
10 (398k): 0m10.737s
100 (39k): 0m53.286s
1000 (3k): 7m21.493s
### xx -> xxx (slowest route)
# for i in 1 10 100 1000; do file="$( dd if=/dev/urandom bs=1024 count=$((4000/i)) 2> /dev/null )"; echo "$i ($(( $( echo -n "$file" | wc -c )/1024 ))k): $( ( time for i in $( seq 1 $i ); do echo -n "$file" | ssh xxx 'cat > /dev/null'; done ) |& awk '/^real/{ print $2 }' )"; done
1 (3984k): 0m11.065s
10 (398k): 0m41.007s
100 (39k): 4m52.814s
1000 (3k): 39m23.009s
### xx -> xxz (the route I've added and am trying to sync)
# for i in 1 10 100 1000; do file="$( dd if=/dev/urandom bs=1024 count=$((4000/i)) 2> /dev/null )"; echo "$i ($(( $( echo -n "$file" | wc -c )/1024 ))k): $( ( time for i in $( seq 1 $i ); do echo -n "$file" | ssh xxz 'cat > /dev/null'; done ) |& awk '/^real/{ print $2 }' )"; done
1 (3984k): 0m2.673s
10 (398k): 0m16.333s
100 (39k): 2m0.676s
1000 (3k): 17m28.265s
What you're looking at is the cost of transferring a total of 4000k: 1 transfer at 4000k, 10@400k, 100@40k, and 1000@4k. With 1 transfer taking under 3s and 1000 transfers taking nearly 17 1/2 minutes for xx -> xxz, at the same total transfer size, it's really a killer to transfer CHANGELOGs, especially almost 200,000 of them.
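To put numbers on that, a back-of-the-envelope projection (figures copied from the xx -> xxz measurement above) shows per-file overhead alone sinks the sync:

```python
# Per-file overhead implied by the xx -> xxz benchmark above:
# 1000 transfers of ~4k each took 17m28.265s.
total_seconds = 17 * 60 + 28.265       # 1000 small transfers, xx -> xxz
per_file = total_seconds / 1000        # ~1.05 s per file, mostly setup cost
changelogs = 193450                    # CHANGELOG backlog on xx
sync_days = changelogs * per_file / 86400

print(f"per-file cost: {per_file:.2f}s")        # ~1.05s
print(f"projected sync: {sync_days:.1f} days")  # ~2.3 days, best case
# The observed rate (~6 files/minute, i.e. ~10 s/file) is worse still,
# so geo-rep is adding overhead beyond raw ssh transfer time.
```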
And 92-byte files don't improve this:
### x -> xx (fastest route)
# file="$( dd if=/dev/urandom bs=92 count=1 2> /dev/null )"; i=100; echo "$i ($(( $( echo -n "$file" | wc -c ) ))): $( ( time for i in $( seq 1 $i ); do echo -n "$file" | ssh xx 'cat > /dev/null'; done ) |& awk '/^real/{ print $2 }' )"
100 (92): 0m34.164s
### xx -> xxx (slowest route)
# file="$( dd if=/dev/urandom bs=92 count=1 2> /dev/null )"; i=100; echo "$i ($(( $( echo -n "$file" | wc -c ) ))): $( ( time for i in $( seq 1 $i ); do echo -n "$file" | ssh xxx 'cat > /dev/null'; done ) |& awk '/^real/{ print $2 }' )"
100 (92): 3m53.388s
### xx -> xxz (the route I've added and am trying to sync)
# file="$( dd if=/dev/urandom bs=92 count=1 2> /dev/null )"; i=100; echo "$i ($(( $( echo -n "$file" | wc -c ) ))): $( ( time for i in $( seq 1 $i ); do echo -n "$file" | ssh xxz 'cat > /dev/null'; done ) |& awk '/^real/{ print $2 }' )"
100 (92): 1m43.389s
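One caveat about these timings: each loop iteration opens a fresh ssh connection, so they measure connection setup plus transfer, and setup likely dominates at these sizes. OpenSSH connection multiplexing (standard ssh_config options) is one way to factor the setup cost out of the benchmark and see what the per-file transfer alone costs:

```shell
# ~/.ssh/config -- reuse one TCP/ssh session for repeated connections
# to these hosts (hostnames are the placeholders used above)
Host xx xxx xxz
    ControlMaster auto
    ControlPath ~/.ssh/cm-%r@%h-%p
    ControlPersist 10m
```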
Questions...:
o Why so many CHANGELOGs?
o Why so slow (in 5 days, I've transferred 43211 CHANGELOGs, so 43211/5/24/60=6 implies a real transfer rate of about 6 CHANGELOG files per minute, which brings me back to xx's generating new ones at about that rate...)?
o What can I do to "fix" this?
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users