Thanks for the detailed mail. Is the Geo-replication status showing Faulty?
Please share the output of `gluster volume geo-replication status`.
It looks like Geo-replication has halted due to an unrecoverable error
during replication. Please share the log files from the Master and Slave
nodes so we can identify the root cause.
Questions...:
o Why so many CHANGELOGs?
A Changelog is generated every 15 seconds, but only if changes happened
in that brick during that interval.
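Assuming the default 15-second rollover (the changelog.rollover-time option), the counts reported in this thread can be sanity-checked with a quick calculation; the figures below are taken straight from the mails:

```shell
# Upper bound: one CHANGELOG per 15-second rollover interval, per brick,
# assuming the default changelog.rollover-time of 15 seconds.
max_per_brick=$(( 5 * 24 * 60 * 60 / 15 ))
echo "$max_per_brick"   # 28800 CHANGELOGs max for one busy brick in 5 days

# x's 186 is far below that ceiling (the volume is mostly idle).
# xx's 193450 is about 6.7x the single-brick 5-day ceiling, consistent
# with multiple volumes/bricks and a history longer than 5 days.
awk 'BEGIN { printf "%.1f\n", 193450 / 28800 }'   # 6.7
```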
o Why so slow (in 5 days, I've transferred 43211 CHANGELOGs, so
43211/5/24/60 = 6 implies a real transfer rate of about 6 CHANGELOG
files per minute, which brings me back to xx's generating new ones at
about that rate...)?
Changelogs are not transferred. The Changelogs seen on the Slave nodes
are generated in the Slave Volume itself, since Changelog is enabled for
the Slave volume as well. Changelogs are parsed on the Master and files
are replicated to the Slave volume in two steps:
1. Entry creation using RPC
2. Data sync using rsync
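Since the data step is rsync-based, the sync engine has per-session tunables worth checking. A sketch, with MASTERVOL/SLAVEHOST as placeholders; the exact option names vary by GlusterFS release, so verify them against your own `config` output rather than taking the spellings below as authoritative:

```shell
# List the tunables for an existing geo-replication session
# (MASTERVOL and SLAVEHOST::SLAVEVOL are placeholders for your setup).
gluster volume geo-replication MASTERVOL SLAVEHOST::SLAVEVOL config

# Assumption: on releases that support it, switching the data step from
# per-file rsync to tar over ssh can help with very many small files
# (spelled use_tarssh or use-tarssh depending on the release).
gluster volume geo-replication MASTERVOL SLAVEHOST::SLAVEVOL config use_tarssh true
```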
o What can I do to "fix" this?
Please share the log files; we will look into the issue and help
resolve it.
regards
Aravinda
On 11/04/2015 11:00 PM, Brian Ericson wrote:
tl;dr -- geo-replication of ~200,000 CHANGELOG files is killing me... Help!
I have about 125G spread over just shy of 5000 files that I'm
replicating with geo-replication to nodes around the world. The content
is fairly stable and probably hasn't changed at all since I initially
established the GlusterFS nodes/network, which looks as follows:

x -> xx -> [xxx, xxy] (x geo-replicates to xx, xx geo-replicates to xxx/xxy)

Latency & throughput are markedly different (x -> xx is the fastest,
xx -> xxx the slowest (at about 1G/hour)). That said, all nodes were
synced within 5 days of setting up the network.
I have since added another node, xxz, which is also geo-replicated from
xx (xx -> xxz). Its latency/throughput is clearly better than xx ->
xxx's, but over 5 days later, I'm still replicating CHANGELOGs and
haven't gotten to any real content (the replicated volumes' mounted
filesystems are empty).
Starting with x, you can see I have a "reasonable" number of CHANGELOGs:
x # find /bricks/*/.glusterfs/changelogs -name CHANGELOG\* | wc -l
186
However, xxz's source is xx, and I've got a real problem with xx:
xx # find /bricks/*/.glusterfs/changelogs -name CHANGELOG\* | wc -l
193450
5+ days into this, and I've hardly managed to dent this on xxz:
xxz # find /bricks/*/.glusterfs/changelogs -name CHANGELOG\* | wc -l
43211
On top of that, xx is generating new CHANGELOGs at a rate of ~6/minute
(two volumes at ~3/minute each), so chasing CHANGELOGs is a (quickly)
moving target.

And these files are small! The "I'm alive" file is 92 bytes long, and
I've seen them average about 4k. Demonstrating latency/throughput, you
can see that small files (for me) are a real killer:
### x -> xx (fastest route)
# for i in 1 10 100 1000; do file="$( dd if=/dev/urandom bs=1024 count=$((4000/i)) 2> /dev/null )"; echo "$i ($(( $( echo -n "$file" | wc -c )/1024 ))k): $( ( time for i in $( seq 1 $i ); do echo -n "$file" | ssh xx 'cat > /dev/null'; done ) |& awk '/^real/{ print $2 }' )"; done
1 (3984k): 0m4.777s
10 (398k): 0m10.737s
100 (39k): 0m53.286s
1000 (3k): 7m21.493s
### xx -> xxx (slowest route)
# for i in 1 10 100 1000; do file="$( dd if=/dev/urandom bs=1024 count=$((4000/i)) 2> /dev/null )"; echo "$i ($(( $( echo -n "$file" | wc -c )/1024 ))k): $( ( time for i in $( seq 1 $i ); do echo -n "$file" | ssh xxx 'cat > /dev/null'; done ) |& awk '/^real/{ print $2 }' )"; done
1 (3984k): 0m11.065s
10 (398k): 0m41.007s
100 (39k): 4m52.814s
1000 (3k): 39m23.009s
### xx -> xxz (the route I've added and am trying to sync)
# for i in 1 10 100 1000; do file="$( dd if=/dev/urandom bs=1024 count=$((4000/i)) 2> /dev/null )"; echo "$i ($(( $( echo -n "$file" | wc -c )/1024 ))k): $( ( time for i in $( seq 1 $i ); do echo -n "$file" | ssh xxz 'cat > /dev/null'; done ) |& awk '/^real/{ print $2 }' )"; done
1 (3984k): 0m2.673s
10 (398k): 0m16.333s
100 (39k): 2m0.676s
1000 (3k): 17m28.265s
What you're looking at is the cost of transferring a total of 4000k: 1
transfer at 4000k, 10@400k, 100@40k, and 1000@4k. With 1 transfer taking
under 3s and 1000 transfers taking nearly 17 1/2 minutes for xx -> xxz,
for the same total transfer size, it's really a killer to transfer
CHANGELOGs, especially almost 200,000 of them.
And 92-byte files don't improve this:
### x -> xx (fastest route)
# file="$( dd if=/dev/urandom bs=92 count=1 2> /dev/null )"; i=100; echo "$i ($(( $( echo -n "$file" | wc -c ) ))): $( ( time for i in $( seq 1 $i ); do echo -n "$file" | ssh xx 'cat > /dev/null'; done ) |& awk '/^real/{ print $2 }' )"
100 (92): 0m34.164s
### xx -> xxx (slowest route)
# file="$( dd if=/dev/urandom bs=92 count=1 2> /dev/null )"; i=100; echo "$i ($(( $( echo -n "$file" | wc -c ) ))): $( ( time for i in $( seq 1 $i ); do echo -n "$file" | ssh xxx 'cat > /dev/null'; done ) |& awk '/^real/{ print $2 }' )"
100 (92): 3m53.388s
### xx -> xxz (the route I've added and am trying to sync)
# file="$( dd if=/dev/urandom bs=92 count=1 2> /dev/null )"; i=100; echo "$i ($(( $( echo -n "$file" | wc -c ) ))): $( ( time for i in $( seq 1 $i ); do echo -n "$file" | ssh xxz 'cat > /dev/null'; done ) |& awk '/^real/{ print $2 }' )"
100 (92): 1m43.389s
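The pattern in these timings is consistent with a fixed per-connection cost dominating small transfers. A rough back-of-the-envelope fit, using only the numbers posted above (the per-file overhead is an estimate, not a measurement):

```shell
# Model: total_time ~= n_files * overhead + total_bytes / bandwidth.
# Every xx -> xxz row above moves the same 4000k total, so the bandwidth
# term is constant and the difference between the 1000-file run
# (17m28.265s) and the 1-file run (2.673s) is pure per-file overhead.
awk 'BEGIN {
  per_file = (17*60 + 28.265 - 2.673) / (1000 - 1)
  printf "per-file overhead: %.2fs\n", per_file                     # 1.05s
  printf "days to pay that cost for 193450 files: %.1f\n", \
         193450 * per_file / 86400                                  # 2.3
}'
```

Even that understates the problem: raw ssh manages roughly one file per second here, yet the observed geo-replication rate is only ~6 CHANGELOGs/minute, which suggests geo-replication's own per-file bookkeeping adds another order of magnitude on top of the connection cost.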
Questions...:
o Why so many CHANGELOGs?
o Why so slow (in 5 days, I've transferred 43211 CHANGELOGs, so
43211/5/24/60 = 6 implies a real transfer rate of about 6 CHANGELOG
files per minute, which brings me back to xx's generating new ones at
about that rate...)?
o What can I do to "fix" this?
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users