Re: geo-replication not syncing files...

some new findings...
yesterday evening I saved the output of 'find /gluster-export -type f -exec ls -lisa {} \;' on all master nodes to look for stale linkfiles. Afterwards the geo-replication restarted several times; the latest restart was caused by the following (entire sequence below):

[ 16:06:45 ] - root@gluster-ger-ber-09 /var/log/glusterfs/geo-replication/ger-ber-01 $cat *gluster-export.gluster.log | egrep "No such file|stale linkfile" | tail -1
[2015-11-12 15:58:19.491418] I [dht-common.c:1572:dht_lookup_linkfile_cbk] 0-ger-ber-01-dht: lookup of /cache/img/img/articles/5087ab4c8e57b6.01164210.jpeg on ger-ber-01-replicate-0 (following linkfile) failed (No such file or directory)
[ 16:07:18 ] - root@gluster-ger-ber-09 /var/log/glusterfs/geo-replication/ger-ber-01 $

In yesterday evening's output the mentioned linkfile exists on ger-ber-09 and ger-ber-10 (ger-ber-01-replicate-1), but not on ger-ber-01-replicate-0.

root@dp-server:/home/tron/gluster# grep /cache/img/img/articles/5087ab4c8e57b6.01164210.jpeg ls-lisa-gluster-export-files-gluster-ger-ber-*
ls-lisa-gluster-export-files-gluster-ger-ber-09-int.out:86082371834 0 ---------T 2 root root 0 Apr 18 2014 /gluster-export/cache/img/img/articles/5087ab4c8e57b6.01164210.jpeg
ls-lisa-gluster-export-files-gluster-ger-ber-10-int.out:86085383674 0 ---------T 2 root root 0 Apr 18 2014 /gluster-export/cache/img/img/articles/5087ab4c8e57b6.01164210.jpeg
root@dp-server:/home/tron/gluster#
root@dp-server:/home/tron/gluster# grep 86082371834 ls-lisa-gluster-export-files-gluster-ger-ber-09-int.out
86082371834 0 ---------T 2 root root 0 Apr 18 2014 /gluster-export/.glusterfs/61/de/61def2a4-4a6d-4979-9d54-14b041e926a7
86082371834 0 ---------T 2 root root 0 Apr 18 2014 /gluster-export/cache/img/img/articles/5087ab4c8e57b6.01164210.jpeg
root@dp-server:/home/tron/gluster# grep 86085383674 ls-lisa-gluster-export-files-gluster-ger-ber-10-int.out
86085383674 0 ---------T 2 root root 0 Apr 18 2014 /gluster-export/.glusterfs/61/de/61def2a4-4a6d-4979-9d54-14b041e926a7
86085383674 0 ---------T 2 root root 0 Apr 18 2014 /gluster-export/cache/img/img/articles/5087ab4c8e57b6.01164210.jpeg
root@dp-server:/home/tron/gluster#

After the restart the linkfile/gfid no longer exists. Apparently some consistency check is running in the background, and the geo-replication process dies whenever such a linkfile problem occurs. Should I have run a 'rebalance' on that volume before starting geo-replication?

Currently I'm running a script that finds those linkfiles in the saved file listings. I think I have to delete them manually and then observe how the geo-replication behaves once it has been running for some hours without frequent restarts...
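The script essentially just filters the saved listings for the linkfile pattern. A minimal sketch of that filter (assuming the 'ls -lisa' field layout shown above: inode, blocks, mode, link count, owner, group, size, date, path):

```shell
# List probable stale-linkfile candidates from saved 'ls -lisa' output files.
# DHT linkfiles show up with mode ---------T and size 0; in 'ls -lisa' output
# the mode is field 3 and the size is field 7.
find_linkfiles() {
    awk '$3 == "---------T" && $7 == 0 { print $NF }' "$@" | sort -u
}
```

e.g. `find_linkfiles ls-lisa-gluster-export-files-gluster-ger-ber-*.out > linkfile-candidates.txt` (filenames as in my listings above).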

best regards
dietmar



[2015-11-12 13:45:14.295292] I [fuse-bridge.c:5520:fini] 0-fuse: Unmounting '/tmp/gsyncd-aux-mount-M_LLid'.
[2015-11-12 15:58:19.491418] I [dht-common.c:1572:dht_lookup_linkfile_cbk] 0-ger-ber-01-dht: lookup of /cache/img/img/articles/5087ab4c8e57b6.01164210.jpeg on ger-ber-01-replicate-0 (following linkfile) failed (No such file or directory)
[2015-11-12 15:58:19.493244] I [dht-common.c:911:dht_lookup_unlink_stale_linkto_cbk] 0-ger-ber-01-dht: Returned with op_ret 0 and op_errno 0 for /cache/img/img/articles/5087ab4c8e57b6.01164210.jpeg
[2015-11-12 15:58:31.409954] I [glusterfsd.c:1959:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.5.6 (/usr/sbin/glusterfs --aux-gfid-mount --log-file=/var/log/glusterfs/geo-replication/ger-ber-01/ssh%3A%2F%2Froot%4082.199.131.2%3Agluster%3A%2F%2F127.0.0.1%3Aaut-wien-01.%2Fgluster-export.gluster.log --volfile-server=localhost --volfile-id=ger-ber-01 --client-pid=-1 /tmp/gsyncd-aux-mount-iDrtTp)
[2015-11-12 15:58:31.431290] I [socket.c:3645:socket_init] 0-glusterfs: SSL support is NOT enabled
[2015-11-12 15:58:31.431321] I [socket.c:3660:socket_init] 0-glusterfs: using system polling thread
[2015-11-12 15:58:31.495912] I [dht-shared.c:314:dht_init_regex] 0-ger-ber-01-dht: using regex rsync-hash-regex = ^\.(.+)\.[^.]+$
[2015-11-12 15:58:31.497671] I [socket.c:3645:socket_init] 0-ger-ber-01-client-5: SSL support is NOT enabled
[2015-11-12 15:58:31.497694] I [socket.c:3660:socket_init] 0-ger-ber-01-client-5: using system polling thread
[2015-11-12 15:58:31.497815] I [socket.c:3645:socket_init] 0-ger-ber-01-client-4: SSL support is NOT enabled
[2015-11-12 15:58:31.497820] I [socket.c:3660:socket_init] 0-ger-ber-01-client-4: using system polling thread
[2015-11-12 15:58:31.497939] I [socket.c:3645:socket_init] 0-ger-ber-01-client-3: SSL support is NOT enabled
[2015-11-12 15:58:31.497945] I [socket.c:3660:socket_init] 0-ger-ber-01-client-3: using system polling thread
[2015-11-12 15:58:31.498059] I [socket.c:3645:socket_init] 0-ger-ber-01-client-2: SSL support is NOT enabled
[2015-11-12 15:58:31.498063] I [socket.c:3660:socket_init] 0-ger-ber-01-client-2: using system polling thread
[2015-11-12 15:58:31.498184] I [socket.c:3645:socket_init] 0-ger-ber-01-client-1: SSL support is NOT enabled
[2015-11-12 15:58:31.498189] I [socket.c:3660:socket_init] 0-ger-ber-01-client-1: using system polling thread
[2015-11-12 15:58:31.498298] I [socket.c:3645:socket_init] 0-ger-ber-01-client-0: SSL support is NOT enabled
[2015-11-12 15:58:31.498303] I [socket.c:3660:socket_init] 0-ger-ber-01-client-0: using system polling thread
[2015-11-12 15:58:31.498325] I [client.c:2294:notify] 0-ger-ber-01-client-0: parent translators are ready, attempting connect on transport
[2015-11-12 15:58:31.498603] I [client.c:2294:notify] 0-ger-ber-01-client-1: parent translators are ready, attempting connect on transport
[2015-11-12 15:58:31.498738] I [client.c:2294:notify] 0-ger-ber-01-client-2: parent translators are ready, attempting connect on transport
[2015-11-12 15:58:31.498904] I [client.c:2294:notify] 0-ger-ber-01-client-3: parent translators are ready, attempting connect on transport
[2015-11-12 15:58:31.499064] I [client.c:2294:notify] 0-ger-ber-01-client-4: parent translators are ready, attempting connect on transport
[2015-11-12 15:58:31.499192] I [client.c:2294:notify] 0-ger-ber-01-client-5: parent translators are ready, attempting connect on transport
Final graph:
+------------------------------------------------------------------------------+
  1: volume ger-ber-01-client-0
  2:     type protocol/client
  3:     option remote-host gluster-ger-ber-11-int
  4:     option remote-subvolume /gluster-export
  5:     option transport-type socket
  6:     option username f2720838-abbd-4629-bd24-8040ceb794c1
  7:     option password 5ee70f0b-e537-4d11-8692-863864e227e4
  8:     option ping-timeout 10
  9:     option send-gids true
 10: end-volume
 11:
 12: volume ger-ber-01-client-1
 13:     type protocol/client
 14:     option remote-host gluster-ger-ber-12-int
 15:     option remote-subvolume /gluster-export
 16:     option transport-type socket
 17:     option username f2720838-abbd-4629-bd24-8040ceb794c1
 18:     option password 5ee70f0b-e537-4d11-8692-863864e227e4
 19:     option ping-timeout 10
 20:     option send-gids true
 21: end-volume
 22:
 23: volume ger-ber-01-replicate-0
 24:     type cluster/replicate
 25:     subvolumes ger-ber-01-client-0 ger-ber-01-client-1
 26: end-volume
 27:
 28: volume ger-ber-01-client-2
 29:     type protocol/client
 30:     option remote-host gluster-ger-ber-09-int
 31:     option remote-subvolume /gluster-export
 32:     option transport-type socket
 33:     option username f2720838-abbd-4629-bd24-8040ceb794c1
 34:     option password 5ee70f0b-e537-4d11-8692-863864e227e4
 35:     option ping-timeout 10
 36:     option send-gids true
 37: end-volume
 38:
 39: volume ger-ber-01-client-3
 40:     type protocol/client
 41:     option remote-host gluster-ger-ber-10-int
 42:     option remote-subvolume /gluster-export
 43:     option transport-type socket
 44:     option username f2720838-abbd-4629-bd24-8040ceb794c1
 45:     option password 5ee70f0b-e537-4d11-8692-863864e227e4
 46:     option ping-timeout 10
 47:     option send-gids true
 48: end-volume
 49:
 50: volume ger-ber-01-replicate-1
 51:     type cluster/replicate
 52:     subvolumes ger-ber-01-client-2 ger-ber-01-client-3
 53: end-volume
 54:
 55: volume ger-ber-01-client-4
 56:     type protocol/client
 57:     option remote-host gluster-ger-ber-07-int
 58:     option remote-subvolume /gluster-export
 59:     option transport-type socket
 60:     option username f2720838-abbd-4629-bd24-8040ceb794c1
 61:     option password 5ee70f0b-e537-4d11-8692-863864e227e4
 62:     option ping-timeout 10
 63:     option send-gids true
 64: end-volume
 65:
 66: volume ger-ber-01-client-5
 67:     type protocol/client
 68:     option remote-host gluster-ger-ber-08-int
 69:     option remote-subvolume /gluster-export
 70:     option transport-type socket
 71:     option username f2720838-abbd-4629-bd24-8040ceb794c1
 72:     option password 5ee70f0b-e537-4d11-8692-863864e227e4
 73:     option ping-timeout 10
 74:     option send-gids true
 75: end-volume
 76:
 77: volume ger-ber-01-replicate-2
 78:     type cluster/replicate
 79:     subvolumes ger-ber-01-client-4 ger-ber-01-client-5
 80: end-volume
 81:
 82: volume ger-ber-01-dht
 83:     type cluster/distribute
 84:     option min-free-disk 200GB
 85:     subvolumes ger-ber-01-replicate-0 ger-ber-01-replicate-1 ger-ber-01-replicate-2
 86: end-volume
 87:
 88: volume ger-ber-01-write-behind
 89:     type performance/write-behind
 90:     subvolumes ger-ber-01-dht
 91: end-volume
 92:
 93: volume ger-ber-01-read-ahead
 94:     type performance/read-ahead
 95:     subvolumes ger-ber-01-write-behind
 96: end-volume
 97:
 98: volume ger-ber-01-io-cache
 99:     type performance/io-cache
100:     option min-file-size 0
101:     option cache-timeout 2
102:     option cache-size 1024MB
103:     subvolumes ger-ber-01-read-ahead
104: end-volume
105:
106: volume ger-ber-01-quick-read
107:     type performance/quick-read
108:     option cache-size 1024MB
109:     subvolumes ger-ber-01-io-cache
110: end-volume
111:
112: volume ger-ber-01-md-cache
113:     type performance/md-cache
114:     subvolumes ger-ber-01-quick-read
115: end-volume
116:
117: volume ger-ber-01
118:     type debug/io-stats
119:     option latency-measurement off
120:     option count-fop-hits off
121:     subvolumes ger-ber-01-md-cache
122: end-volume
123:
124: volume gfid-access-autoload
125:     type features/gfid-access
126:     subvolumes ger-ber-01
127: end-volume
128:
+------------------------------------------------------------------------------+
[2015-11-12 15:58:31.499624] I [rpc-clnt.c:1729:rpc_clnt_reconfig] 0-ger-ber-01-client-2: changing port to 49152 (from 0)
[2015-11-12 15:58:31.500026] I [client-handshake.c:1677:select_server_supported_programs] 0-ger-ber-01-client-2: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2015-11-12 15:58:31.500138] I [rpc-clnt.c:1729:rpc_clnt_reconfig] 0-ger-ber-01-client-4: changing port to 49152 (from 0)
[2015-11-12 15:58:31.500160] I [rpc-clnt.c:1729:rpc_clnt_reconfig] 0-ger-ber-01-client-5: changing port to 49152 (from 0)
[2015-11-12 15:58:31.500565] I [client-handshake.c:1462:client_setvolume_cbk] 0-ger-ber-01-client-2: Connected to 10.0.1.101:49152, attached to remote volume '/gluster-export'.
[2015-11-12 15:58:31.500576] I [client-handshake.c:1474:client_setvolume_cbk] 0-ger-ber-01-client-2: Server and Client lk-version numbers are not same, reopening the fds
[2015-11-12 15:58:31.500611] I [afr-common.c:4487:afr_notify] 0-ger-ber-01-replicate-1: Subvolume 'ger-ber-01-client-2' came back up; going online.
[2015-11-12 15:58:31.500629] I [rpc-clnt.c:1729:rpc_clnt_reconfig] 0-ger-ber-01-client-3: changing port to 49152 (from 0)
[2015-11-12 15:58:31.500659] I [rpc-clnt.c:1729:rpc_clnt_reconfig] 0-ger-ber-01-client-0: changing port to 49152 (from 0)
[2015-11-12 15:58:31.500673] I [rpc-clnt.c:1729:rpc_clnt_reconfig] 0-ger-ber-01-client-1: changing port to 49152 (from 0)
[2015-11-12 15:58:31.501206] I [client-handshake.c:450:client_set_lk_version_cbk] 0-ger-ber-01-client-2: Server lk version = 1
[2015-11-12 15:58:31.501239] I [client-handshake.c:1677:select_server_supported_programs] 0-ger-ber-01-client-4: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2015-11-12 15:58:31.501380] I [client-handshake.c:1677:select_server_supported_programs] 0-ger-ber-01-client-5: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2015-11-12 15:58:31.501479] I [client-handshake.c:1677:select_server_supported_programs] 0-ger-ber-01-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2015-11-12 15:58:31.501540] I [client-handshake.c:1462:client_setvolume_cbk] 0-ger-ber-01-client-4: Connected to 10.0.1.147:49152, attached to remote volume '/gluster-export'.
[2015-11-12 15:58:31.501546] I [client-handshake.c:1474:client_setvolume_cbk] 0-ger-ber-01-client-4: Server and Client lk-version numbers are not same, reopening the fds
[2015-11-12 15:58:31.501566] I [afr-common.c:4487:afr_notify] 0-ger-ber-01-replicate-2: Subvolume 'ger-ber-01-client-4' came back up; going online.
[2015-11-12 15:58:31.501576] I [client-handshake.c:1677:select_server_supported_programs] 0-ger-ber-01-client-3: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2015-11-12 15:58:31.501649] I [client-handshake.c:1462:client_setvolume_cbk] 0-ger-ber-01-client-5: Connected to 10.0.1.149:49152, attached to remote volume '/gluster-export'.
[2015-11-12 15:58:31.501654] I [client-handshake.c:1474:client_setvolume_cbk] 0-ger-ber-01-client-5: Server and Client lk-version numbers are not same, reopening the fds
[2015-11-12 15:58:31.501729] I [client-handshake.c:1677:select_server_supported_programs] 0-ger-ber-01-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2015-11-12 15:58:31.501753] I [client-handshake.c:450:client_set_lk_version_cbk] 0-ger-ber-01-client-4: Server lk version = 1
[2015-11-12 15:58:31.501778] I [client-handshake.c:1462:client_setvolume_cbk] 0-ger-ber-01-client-0: Connected to 10.0.1.105:49152, attached to remote volume '/gluster-export'.
[2015-11-12 15:58:31.501783] I [client-handshake.c:1474:client_setvolume_cbk] 0-ger-ber-01-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2015-11-12 15:58:31.501797] I [afr-common.c:4487:afr_notify] 0-ger-ber-01-replicate-0: Subvolume 'ger-ber-01-client-0' came back up; going online.
[2015-11-12 15:58:31.501820] I [client-handshake.c:450:client_set_lk_version_cbk] 0-ger-ber-01-client-5: Server lk version = 1
[2015-11-12 15:58:31.501934] I [client-handshake.c:1462:client_setvolume_cbk] 0-ger-ber-01-client-3: Connected to 10.0.1.103:49152, attached to remote volume '/gluster-export'.
[2015-11-12 15:58:31.501939] I [client-handshake.c:1474:client_setvolume_cbk] 0-ger-ber-01-client-3: Server and Client lk-version numbers are not same, reopening the fds
[2015-11-12 15:58:31.501982] I [client-handshake.c:450:client_set_lk_version_cbk] 0-ger-ber-01-client-0: Server lk version = 1
[2015-11-12 15:58:31.502061] I [client-handshake.c:1462:client_setvolume_cbk] 0-ger-ber-01-client-1: Connected to 10.0.1.107:49152, attached to remote volume '/gluster-export'.
[2015-11-12 15:58:31.502067] I [client-handshake.c:1474:client_setvolume_cbk] 0-ger-ber-01-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2015-11-12 15:58:31.504554] I [fuse-bridge.c:5022:fuse_graph_setup] 0-fuse: switched to graph 0
[2015-11-12 15:58:31.504588] I [client-handshake.c:450:client_set_lk_version_cbk] 0-ger-ber-01-client-1: Server lk version = 1
[2015-11-12 15:58:31.504595] I [client-handshake.c:450:client_set_lk_version_cbk] 0-ger-ber-01-client-3: Server lk version = 1
[2015-11-12 15:58:31.504657] I [fuse-bridge.c:3959:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.22 kernel 7.22
[ 16:13:28 ] - root@gluster-ger-ber-09 /var/log/glusterfs/geo-replication/ger-ber-01 $



Am 12.11.2015 um 13:23 schrieb Dietmar Putz:
Hello Aravinda,

thank you for replay...
answers inside...

best regards
dietmar

Am 12.11.2015 um 07:22 schrieb Aravinda:
Please paste some lines from log files where you see "Operation not permitted" errors.

These "Operation not permitted" warnings on the active slave are so frequent that I cannot match them to corresponding messages on the master... What I observe is that 1270 different gfids are processed again and again...

[2015-11-12 10:53:55.355550] W [fuse-bridge.c:1942:fuse_create_cbk] 0-glusterfs-fuse: 29348: /.gfid/f5f57c10-381d-4537-8988-f22e16fab523 => -1 (Operation not permitted)
[ 10:56:59 ] - root@gluster-wien-05 /var/log/glusterfs/geo-replication-slaves $cat *.gluster.log *.gluster.log.1 | grep "Operation not permitted" | wc -l
54756
[ 11:02:28 ] - root@gluster-wien-05 /var/log/glusterfs/geo-replication-slaves $cat *.gluster.log *.gluster.log.1 | grep "Operation not permitted" | awk '{print $7}' | head -2
/.gfid/f857ac77-6afc-494f-88c7-304aec6cc8b5
/.gfid/f87e0361-3915-495f-bdb2-0c87155f4ddd
[ 11:02:48 ] - root@gluster-wien-05 /var/log/glusterfs/geo-replication-slaves $cat *.gluster.log *.gluster.log.1 | grep "Operation not permitted" | awk '{print $7}' | sort | uniq | wc -l
1270
[ 11:03:14 ] - root@gluster-wien-05 /var/log/glusterfs/geo-replication-slaves $

These gfids fill up the geo-replication logfile on the master after each restart... According to the master_gfid_file.txt, the files behind these gfids are available on both master and slave.


I observe many geo-replication restarts on the master node, and as far as I can see they are caused by two different errors: "No such file or directory" and "attempting deletion of stale linkfile".

[ 11:30:08 ] - root@gluster-ger-ber-09 /var/log/glusterfs/geo-replication/ger-ber-01 $cat *gluster-export.gluster.log | grep "No such file or directory" | tail -1

[2015-11-12 09:26:35.942478] I [dht-common.c:1572:dht_lookup_linkfile_cbk] 0-ger-ber-01-dht: lookup of /cache/img/img/articles/508e370a62a2e4.87470810.jpeg on ger-ber-01-replicate-0 (following linkfile) failed (No such file or directory)

[ 11:30:31 ] - root@gluster-ger-ber-09 /var/log/glusterfs/geo-replication/ger-ber-01 $cat *gluster-export.gluster.log | grep "deletion of stale linkfile" | tail -1

[2015-11-11 23:08:44.487825] I [dht-common.c:1480:dht_lookup_everywhere_cbk] 0-ger-ber-01-dht: attempting deletion of stale linkfile /cache/img/download/508813939afa33.63956784.pdf.gz on ger-ber-01-replicate-1

[ 11:31:10 ] - root@gluster-ger-ber-09 /var/log/glusterfs/geo-replication/ger-ber-01 $cat *gluster-export.gluster.log | egrep -i "no such file or directory|deletion of stale linkfile" | wc -l
115

Every time such an error appears, the geo-replication process restarts... meanwhile 115 times since 8 Nov.


"attempting to delete stale linkfile"
example :
/cache/img/articles/50856893226ce0.55808655.jpeg.gz

[2015-11-10 23:48:14.273733] W [glusterfsd.c:1095:cleanup_and_exit] (-->/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7fbf8be6c47d] (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x8182) [0x7fbf8c13f182] (-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xd5) [0x7fbf8cc39265]))) 0-: received signum (15), shutting down
[2015-11-10 23:48:14.273768] I [fuse-bridge.c:5520:fini] 0-fuse: Unmounting '/tmp/gsyncd-aux-mount-QLm2DI'.
[2015-11-11 01:14:26.972164] I [dht-common.c:1480:dht_lookup_everywhere_cbk] 0-ger-ber-01-dht: attempting deletion of stale linkfile /cache/img/articles/50856893226ce0.55808655.jpeg.gz on ger-ber-01-replicate-1
[2015-11-11 01:14:26.999234] I [dht-common.c:825:dht_lookup_unlink_cbk] 0-ger-ber-01-dht: lookup_unlink returned with op_ret -> 0 and op-errno -> 0 for /cache/img/articles/50856893226ce0.55808655.jpeg.gz
[2015-11-11 01:14:39.060592] I [glusterfsd.c:1959:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.5.6 (/usr/sbin/glusterfs --aux-gfid-mount --log-file=/var/log/glusterfs/geo-replication/ger-ber-01/ssh%3A%2F%2Froot%4082.199.131.2%3Agluster%3A%2F%2F127.0.0.1%3Aaut-wien-01.%2Fgluster-export.gluster.log --volfile-server=localhost --volfile-id=ger-ber-01 --client-pid=-1 /tmp/gsyncd-aux-mount-zXLm3a)
[2015-11-11 01:14:39.093942] I [socket.c:3645:socket_init] 0-glusterfs: SSL support is NOT enabled
[2015-11-11 01:14:39.093969] I [socket.c:3660:socket_init] 0-glusterfs: using system polling thread
[2015-11-11 01:14:39.171506] I [dht-shared.c:314:dht_init_regex] 0-ger-ber-01-dht: using regex rsync-hash-regex = ^\.(.+)\.[^.]+$
[2015-11-11 01:14:39.173291] I [socket.c:3645:socket_init] 0-ger-ber-01-client-5: SSL support is NOT enabled
[2015-11-11 01:14:39.173304] I [socket.c:3660:socket_init] 0-ger-ber-01-client-5: using system polling thread
[2015-11-11 01:14:39.173426] I [socket.c:3645:socket_init] 0-ger-ber-01-client-4: SSL support is NOT enabled
[2015-11-11 01:14:39.173430] I [socket.c:3660:socket_init] 0-ger-ber-01-client-4: using system polling thread
[2015-11-11 01:14:39.173545] I [socket.c:3645:socket_init] 0-ger-ber-01-client-3: SSL support is NOT enabled
[2015-11-11 01:14:39.173550] I [socket.c:3660:socket_init] 0-ger-ber-01-client-3: using system polling thread
[2015-11-11 01:14:39.173667] I [socket.c:3645:socket_init] 0-ger-ber-01-client-2: SSL support is NOT enabled
[2015-11-11 01:14:39.173671] I [socket.c:3660:socket_init] 0-ger-ber-01-client-2: using system polling thread
[2015-11-11 01:14:39.173795] I [socket.c:3645:socket_init] 0-ger-ber-01-client-1: SSL support is NOT enabled
[2015-11-11 01:14:39.173800] I [socket.c:3660:socket_init] 0-ger-ber-01-client-1: using system polling thread
[2015-11-11 01:14:39.173916] I [socket.c:3645:socket_init] 0-ger-ber-01-client-0: SSL support is NOT enabled
[2015-11-11 01:14:39.173921] I [socket.c:3660:socket_init] 0-ger-ber-01-client-0: using system polling thread
[2015-11-11 01:14:39.173944] I [client.c:2294:notify] 0-ger-ber-01-client-0: parent translators are ready, attempting connect on transport
[2015-11-11 01:14:39.174163] I [client.c:2294:notify] 0-ger-ber-01-client-1: parent translators are ready, attempting connect on transport
[2015-11-11 01:14:39.174299] I [client.c:2294:notify] 0-ger-ber-01-client-2: parent translators are ready, attempting connect on transport
[2015-11-11 01:14:39.174474] I [client.c:2294:notify] 0-ger-ber-01-client-3: parent translators are ready, attempting connect on transport
[2015-11-11 01:14:39.174620] I [client.c:2294:notify] 0-ger-ber-01-client-4: parent translators are ready, attempting connect on transport
[2015-11-11 01:14:39.174752] I [client.c:2294:notify] 0-ger-ber-01-client-5: parent translators are ready, attempting connect on transport


All of the stale linkfiles were stored on ger-ber-01-replicate-1, i.e. gluster-ger-ber-09-int and gluster-ger-ber-10-int. Obviously Gluster successfully deletes the stale linkfile, because afterwards I cannot find the mentioned file anymore; the geo-replication process restarts but does not process the same linkfile again. Looking around I found some more stale linkfiles, size 0 and with no corresponding data file, so I come more and more to the conclusion that I may have a preceding problem...

Can I delete such stale linkfiles without causing any problems?
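If deleting is the way to go, my plan would be to remove each linkfile together with its second hard link under .glusterfs; the link count of 2 in the listings above is that .glusterfs entry. A hypothetical helper to locate it from the file's trusted.gfid (untested on the live volume, verify on one file first):

```shell
# Map a trusted.gfid value (hex, as printed by
# 'getfattr -n trusted.gfid -e hex <file>') to the hard-linked entry under
# <brick>/.glusterfs, so both names of a stale linkfile can be removed together.
gfid_to_path() {
    brick=$1
    # reformat the 32-char hex string as a dashed uuid (8-4-4-4-12)
    uuid=$(printf '%s' "$2" | sed -e 's/^0x//' \
        -e 's/^\(........\)\(....\)\(....\)\(....\)\(............\)$/\1-\2-\3-\4-\5/')
    # .glusterfs fans out on the first two pairs of hex digits
    printf '%s/.glusterfs/%.2s/%s/%s\n' \
        "$brick" "$uuid" "$(printf '%s' "$uuid" | cut -c3-4)" "$uuid"
}
```

e.g. `rm -- "$file" "$(gfid_to_path /gluster-export "$gfid_hex")"` would then drop both names at once (illustrative variable names).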

putz@sdn-de-gate-01:~/central/gluster-3.5$ ../mycommand.sh -H gluster-ger -c "ls -l /gluster-export/cache/img/articles/*.gz"
Host : gluster-ger-ber-07-int
Return Code : 2     check result on gluster-ger-ber-07-int
-----------------------------------------------------
Host : gluster-ger-ber-08-int
Return Code : 2     check result on gluster-ger-ber-08-int
-----------------------------------------------------
Host : gluster-ger-ber-09-int
---------T 2 root root 0 Apr 23 2014 /gluster-export/cache/img/articles/507c19b7e19af9.19424972.jpeg.gz
---------T 2 root root 0 Apr 23 2014 /gluster-export/cache/img/articles/507d4e41edb6b0.81932229.jpeg.gz
---------T 2 root root 0 Apr 23 2014 /gluster-export/cache/img/articles/507e941aed0c81.54017110.jpeg.gz
---------T 2 root root 0 Apr 23 2014 /gluster-export/cache/img/articles/508569822e9278.45704411.jpeg.gz
-----------------------------------------------------
Host : gluster-ger-ber-10-int
---------T 2 root root 0 Apr 23 2014 /gluster-export/cache/img/articles/507c19b7e19af9.19424972.jpeg.gz
---------T 2 root root 0 Apr 23 2014 /gluster-export/cache/img/articles/507d4e41edb6b0.81932229.jpeg.gz
---------T 2 root root 0 Apr 23 2014 /gluster-export/cache/img/articles/507e941aed0c81.54017110.jpeg.gz
---------T 2 root root 0 Apr 23 2014 /gluster-export/cache/img/articles/508569822e9278.45704411.jpeg.gz
-----------------------------------------------------
Host : gluster-ger-ber-11-int
Return Code : 2     check result on gluster-ger-ber-11-int
-----------------------------------------------------
Host : gluster-ger-ber-12-int
Return Code : 2     check result on gluster-ger-ber-12-int
-----------------------------------------------------


For the other restart reason, "No such file or directory", I still have no clue...

"No such file or directory"

[2015-11-10 11:12:39.975390] I [fuse-bridge.c:5520:fini] 0-fuse: Unmounting '/tmp/gsyncd-aux-mount-JXonZH'.
[2015-11-10 13:07:39.532569] I [dht-common.c:1572:dht_lookup_linkfile_cbk] 0-ger-ber-01-dht: lookup of /cache/img/articles/505c5538cd9d21.37413891.jpeg.gz on ger-ber-01-replicate-0 (following linkfile) failed (No such file or directory)
[2015-11-10 13:07:39.563707] I [dht-common.c:911:dht_lookup_unlink_stale_linkto_cbk] 0-ger-ber-01-dht: Returned with op_ret 0 and op_errno 0 for /cache/img/articles/505c5538cd9d21.37413891.jpeg.gz
[2015-11-10 13:07:51.483713] I [glusterfsd.c:1959:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.5.6 (/usr/sbin/glusterfs --aux-gfid-mount --log-file=/var/log/glusterfs/geo-replication/ger-ber-01/ssh%3A%2F%2Froot%4082.199.131.2%3Agluster%3A%2F%2F127.0.0.1%3Aaut-wien-01.%2Fgluster-export.gluster.log --volfile-server=localhost --volfile-id=ger-ber-01 --client-pid=-1 /tmp/gsyncd-aux-mount-H9kZUh)
[2015-11-10 13:07:51.511921] I [socket.c:3645:socket_init] 0-glusterfs: SSL support is NOT enabled
[2015-11-10 13:07:51.511943] I [socket.c:3660:socket_init] 0-glusterfs: using system polling thread
[2015-11-10 13:07:51.567950] I [dht-shared.c:314:dht_init_regex] 0-ger-ber-01-dht: using regex rsync-hash-regex = ^\.(.+)\.[^.]+$
[2015-11-10 13:07:51.570322] I [socket.c:3645:socket_init] 0-ger-ber-01-client-5: SSL support is NOT enabled
[2015-11-10 13:07:51.570340] I [socket.c:3660:socket_init] 0-ger-ber-01-client-5: using system polling thread
[2015-11-10 13:07:51.570497] I [socket.c:3645:socket_init] 0-ger-ber-01-client-4: SSL support is NOT enabled
[2015-11-10 13:07:51.570504] I [socket.c:3660:socket_init] 0-ger-ber-01-client-4: using system polling thread
[2015-11-10 13:07:51.570706] I [socket.c:3645:socket_init] 0-ger-ber-01-client-3: SSL support is NOT enabled
[2015-11-10 13:07:51.570724] I [socket.c:3660:socket_init] 0-ger-ber-01-client-3: using system polling thread
[2015-11-10 13:07:51.570888] I [socket.c:3645:socket_init] 0-ger-ber-01-client-2: SSL support is NOT enabled
[2015-11-10 13:07:51.570895] I [socket.c:3660:socket_init] 0-ger-ber-01-client-2: using system polling thread
[2015-11-10 13:07:51.571086] I [socket.c:3645:socket_init] 0-ger-ber-01-client-1: SSL support is NOT enabled
[2015-11-10 13:07:51.571103] I [socket.c:3660:socket_init] 0-ger-ber-01-client-1: using system polling thread
[2015-11-10 13:07:51.571259] I [socket.c:3645:socket_init] 0-ger-ber-01-client-0: SSL support is NOT enabled
[2015-11-10 13:07:51.571266] I [socket.c:3660:socket_init] 0-ger-ber-01-client-0: using system polling thread
[2015-11-10 13:07:51.571298] I [client.c:2294:notify] 0-ger-ber-01-client-0: parent translators are ready, attempting connect on transport
[2015-11-10 13:07:51.571602] I [client.c:2294:notify] 0-ger-ber-01-client-1: parent translators are ready, attempting connect on transport
[2015-11-10 13:07:51.571862] I [client.c:2294:notify] 0-ger-ber-01-client-2: parent translators are ready, attempting connect on transport
[2015-11-10 13:07:51.572118] I [client.c:2294:notify] 0-ger-ber-01-client-3: parent translators are ready, attempting connect on transport
[2015-11-10 13:07:51.572350] I [client.c:2294:notify] 0-ger-ber-01-client-4: parent translators are ready, attempting connect on transport
[2015-11-10 13:07:51.572647] I [client.c:2294:notify] 0-ger-ber-01-client-5: parent translators are ready, attempting connect on transport


Would any other logfiles be helpful?



Were there any errors observed during step 3 as specified in http://www.gluster.org/community/documentation/index.php/Upgrade_to_3.5

Step 3 of section "Scheduling a downtime" ran without any errors.
Step 3 of section "Upgrade steps for geo-replication" was started twice: after the first attempt I recognized that the slave-upgrade.sh script uses the standard ssh port. After temporarily reconfiguring the ssh daemon on each slave to port 22, the script ran without any problems in a screen session and asked for each slave's password...


Some rows in status output are still in "Not Started", are there any errors in /var/log/glusterfs/geo-replication/*.log in those nodes?
Yes, the other master nodes are not permitted to access the slave nodes because of iptables rules on the slaves; only gluster-ger-ber-08-int and gluster-ger-ber-09-int are allowed to access each slave node. Should that be adjusted so that each master node can access each slave node, or does it 'just' influence the reliability of geo-replication without affecting the geo-replication problem?

[2015-11-09 12:45:40.30857] E [resource(monitor):204:errlog] Popen: command "ssh -p 2503 -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem gluster-wien-02 gluster --xml --remote-host=localhost volume info aut-wien-01" returned with 255, saying:
[2015-11-09 12:45:40.61755] E [resource(monitor):207:logerr] Popen: ssh> ssh: connect to host gluster-wien-02 port 2503: Connection timed out
[2015-11-09 12:45:40.62242] I [syncdutils(monitor):192:finalize] <top>: exiting.


regards
Aravinda

On 11/10/2015 10:50 PM, Dietmar Putz wrote:
Hi all,

i need some help with a geo-replication issue...
Recently I upgraded two 6-node distributed-replicated Gluster clusters from Ubuntu 12.04.5 LTS to 14.04.3 LTS, resp. GlusterFS 3.4.7 to 3.5.6. Since then the geo-replication does not start syncing but has remained as shown in the 'status detail' output below for about 48h.

I followed the hints for upgrade with an existing geo-replication :
http://www.gluster.org/community/documentation/index.php/Upgrade_to_3.5

The master_gfid_file.txt was created and applied to the slave volume, and geo-replication was started with the 'force' option. In the gluster.log on the slave I find thousands of lines with messages like ".../.gfid/1abb953b-aa9d-4fa3-9a72-415204057572 => -1 (Operation not permitted)", and no files are synced.

I'm not sure what's going on, and since about 40 TByte of data were already replicated by the old 3.4.7 setup, I'm afraid to experiment...
So I have some questions... maybe somebody can give me some hints:

1. As shown in the example below, the trusted.gfid of the same file differs between the master and slave volume. As far as I understood the upgrade howto, after applying the master_gfid_file.txt on the slave they should be the same on master and slave — is that right?
2. As shown in the config below, the change_detector is 'xsync'. Somewhere I read that xsync is used for the initial replication and switches to 'changelog' later, once the entire sync is done. Should I try to set the change_detector to 'changelog'? Does that make sense?
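For question 1, this is how I compare: I dump trusted.gfid with 'getfattr -n trusted.gfid -e hex' for the same file on a master brick and on the matching slave brick, and then compare the two values. A small helper for the comparison step (normalizing case and the 0x prefix; the getfattr calls themselves would of course run on our bricks):

```shell
# Compare two trusted.gfid values dumped with 'getfattr -n trusted.gfid -e hex';
# the 0x prefix and letter case are normalized before comparing.
same_gfid() {
    a=$(printf '%s' "$1" | tr 'A-Z' 'a-z' | sed 's/^0x//')
    b=$(printf '%s' "$2" | tr 'A-Z' 'a-z' | sed 's/^0x//')
    [ -n "$a" ] && [ "$a" = "$b" ]
}
```

e.g. `same_gfid "$master_gfid" "$slave_gfid" && echo match || echo MISMATCH` (variable names are illustrative).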

any other idea which could help me to solve this problem....?

best regards
dietmar




[ 11:10:01 ] - root@gluster-ger-ber-09  ~ $glusterfs --version
glusterfs 3.5.6 built on Sep 16 2015 15:27:30
...
[ 11:11:37 ] - root@gluster-ger-ber-09 ~ $cat /var/lib/glusterd/glusterd.info | grep operating-version
operating-version=30501


[ 10:55:35 ] - root@gluster-ger-ber-09 ~ $gluster volume geo-replication ger-ber-01 ssh://gluster-wien-02::aut-wien-01 status detail

MASTER NODE           MASTER VOL    MASTER BRICK       SLAVE                                 STATUS         CHECKPOINT STATUS    CRAWL STATUS    FILES SYNCD    FILES PENDING    BYTES PENDING    DELETES PENDING    FILES SKIPPED
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
gluster-ger-ber-09    ger-ber-01    /gluster-export    gluster-wien-05-int::aut-wien-01      Active         N/A                  Hybrid Crawl    0              8191             0                0                  0
gluster-ger-ber-11    ger-ber-01    /gluster-export    ssh://gluster-wien-02::aut-wien-01    Not Started    N/A                  N/A             N/A            N/A              N/A              N/A                N/A
gluster-ger-ber-10    ger-ber-01    /gluster-export    ssh://gluster-wien-02::aut-wien-01    Not Started    N/A                  N/A             N/A            N/A              N/A              N/A                N/A
gluster-ger-ber-12    ger-ber-01    /gluster-export    ssh://gluster-wien-02::aut-wien-01    Not Started    N/A                  N/A             N/A            N/A              N/A              N/A                N/A
gluster-ger-ber-07    ger-ber-01    /gluster-export    ssh://gluster-wien-02::aut-wien-01    Not Started    N/A                  N/A             N/A            N/A              N/A              N/A                N/A
gluster-ger-ber-08    ger-ber-01    /gluster-export    gluster-wien-04-int::aut-wien-01      Passive        N/A                  N/A             0              0                0                0                  0
[ 10:55:48 ] - root@gluster-ger-ber-09  ~ $


[ 10:56:56 ] - root@gluster-ger-ber-09 ~ $gluster volume geo-replication ger-ber-01 ssh://gluster-wien-02::aut-wien-01 config
special_sync_mode: partial
state_socket_unencoded: /var/lib/glusterd/geo-replication/ger-ber-01_gluster-wien-02_aut-wien-01/ssh%3A%2F%2Froot%4082.199.131.2%3Agluster%3A%2F%2F127.0.0.1%3Aaut-wien-01.socket
gluster_log_file: /var/log/glusterfs/geo-replication/ger-ber-01/ssh%3A%2F%2Froot%4082.199.131.2%3Agluster%3A%2F%2F127.0.0.1%3Aaut-wien-01.gluster.log
ssh_command: ssh -p 2503 -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem
ignore_deletes: true
change_detector: xsync
ssh_command_tar: ssh -p 2503 -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/tar_ssh.pem
working_dir: /var/run/gluster/ger-ber-01/ssh%3A%2F%2Froot%4082.199.131.2%3Agluster%3A%2F%2F127.0.0.1%3Aaut-wien-01
remote_gsyncd: /nonexistent/gsyncd
log_file: /var/log/glusterfs/geo-replication/ger-ber-01/ssh%3A%2F%2Froot%4082.199.131.2%3Agluster%3A%2F%2F127.0.0.1%3Aaut-wien-01.log
socketdir: /var/run
state_file: /var/lib/glusterd/geo-replication/ger-ber-01_gluster-wien-02_aut-wien-01/ssh%3A%2F%2Froot%4082.199.131.2%3Agluster%3A%2F%2F127.0.0.1%3Aaut-wien-01.status
state_detail_file: /var/lib/glusterd/geo-replication/ger-ber-01_gluster-wien-02_aut-wien-01/ssh%3A%2F%2Froot%4082.199.131.2%3Agluster%3A%2F%2F127.0.0.1%3Aaut-wien-01-detail.status
session_owner: 6a071cfa-b150-4f0b-b1ed-96ab5d4bd671
gluster_command_dir: /usr/sbin/
pid_file: /var/lib/glusterd/geo-replication/ger-ber-01_gluster-wien-02_aut-wien-01/ssh%3A%2F%2Froot%4082.199.131.2%3Agluster%3A%2F%2F127.0.0.1%3Aaut-wien-01.pid
georep_session_working_dir: /var/lib/glusterd/geo-replication/ger-ber-01_gluster-wien-02_aut-wien-01/
gluster_params: aux-gfid-mount
volume_id: 6a071cfa-b150-4f0b-b1ed-96ab5d4bd671
[ 11:10:01 ] - root@gluster-ger-ber-09  ~ $



[ 12:45:34 ] - root@gluster-wien-05 /var/log/glusterfs/geo-replication-slaves $tail -f 6a071cfa-b150-4f0b-b1ed-96ab5d4bd671\:gluster%3A%2F%2F127.0.0.1%3Aaut-wien-01.gluster.log
[2015-11-10 12:59:16.097932] W [fuse-bridge.c:1942:fuse_create_cbk] 0-glusterfs-fuse: 54267: /.gfid/1abb953b-aa9d-4fa3-9a72-415204057572 => -1 (Operation not permitted)
[2015-11-10 12:59:16.098044] W [defaults.c:1381:default_release] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.5.6/xlator/mount/fuse.so(+0xfb4d) [0x7fc9cd104b4d] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.5.6/xlator/mount/fuse.so(free_fuse_state+0x85) [0x7fc9cd0fab95] (-->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(fd_unref+0x10e) [0x7fc9cf52ec9e]))) 0-fuse: xlator does not implement release_cbk
...


grep 1abb953b-aa9d-4fa3-9a72-415204057572 master_gfid_file.txt
1abb953b-aa9d-4fa3-9a72-415204057572 1050/hyve/364/14158.mp4

putz@sdn-de-gate-01:~/central$ ./mycommand.sh -H gluster-ger,gluster-wien -c "getfattr -m . -d -e hex /gluster-export/1050/hyve/364/14158.mp4"
...
master volume :
-----------------------------------------------------
Host : gluster-ger-ber-09-int
# file: gluster-export/1050/hyve/364/14158.mp4
trusted.afr.ger-ber-01-client-6=0x000000000000000000000000
trusted.afr.ger-ber-01-client-7=0x000000000000000000000000
trusted.gfid=0x1abb953baa9d4fa39a72415204057572
trusted.glusterfs.6a071cfa-b150-4f0b-b1ed-96ab5d4bd671.xtime=0x54bff5c40008dd7f
-----------------------------------------------------
Host : gluster-ger-ber-10-int
# file: gluster-export/1050/hyve/364/14158.mp4
trusted.afr.ger-ber-01-client-6=0x000000000000000000000000
trusted.afr.ger-ber-01-client-7=0x000000000000000000000000
trusted.gfid=0x1abb953baa9d4fa39a72415204057572
trusted.glusterfs.6a071cfa-b150-4f0b-b1ed-96ab5d4bd671.xtime=0x54bff5c40008dd7f
...
slave volume :
Host : gluster-wien-04
# file: gluster-export/1050/hyve/364/14158.mp4
trusted.afr.aut-wien-01-client-2=0x000000000000000000000000
trusted.afr.aut-wien-01-client-3=0x000000000000000000000000
trusted.gfid=0x129ba62c3d214b34beb366fb1e2c8e4b
trusted.glusterfs.6a071cfa-b150-4f0b-b1ed-96ab5d4bd671.xtime=0x54bff5c40008dd7f
-----------------------------------------------------
Host : gluster-wien-05
# file: gluster-export/1050/hyve/364/14158.mp4
trusted.afr.aut-wien-01-client-2=0x000000000000000000000000
trusted.afr.aut-wien-01-client-3=0x000000000000000000000000
trusted.gfid=0x129ba62c3d214b34beb366fb1e2c8e4b
trusted.glusterfs.6a071cfa-b150-4f0b-b1ed-96ab5d4bd671.xtime=0x54bff5c40008dd7f
-----------------------------------------------------
...
putz@sdn-de-gate-01:~/central$
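One way to cross-check such a gfid mismatch directly on the bricks: each file's gfid also exists as a hard link under <brick>/.glusterfs/XX/YY/<uuid>, where XX and YY are the first two byte pairs of the uuid (the same layout as the .glusterfs/61/de/61def2a4-... path seen elsewhere in this thread). A small sketch deriving that path, assuming the /gluster-export brick path from the outputs above:

```shell
# Derive the .glusterfs hard-link path for a given gfid on a brick.
brick="/gluster-export"
gfid="1abb953b-aa9d-4fa3-9a72-415204057572"
path="$brick/.glusterfs/${gfid:0:2}/${gfid:2:2}/$gfid"
echo "$path"
# prints /gluster-export/.glusterfs/1a/bb/1abb953b-aa9d-4fa3-9a72-415204057572
# then e.g.: ls -li "$path"  (inode and link count should match the data file)
```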





_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users


