Some new findings...
Yesterday evening I saved the output of 'find /gluster-export -type f -exec ls
-lisa {} \;' on all masternodes to look for stale linkfiles.
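(For reference, roughly how the listings were collected; just a sketch, assuming
ssh access from a central host, the per-node output file names match the ones
used in the grep further down:)
# collect one 'ls -lisa' listing per masternode
for h in gluster-ger-ber-{07,08,09,10,11,12}-int; do
    ssh root@"$h" 'find /gluster-export -type f -exec ls -lisa {} \;' \
        > "ls-lisa-gluster-export-files-$h.out"
done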
Afterwards the geo-replication restarted several times; the latest
restart was caused by the following (entire sequence below):
[ 16:06:45 ] - root@gluster-ger-ber-09
/var/log/glusterfs/geo-replication/ger-ber-01 $cat
*gluster-export.gluster.log | egrep "No such file|stale linkfile" | tail -1
[2015-11-12 15:58:19.491418] I
[dht-common.c:1572:dht_lookup_linkfile_cbk] 0-ger-ber-01-dht: lookup of
/cache/img/img/articles/5087ab4c8e57b6.01164210.jpeg on
ger-ber-01-replicate-0 (following linkfile) failed (No such file or
directory)
[ 16:07:18 ] - root@gluster-ger-ber-09
/var/log/glusterfs/geo-replication/ger-ber-01 $
In yesterday evening's listing the mentioned linkfile exists on
ger-ber-09 and ger-ber-10 (i.e. on ger-ber-01-replicate-1), but not on
ger-ber-01-replicate-0:
root@dp-server:/home/tron/gluster# grep
/cache/img/img/articles/5087ab4c8e57b6.01164210.jpeg
ls-lisa-gluster-export-files-gluster-ger-ber-*
ls-lisa-gluster-export-files-gluster-ger-ber-09-int.out:86082371834 0
---------T 2 root root 0 Apr 18 2014
/gluster-export/cache/img/img/articles/5087ab4c8e57b6.01164210.jpeg
ls-lisa-gluster-export-files-gluster-ger-ber-10-int.out:86085383674 0
---------T 2 root root 0 Apr 18 2014
/gluster-export/cache/img/img/articles/5087ab4c8e57b6.01164210.jpeg
root@dp-server:/home/tron/gluster#
root@dp-server:/home/tron/gluster# grep 86082371834
ls-lisa-gluster-export-files-gluster-ger-ber-09-int.out
86082371834 0 ---------T 2 root root 0 Apr 18 2014
/gluster-export/.glusterfs/61/de/61def2a4-4a6d-4979-9d54-14b041e926a7
86082371834 0 ---------T 2 root root 0 Apr 18 2014
/gluster-export/cache/img/img/articles/5087ab4c8e57b6.01164210.jpeg
root@dp-server:/home/tron/gluster# grep 86085383674
ls-lisa-gluster-export-files-gluster-ger-ber-10-int.out
86085383674 0 ---------T 2 root root 0 Apr 18 2014
/gluster-export/.glusterfs/61/de/61def2a4-4a6d-4979-9d54-14b041e926a7
86085383674 0 ---------T 2 root root 0 Apr 18 2014
/gluster-export/cache/img/img/articles/5087ab4c8e57b6.01164210.jpeg
root@dp-server:/home/tron/gluster#
After the restart the linkfile/gfid does not exist anymore... apparently some
consistency check is running in the background and the geo-replication process
dies when such a linkfile problem occurs...
Should I have done a 'rebalance' on that volume before starting
geo-replication?
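(By 'rebalance' I mean the usual commands, just as a sketch, I have not run them yet:)
# fix-layout only recalculates the directory layouts
gluster volume rebalance ger-ber-01 fix-layout start
# a full rebalance also migrates files to their hashed subvolume and, as far as
# I understand, should remove the then-unneeded linkfiles
gluster volume rebalance ger-ber-01 start
gluster volume rebalance ger-ber-01 status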
Currently I'm running a script which finds those linkfiles in the saved
file listings.
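The script is essentially just a filter on the saved listings; linkfiles show up
there with mode ---------T and size 0 (sketch, assuming no whitespace in the
file names):
# fields refer to the 'ls -lisa' format above: $3 = mode, $7 = size, $NF = path
awk '$3 == "---------T" && $7 == 0 { print FILENAME ": " $NF }' \
    ls-lisa-gluster-export-files-gluster-ger-ber-*-int.out | sort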
I think I have to delete them manually and then observe how the
geo-replication behaves once it has been running for some hours without
frequent restarts...
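What I have in mind for the manual cleanup, as a sketch (directly on the bricks
that hold the stale linkfile, i.e. bypassing the client mount, with a
placeholder path, and only after double-checking the entry really is a
linkfile):
F=/gluster-export/path/to/stale-linkfile                 # placeholder, taken from the script output
stat -c '%A %h %s' "$F"                                  # expect ---------T, 2 links, size 0
getfattr -n trusted.glusterfs.dht.linkto -e text "$F"    # the dht linkto xattr must be present
# the .glusterfs entry is just the second hardlink of the linkfile (same inode),
# so remove both
ino=$(stat -c '%i' "$F")
find /gluster-export/.glusterfs -inum "$ino" -delete
rm -f -- "$F"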
best regards
dietmar
[2015-11-12 13:45:14.295292] I [fuse-bridge.c:5520:fini] 0-fuse:
Unmounting '/tmp/gsyncd-aux-mount-M_LLid'.
[2015-11-12 15:58:19.491418] I
[dht-common.c:1572:dht_lookup_linkfile_cbk] 0-ger-ber-01-dht: lookup of
/cache/img/img/articles/5087ab4c8e57b6.01164210.jpeg on
ger-ber-01-replicate-0 (following linkfile) failed (No such file or
directory)
[2015-11-12 15:58:19.493244] I
[dht-common.c:911:dht_lookup_unlink_stale_linkto_cbk] 0-ger-ber-01-dht:
Returned with op_ret 0 and op_errno 0 for
/cache/img/img/articles/5087ab4c8e57b6.01164210.jpeg
[2015-11-12 15:58:31.409954] I [glusterfsd.c:1959:main]
0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.5.6
(/usr/sbin/glusterfs --aux-gfid-mount
--log-file=/var/log/glusterfs/geo-replication/ger-ber-01/ssh%3A%2F%2Froot%
4082.199.131.2%3Agluster%3A%2F%2F127.0.0.1%3Aaut-wien-01.%2Fgluster-export.gluster.log
--volfile-server=localhost --volfile-id=ger-ber-01 --client-pid=-1
/tmp/gsyncd-aux-mount-iDrtTp)
[2015-11-12 15:58:31.431290] I [socket.c:3645:socket_init] 0-glusterfs:
SSL support is NOT enabled
[2015-11-12 15:58:31.431321] I [socket.c:3660:socket_init] 0-glusterfs:
using system polling thread
[2015-11-12 15:58:31.495912] I [dht-shared.c:314:dht_init_regex]
0-ger-ber-01-dht: using regex rsync-hash-regex = ^\.(.+)\.[^.]+$
[2015-11-12 15:58:31.497671] I [socket.c:3645:socket_init]
0-ger-ber-01-client-5: SSL support is NOT enabled
[2015-11-12 15:58:31.497694] I [socket.c:3660:socket_init]
0-ger-ber-01-client-5: using system polling thread
[2015-11-12 15:58:31.497815] I [socket.c:3645:socket_init]
0-ger-ber-01-client-4: SSL support is NOT enabled
[2015-11-12 15:58:31.497820] I [socket.c:3660:socket_init]
0-ger-ber-01-client-4: using system polling thread
[2015-11-12 15:58:31.497939] I [socket.c:3645:socket_init]
0-ger-ber-01-client-3: SSL support is NOT enabled
[2015-11-12 15:58:31.497945] I [socket.c:3660:socket_init]
0-ger-ber-01-client-3: using system polling thread
[2015-11-12 15:58:31.498059] I [socket.c:3645:socket_init]
0-ger-ber-01-client-2: SSL support is NOT enabled
[2015-11-12 15:58:31.498063] I [socket.c:3660:socket_init]
0-ger-ber-01-client-2: using system polling thread
[2015-11-12 15:58:31.498184] I [socket.c:3645:socket_init]
0-ger-ber-01-client-1: SSL support is NOT enabled
[2015-11-12 15:58:31.498189] I [socket.c:3660:socket_init]
0-ger-ber-01-client-1: using system polling thread
[2015-11-12 15:58:31.498298] I [socket.c:3645:socket_init]
0-ger-ber-01-client-0: SSL support is NOT enabled
[2015-11-12 15:58:31.498303] I [socket.c:3660:socket_init]
0-ger-ber-01-client-0: using system polling thread
[2015-11-12 15:58:31.498325] I [client.c:2294:notify]
0-ger-ber-01-client-0: parent translators are ready, attempting connect
on transport
[2015-11-12 15:58:31.498603] I [client.c:2294:notify]
0-ger-ber-01-client-1: parent translators are ready, attempting connect
on transport
[2015-11-12 15:58:31.498738] I [client.c:2294:notify]
0-ger-ber-01-client-2: parent translators are ready, attempting connect
on transport
[2015-11-12 15:58:31.498904] I [client.c:2294:notify]
0-ger-ber-01-client-3: parent translators are ready, attempting connect
on transport
[2015-11-12 15:58:31.499064] I [client.c:2294:notify]
0-ger-ber-01-client-4: parent translators are ready, attempting connect
on transport
[2015-11-12 15:58:31.499192] I [client.c:2294:notify]
0-ger-ber-01-client-5: parent translators are ready, attempting connect
on transport
Final graph:
+------------------------------------------------------------------------------+
1: volume ger-ber-01-client-0
2: type protocol/client
3: option remote-host gluster-ger-ber-11-int
4: option remote-subvolume /gluster-export
5: option transport-type socket
6: option username f2720838-abbd-4629-bd24-8040ceb794c1
7: option password 5ee70f0b-e537-4d11-8692-863864e227e4
8: option ping-timeout 10
9: option send-gids true
10: end-volume
11:
12: volume ger-ber-01-client-1
13: type protocol/client
14: option remote-host gluster-ger-ber-12-int
15: option remote-subvolume /gluster-export
16: option transport-type socket
17: option username f2720838-abbd-4629-bd24-8040ceb794c1
18: option password 5ee70f0b-e537-4d11-8692-863864e227e4
19: option ping-timeout 10
20: option send-gids true
21: end-volume
22:
23: volume ger-ber-01-replicate-0
24: type cluster/replicate
25: subvolumes ger-ber-01-client-0 ger-ber-01-client-1
26: end-volume
27:
28: volume ger-ber-01-client-2
29: type protocol/client
30: option remote-host gluster-ger-ber-09-int
31: option remote-subvolume /gluster-export
32: option transport-type socket
33: option username f2720838-abbd-4629-bd24-8040ceb794c1
34: option password 5ee70f0b-e537-4d11-8692-863864e227e4
35: option ping-timeout 10
36: option send-gids true
37: end-volume
38:
39: volume ger-ber-01-client-3
40: type protocol/client
41: option remote-host gluster-ger-ber-10-int
42: option remote-subvolume /gluster-export
43: option transport-type socket
44: option username f2720838-abbd-4629-bd24-8040ceb794c1
45: option password 5ee70f0b-e537-4d11-8692-863864e227e4
46: option ping-timeout 10
47: option send-gids true
48: end-volume
49:
50: volume ger-ber-01-replicate-1
51: type cluster/replicate
52: subvolumes ger-ber-01-client-2 ger-ber-01-client-3
53: end-volume
54:
55: volume ger-ber-01-client-4
56: type protocol/client
57: option remote-host gluster-ger-ber-07-int
58: option remote-subvolume /gluster-export
59: option transport-type socket
60: option username f2720838-abbd-4629-bd24-8040ceb794c1
61: option password 5ee70f0b-e537-4d11-8692-863864e227e4
62: option ping-timeout 10
63: option send-gids true
64: end-volume
65:
66: volume ger-ber-01-client-5
67: type protocol/client
68: option remote-host gluster-ger-ber-08-int
69: option remote-subvolume /gluster-export
70: option transport-type socket
71: option username f2720838-abbd-4629-bd24-8040ceb794c1
72: option password 5ee70f0b-e537-4d11-8692-863864e227e4
73: option ping-timeout 10
74: option send-gids true
75: end-volume
76:
77: volume ger-ber-01-replicate-2
78: type cluster/replicate
79: subvolumes ger-ber-01-client-4 ger-ber-01-client-5
80: end-volume
81:
82: volume ger-ber-01-dht
83: type cluster/distribute
84: option min-free-disk 200GB
85: subvolumes ger-ber-01-replicate-0 ger-ber-01-replicate-1
ger-ber-01-replicate-2
86: end-volume
87:
88: volume ger-ber-01-write-behind
89: type performance/write-behind
90: subvolumes ger-ber-01-dht
91: end-volume
92:
93: volume ger-ber-01-read-ahead
94: type performance/read-ahead
95: subvolumes ger-ber-01-write-behind
96: end-volume
97:
98: volume ger-ber-01-io-cache
99: type performance/io-cache
100: option min-file-size 0
101: option cache-timeout 2
102: option cache-size 1024MB
103: subvolumes ger-ber-01-read-ahead
104: end-volume
105:
106: volume ger-ber-01-quick-read
107: type performance/quick-read
108: option cache-size 1024MB
109: subvolumes ger-ber-01-io-cache
110: end-volume
111:
112: volume ger-ber-01-md-cache
113: type performance/md-cache
114: subvolumes ger-ber-01-quick-read
115: end-volume
116:
117: volume ger-ber-01
118: type debug/io-stats
119: option latency-measurement off
120: option count-fop-hits off
121: subvolumes ger-ber-01-md-cache
122: end-volume
123:
124: volume gfid-access-autoload
125: type features/gfid-access
126: subvolumes ger-ber-01
127: end-volume
128:
+------------------------------------------------------------------------------+
[2015-11-12 15:58:31.499624] I [rpc-clnt.c:1729:rpc_clnt_reconfig]
0-ger-ber-01-client-2: changing port to 49152 (from 0)
[2015-11-12 15:58:31.500026] I
[client-handshake.c:1677:select_server_supported_programs]
0-ger-ber-01-client-2: Using Program GlusterFS 3.3, Num (1298437),
Version (330)
[2015-11-12 15:58:31.500138] I [rpc-clnt.c:1729:rpc_clnt_reconfig]
0-ger-ber-01-client-4: changing port to 49152 (from 0)
[2015-11-12 15:58:31.500160] I [rpc-clnt.c:1729:rpc_clnt_reconfig]
0-ger-ber-01-client-5: changing port to 49152 (from 0)
[2015-11-12 15:58:31.500565] I
[client-handshake.c:1462:client_setvolume_cbk] 0-ger-ber-01-client-2:
Connected to 10.0.1.101:49152, attached to remote volume '/gluster-export'.
[2015-11-12 15:58:31.500576] I
[client-handshake.c:1474:client_setvolume_cbk] 0-ger-ber-01-client-2:
Server and Client lk-version numbers are not same, reopening the fds
[2015-11-12 15:58:31.500611] I [afr-common.c:4487:afr_notify]
0-ger-ber-01-replicate-1: Subvolume 'ger-ber-01-client-2' came back up;
going online.
[2015-11-12 15:58:31.500629] I [rpc-clnt.c:1729:rpc_clnt_reconfig]
0-ger-ber-01-client-3: changing port to 49152 (from 0)
[2015-11-12 15:58:31.500659] I [rpc-clnt.c:1729:rpc_clnt_reconfig]
0-ger-ber-01-client-0: changing port to 49152 (from 0)
[2015-11-12 15:58:31.500673] I [rpc-clnt.c:1729:rpc_clnt_reconfig]
0-ger-ber-01-client-1: changing port to 49152 (from 0)
[2015-11-12 15:58:31.501206] I
[client-handshake.c:450:client_set_lk_version_cbk]
0-ger-ber-01-client-2: Server lk version = 1
[2015-11-12 15:58:31.501239] I
[client-handshake.c:1677:select_server_supported_programs]
0-ger-ber-01-client-4: Using Program GlusterFS 3.3, Num (1298437),
Version (330)
[2015-11-12 15:58:31.501380] I
[client-handshake.c:1677:select_server_supported_programs]
0-ger-ber-01-client-5: Using Program GlusterFS 3.3, Num (1298437),
Version (330)
[2015-11-12 15:58:31.501479] I
[client-handshake.c:1677:select_server_supported_programs]
0-ger-ber-01-client-0: Using Program GlusterFS 3.3, Num (1298437),
Version (330)
[2015-11-12 15:58:31.501540] I
[client-handshake.c:1462:client_setvolume_cbk] 0-ger-ber-01-client-4:
Connected to 10.0.1.147:49152, attached to remote volume '/gluster-export'.
[2015-11-12 15:58:31.501546] I
[client-handshake.c:1474:client_setvolume_cbk] 0-ger-ber-01-client-4:
Server and Client lk-version numbers are not same, reopening the fds
[2015-11-12 15:58:31.501566] I [afr-common.c:4487:afr_notify]
0-ger-ber-01-replicate-2: Subvolume 'ger-ber-01-client-4' came back up;
going online.
[2015-11-12 15:58:31.501576] I
[client-handshake.c:1677:select_server_supported_programs]
0-ger-ber-01-client-3: Using Program GlusterFS 3.3, Num (1298437),
Version (330)
[2015-11-12 15:58:31.501649] I
[client-handshake.c:1462:client_setvolume_cbk] 0-ger-ber-01-client-5:
Connected to 10.0.1.149:49152, attached to remote volume '/gluster-export'.
[2015-11-12 15:58:31.501654] I
[client-handshake.c:1474:client_setvolume_cbk] 0-ger-ber-01-client-5:
Server and Client lk-version numbers are not same, reopening the fds
[2015-11-12 15:58:31.501729] I
[client-handshake.c:1677:select_server_supported_programs]
0-ger-ber-01-client-1: Using Program GlusterFS 3.3, Num (1298437),
Version (330)
[2015-11-12 15:58:31.501753] I
[client-handshake.c:450:client_set_lk_version_cbk]
0-ger-ber-01-client-4: Server lk version = 1
[2015-11-12 15:58:31.501778] I
[client-handshake.c:1462:client_setvolume_cbk] 0-ger-ber-01-client-0:
Connected to 10.0.1.105:49152, attached to remote volume '/gluster-export'.
[2015-11-12 15:58:31.501783] I
[client-handshake.c:1474:client_setvolume_cbk] 0-ger-ber-01-client-0:
Server and Client lk-version numbers are not same, reopening the fds
[2015-11-12 15:58:31.501797] I [afr-common.c:4487:afr_notify]
0-ger-ber-01-replicate-0: Subvolume 'ger-ber-01-client-0' came back up;
going online.
[2015-11-12 15:58:31.501820] I
[client-handshake.c:450:client_set_lk_version_cbk]
0-ger-ber-01-client-5: Server lk version = 1
[2015-11-12 15:58:31.501934] I
[client-handshake.c:1462:client_setvolume_cbk] 0-ger-ber-01-client-3:
Connected to 10.0.1.103:49152, attached to remote volume '/gluster-export'.
[2015-11-12 15:58:31.501939] I
[client-handshake.c:1474:client_setvolume_cbk] 0-ger-ber-01-client-3:
Server and Client lk-version numbers are not same, reopening the fds
[2015-11-12 15:58:31.501982] I
[client-handshake.c:450:client_set_lk_version_cbk]
0-ger-ber-01-client-0: Server lk version = 1
[2015-11-12 15:58:31.502061] I
[client-handshake.c:1462:client_setvolume_cbk] 0-ger-ber-01-client-1:
Connected to 10.0.1.107:49152, attached to remote volume '/gluster-export'.
[2015-11-12 15:58:31.502067] I
[client-handshake.c:1474:client_setvolume_cbk] 0-ger-ber-01-client-1:
Server and Client lk-version numbers are not same, reopening the fds
[2015-11-12 15:58:31.504554] I [fuse-bridge.c:5022:fuse_graph_setup]
0-fuse: switched to graph 0
[2015-11-12 15:58:31.504588] I
[client-handshake.c:450:client_set_lk_version_cbk]
0-ger-ber-01-client-1: Server lk version = 1
[2015-11-12 15:58:31.504595] I
[client-handshake.c:450:client_set_lk_version_cbk]
0-ger-ber-01-client-3: Server lk version = 1
[2015-11-12 15:58:31.504657] I [fuse-bridge.c:3959:fuse_init]
0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.22
kernel 7.22
[ 16:13:28 ] - root@gluster-ger-ber-09
/var/log/glusterfs/geo-replication/ger-ber-01 $
On 12.11.2015 at 13:23, Dietmar Putz wrote:
Hello Aravinda,
thank you for your reply...
answers are inline...
best regards
dietmar
On 12.11.2015 at 07:22, Aravinda wrote:
Please paste some lines from log files where you see "Operation not
permitted" errors.
These "Operation not permitted" warnings on the active slave appear so
frequently that there are no corresponding messages
on the master...
What I observe is that 1270 different gfids are processed again and
again...
[2015-11-12 10:53:55.355550] W [fuse-bridge.c:1942:fuse_create_cbk]
0-glusterfs-fuse: 29348: /.gfid/f5f57c10-381d-4537-8988-f22e16fab523
=> -1 (Operation not permitted)
[ 10:56:59 ] - root@gluster-wien-05
/var/log/glusterfs/geo-replication-slaves $cat *.gluster.log
*.gluster.log.1 | grep "Operation not permitted" | wc -l
54756
[ 11:02:28 ] - root@gluster-wien-05
/var/log/glusterfs/geo-replication-slaves $cat *.gluster.log
*.gluster.log.1 | grep "Operation not permitted" | awk '{print $7}' |
head -2
/.gfid/f857ac77-6afc-494f-88c7-304aec6cc8b5
/.gfid/f87e0361-3915-495f-bdb2-0c87155f4ddd
[ 11:02:48 ] - root@gluster-wien-05
/var/log/glusterfs/geo-replication-slaves $cat *.gluster.log
*.gluster.log.1 | grep "Operation not permitted" | awk '{print $7}' |
sort | uniq | wc -l
1270
[ 11:03:14 ] - root@gluster-wien-05
/var/log/glusterfs/geo-replication-slaves $
These gfids are filling up the geo-replication logfile on the master after
each restart...
According to the master_gfid_file.txt, the files behind the gfids are
available on master and slave.
I observe many geo-replication restarts on the master node and as far as I
can see they are caused by two different reasons: "No such file or
directory" and "attempting deletion of stale linkfile".
[ 11:30:08 ] - root@gluster-ger-ber-09
/var/log/glusterfs/geo-replication/ger-ber-01 $cat
*gluster-export.gluster.log | grep "No such file or directory" | tail -1
[2015-11-12 09:26:35.942478] I
[dht-common.c:1572:dht_lookup_linkfile_cbk] 0-ger-ber-01-dht: lookup
of /cache/img/img/articles/508e370a62a2e4.87470810.jpeg on
ger-ber-01-replicate-0 (following linkfile) failed (No such file or
directory)
[ 11:30:31 ] - root@gluster-ger-ber-09
/var/log/glusterfs/geo-replication/ger-ber-01 $cat
*gluster-export.gluster.log | grep "deletion of stale linkfile" | tail -1
[2015-11-11 23:08:44.487825] I
[dht-common.c:1480:dht_lookup_everywhere_cbk] 0-ger-ber-01-dht:
attempting deletion of stale linkfile
/cache/img/download/508813939afa33.63956784.pdf.gz on
ger-ber-01-replicate-1
[ 11:31:10 ] - root@gluster-ger-ber-09
/var/log/glusterfs/geo-replication/ger-ber-01 $cat
*gluster-export.gluster.log | egrep -i "no such file or
directory|deletion of stale linkfile" | wc -l
115
Any time such an error appears the geo-replication process restarts...
meanwhile 115 times since 8th Nov.
"attempting deletion of stale linkfile" example:
/cache/img/articles/50856893226ce0.55808655.jpeg.gz
[2015-11-10 23:48:14.273733] W [glusterfsd.c:1095:cleanup_and_exit]
(-->/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7fbf8be6c47d]
(-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x8182) [0x7fbf8c13f182]
(-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xd5) [0x7fbf8cc39265])))
0-: received signum (15), shutting down
[2015-11-10 23:48:14.273768] I [fuse-bridge.c:5520:fini] 0-fuse:
Unmounting '/tmp/gsyncd-aux-mount-QLm2DI'.
[2015-11-11 01:14:26.972164] I
[dht-common.c:1480:dht_lookup_everywhere_cbk] 0-ger-ber-01-dht:
attempting deletion of stale linkfile
/cache/img/articles/50856893226ce0.55808655.jpeg.gz on
ger-ber-01-replicate-1
[2015-11-11 01:14:26.999234] I
[dht-common.c:825:dht_lookup_unlink_cbk] 0-ger-ber-01-dht:
lookup_unlink returned with op_ret -> 0 and op-errno -> 0 for
/cache/img/articles/50856893226ce0.55808655.jpeg.gz
[2015-11-11 01:14:39.060592] I [glusterfsd.c:1959:main]
0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version
3.5.6 (/usr/sbin/glusterfs --aux-gfid-mount
--log-file=/var/log/glusterfs/geo-replication/ger-ber-01/ssh%3A%2F%2Froot%4082.199.131.2%3Agluster%3A%2F%2F127.0.0.1%3Aaut-wien-01.%2Fgluster-export.gluster.log
--volfile-server=localhost --volfile-id=ger-ber-01 --client-pid=-1
/tmp/gsyncd-aux-mount-zXLm3a)
[2015-11-11 01:14:39.093942] I [socket.c:3645:socket_init]
0-glusterfs: SSL support is NOT enabled
[2015-11-11 01:14:39.093969] I [socket.c:3660:socket_init]
0-glusterfs: using system polling thread
[2015-11-11 01:14:39.171506] I [dht-shared.c:314:dht_init_regex]
0-ger-ber-01-dht: using regex rsync-hash-regex = ^\.(.+)\.[^.]+$
[2015-11-11 01:14:39.173291] I [socket.c:3645:socket_init]
0-ger-ber-01-client-5: SSL support is NOT enabled
[2015-11-11 01:14:39.173304] I [socket.c:3660:socket_init]
0-ger-ber-01-client-5: using system polling thread
[2015-11-11 01:14:39.173426] I [socket.c:3645:socket_init]
0-ger-ber-01-client-4: SSL support is NOT enabled
[2015-11-11 01:14:39.173430] I [socket.c:3660:socket_init]
0-ger-ber-01-client-4: using system polling thread
[2015-11-11 01:14:39.173545] I [socket.c:3645:socket_init]
0-ger-ber-01-client-3: SSL support is NOT enabled
[2015-11-11 01:14:39.173550] I [socket.c:3660:socket_init]
0-ger-ber-01-client-3: using system polling thread
[2015-11-11 01:14:39.173667] I [socket.c:3645:socket_init]
0-ger-ber-01-client-2: SSL support is NOT enabled
[2015-11-11 01:14:39.173671] I [socket.c:3660:socket_init]
0-ger-ber-01-client-2: using system polling thread
[2015-11-11 01:14:39.173795] I [socket.c:3645:socket_init]
0-ger-ber-01-client-1: SSL support is NOT enabled
[2015-11-11 01:14:39.173800] I [socket.c:3660:socket_init]
0-ger-ber-01-client-1: using system polling thread
[2015-11-11 01:14:39.173916] I [socket.c:3645:socket_init]
0-ger-ber-01-client-0: SSL support is NOT enabled
[2015-11-11 01:14:39.173921] I [socket.c:3660:socket_init]
0-ger-ber-01-client-0: using system polling thread
[2015-11-11 01:14:39.173944] I [client.c:2294:notify]
0-ger-ber-01-client-0: parent translators are ready, attempting
connect on transport
[2015-11-11 01:14:39.174163] I [client.c:2294:notify]
0-ger-ber-01-client-1: parent translators are ready, attempting
connect on transport
[2015-11-11 01:14:39.174299] I [client.c:2294:notify]
0-ger-ber-01-client-2: parent translators are ready, attempting
connect on transport
[2015-11-11 01:14:39.174474] I [client.c:2294:notify]
0-ger-ber-01-client-3: parent translators are ready, attempting
connect on transport
[2015-11-11 01:14:39.174620] I [client.c:2294:notify]
0-ger-ber-01-client-4: parent translators are ready, attempting
connect on transport
[2015-11-11 01:14:39.174752] I [client.c:2294:notify]
0-ger-ber-01-client-5: parent translators are ready, attempting
connect on transport
All of the stale linkfiles were stored on ger-ber-01-replicate-1, which
is gluster-ger-ber-09-int and gluster-ger-ber-10-int. Obviously Gluster
successfully deletes the stale linkfile, because afterwards I cannot
find the mentioned file anymore, but the geo-replication
process restarts and does not process the same linkfile again.
Looking around I found some more of these stale linkfiles... size 0 and
no corresponding data file.
So I come more and more to the conclusion that I possibly have a
pre-existing problem...
Can I delete such stale linkfiles without any problems?
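What I would check before deleting one of them (sketch; the linkto xattr names
the subvolume the linkfile points to):
# on ger-ber-09-int / -10-int, where the linkfiles live
getfattr -n trusted.glusterfs.dht.linkto -e text \
    /gluster-export/cache/img/articles/507c19b7e19af9.19424972.jpeg.gz
# -> prints the subvolume (e.g. ger-ber-01-replicate-N) the linkfile points to;
#    if no regular data file exists on the bricks of that subvolume, the
#    linkfile is stale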
putz@sdn-de-gate-01:~/central/gluster-3.5$ ../mycommand.sh -H
gluster-ger -c "ls -l /gluster-export/cache/img/articles/*.gz"
Host : gluster-ger-ber-07-int
Return Code : 2 check result on gluster-ger-ber-07-int
-----------------------------------------------------
Host : gluster-ger-ber-08-int
Return Code : 2 check result on gluster-ger-ber-08-int
-----------------------------------------------------
Host : gluster-ger-ber-09-int
---------T 2 root root 0 Apr 23 2014
/gluster-export/cache/img/articles/507c19b7e19af9.19424972.jpeg.gz
---------T 2 root root 0 Apr 23 2014
/gluster-export/cache/img/articles/507d4e41edb6b0.81932229.jpeg.gz
---------T 2 root root 0 Apr 23 2014
/gluster-export/cache/img/articles/507e941aed0c81.54017110.jpeg.gz
---------T 2 root root 0 Apr 23 2014
/gluster-export/cache/img/articles/508569822e9278.45704411.jpeg.gz
-----------------------------------------------------
Host : gluster-ger-ber-10-int
---------T 2 root root 0 Apr 23 2014
/gluster-export/cache/img/articles/507c19b7e19af9.19424972.jpeg.gz
---------T 2 root root 0 Apr 23 2014
/gluster-export/cache/img/articles/507d4e41edb6b0.81932229.jpeg.gz
---------T 2 root root 0 Apr 23 2014
/gluster-export/cache/img/articles/507e941aed0c81.54017110.jpeg.gz
---------T 2 root root 0 Apr 23 2014
/gluster-export/cache/img/articles/508569822e9278.45704411.jpeg.gz
-----------------------------------------------------
Host : gluster-ger-ber-11-int
Return Code : 2 check result on gluster-ger-ber-11-int
-----------------------------------------------------
Host : gluster-ger-ber-12-int
Return Code : 2 check result on gluster-ger-ber-12-int
-----------------------------------------------------
The other reason for a restart... "No such file or directory"... still no
clue.
"No such file or directory"
[2015-11-10 11:12:39.975390] I [fuse-bridge.c:5520:fini] 0-fuse:
Unmounting '/tmp/gsyncd-aux-mount-JXonZH'.
[2015-11-10 13:07:39.532569] I
[dht-common.c:1572:dht_lookup_linkfile_cbk] 0-ger-ber-01-dht: lookup
of /cache/img/articles/505c5538cd9d21.37413891.jpeg.gz on
ger-ber-01-replicate-0 (following linkfile) failed (No such file or
directory)
[2015-11-10 13:07:39.563707] I
[dht-common.c:911:dht_lookup_unlink_stale_linkto_cbk]
0-ger-ber-01-dht: Returned with op_ret 0 and op_errno 0 for
/cache/img/articles/505c5538cd9d21.37413891.jpeg.gz
[2015-11-10 13:07:51.483713] I [glusterfsd.c:1959:main]
0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version
3.5.6 (/usr/sbin/glusterfs --aux-gfid-mount
--log-file=/var/log/glusterfs/geo-replication/ger-ber-01/ssh%3A%2F%2Froot%4082.199.131.2%3Agluster%3A%2F%2F127.0.0.1%3Aaut-wien-01.%2Fgluster-export.gluster.log
--volfile-server=localhost --volfile-id=ger-ber-01 --client-pid=-1
/tmp/gsyncd-aux-mount-H9kZUh)
[2015-11-10 13:07:51.511921] I [socket.c:3645:socket_init]
0-glusterfs: SSL support is NOT enabled
[2015-11-10 13:07:51.511943] I [socket.c:3660:socket_init]
0-glusterfs: using system polling thread
[2015-11-10 13:07:51.567950] I [dht-shared.c:314:dht_init_regex]
0-ger-ber-01-dht: using regex rsync-hash-regex = ^\.(.+)\.[^.]+$
[2015-11-10 13:07:51.570322] I [socket.c:3645:socket_init]
0-ger-ber-01-client-5: SSL support is NOT enabled
[2015-11-10 13:07:51.570340] I [socket.c:3660:socket_init]
0-ger-ber-01-client-5: using system polling thread
[2015-11-10 13:07:51.570497] I [socket.c:3645:socket_init]
0-ger-ber-01-client-4: SSL support is NOT enabled
[2015-11-10 13:07:51.570504] I [socket.c:3660:socket_init]
0-ger-ber-01-client-4: using system polling thread
[2015-11-10 13:07:51.570706] I [socket.c:3645:socket_init]
0-ger-ber-01-client-3: SSL support is NOT enabled
[2015-11-10 13:07:51.570724] I [socket.c:3660:socket_init]
0-ger-ber-01-client-3: using system polling thread
[2015-11-10 13:07:51.570888] I [socket.c:3645:socket_init]
0-ger-ber-01-client-2: SSL support is NOT enabled
[2015-11-10 13:07:51.570895] I [socket.c:3660:socket_init]
0-ger-ber-01-client-2: using system polling thread
[2015-11-10 13:07:51.571086] I [socket.c:3645:socket_init]
0-ger-ber-01-client-1: SSL support is NOT enabled
[2015-11-10 13:07:51.571103] I [socket.c:3660:socket_init]
0-ger-ber-01-client-1: using system polling thread
[2015-11-10 13:07:51.571259] I [socket.c:3645:socket_init]
0-ger-ber-01-client-0: SSL support is NOT enabled
[2015-11-10 13:07:51.571266] I [socket.c:3660:socket_init]
0-ger-ber-01-client-0: using system polling thread
[2015-11-10 13:07:51.571298] I [client.c:2294:notify]
0-ger-ber-01-client-0: parent translators are ready, attempting
connect on transport
[2015-11-10 13:07:51.571602] I [client.c:2294:notify]
0-ger-ber-01-client-1: parent translators are ready, attempting
connect on transport
[2015-11-10 13:07:51.571862] I [client.c:2294:notify]
0-ger-ber-01-client-2: parent translators are ready, attempting
connect on transport
[2015-11-10 13:07:51.572118] I [client.c:2294:notify]
0-ger-ber-01-client-3: parent translators are ready, attempting
connect on transport
[2015-11-10 13:07:51.572350] I [client.c:2294:notify]
0-ger-ber-01-client-4: parent translators are ready, attempting
connect on transport
[2015-11-10 13:07:51.572647] I [client.c:2294:notify]
0-ger-ber-01-client-5: parent translators are ready, attempting
connect on transport
Are any other logfiles helpful?
Were there any errors observed during step 3 as specified in
http://www.gluster.org/community/documentation/index.php/Upgrade_to_3.5
Step 3 of the section "Scheduling a downtime" ran without any errors.
Step 3 of the section "Upgrade steps for geo-replication" was started
twice... after the first attempt I recognized that the slave-upgrade.sh
script uses the standard SSH port. After temporarily reconfiguring the
sshd on each slave to port 22, the script ran without any
problems in a screen session and asked for each slave's password...
Some rows in status output are still in "Not Started", are there any
errors in /var/log/glusterfs/geo-replication/*.log in those nodes?
Yes, the other master nodes are not permitted to access the slave
nodes because of iptables rules on the slaves... only gluster-ger-ber-08-int
and gluster-ger-ber-09-int are allowed to access each slave node.
Should that be adjusted so that every master node can access every slave
node... or does it 'just' influence the reliability of geo-replication
and not affect the geo-rep problem?
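(In case it should be adjusted: on each slave I would simply open the geo-rep
SSH port for the remaining master nodes, sketch with a placeholder for the
addresses:)
# MASTER_IPS is a placeholder for the addresses of gluster-ger-ber-07-int .. -12-int
for ip in $MASTER_IPS; do
    iptables -I INPUT -s "$ip" -p tcp --dport 2503 -j ACCEPT
done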
[2015-11-09 12:45:40.30857] E [resource(monitor):204:errlog] Popen:
command "ssh -p 2503 -oPasswordAuthentication=no
-oStrictHostKeyChecking=no -i
/var/lib/glusterd/geo-replication/secret.pem gluster-wien-02 gluster
--xml --remote-host=localhost volume info aut-wien-01" returned with
255, saying:
[2015-11-09 12:45:40.61755] E [resource(monitor):207:logerr] Popen:
ssh> ssh: connect to host gluster-wien-02 port 2503: Connection timed out
[2015-11-09 12:45:40.62242] I [syncdutils(monitor):192:finalize]
<top>: exiting.
regards
Aravinda
On 11/10/2015 10:50 PM, Dietmar Putz wrote:
Hi all,
I need some help with a geo-replication issue...
Recently I upgraded two 6-node distributed-replicated Gluster clusters from
Ubuntu 12.04.5 LTS to 14.04.3 LTS and GlusterFS 3.4.7 to 3.5.6.
Since then the geo-replication does not start syncing but has remained as
shown in the 'status detail' output below for about 48h.
I followed the hints for upgrading with an existing geo-replication:
http://www.gluster.org/community/documentation/index.php/Upgrade_to_3.5
The master_gfid_file.txt was created and applied to the slave
volume; geo-replication was started with the 'force' option.
In the gluster.log on the slave I can find thousands of lines with
messages like:
".../.gfid/1abb953b-aa9d-4fa3-9a72-415204057572 => -1 (Operation not
permitted)"
and no files are synced.
I'm not sure what's going on, and since there are about 40 TByte of
data already replicated by the old 3.4.7 setup I'm afraid to
experiment...
So I have some questions... maybe somebody can give me some hints...
1. As shown in the example below, the trusted.gfid of the same file
differs between the master and the slave volume. As far as I understood
the upgrade howto, after applying the master_gfid_file.txt on the slave
the gfids should be the same on master and slave... is that right?
2. As shown in the config below, the change_detector is 'xsync'.
Somewhere I read that xsync is used for the initial replication and that
it switches to 'changelog' later on once the entire sync is done.
Should I try to set the change_detector to 'changelog', does that
make sense...?
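(The change I mean would be done via the geo-replication config interface,
sketch only, whether it is advisable is exactly my question:)
# show the current value
gluster volume geo-replication ger-ber-01 ssh://gluster-wien-02::aut-wien-01 config change_detector
# switch from xsync to changelog-based crawling
gluster volume geo-replication ger-ber-01 ssh://gluster-wien-02::aut-wien-01 config change_detector changelog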
Any other idea which could help me solve this problem?
best regards
dietmar
[ 11:10:01 ] - root@gluster-ger-ber-09 ~ $glusterfs --version
glusterfs 3.5.6 built on Sep 16 2015 15:27:30
...
[ 11:11:37 ] - root@gluster-ger-ber-09 ~ $cat
/var/lib/glusterd/glusterd.info | grep operating-version
operating-version=30501
[ 10:55:35 ] - root@gluster-ger-ber-09 ~ $gluster volume
geo-replication ger-ber-01 ssh://gluster-wien-02::aut-wien-01 status
detail
MASTER NODE           MASTER VOL    MASTER BRICK       SLAVE                                 STATUS         CHECKPOINT STATUS    CRAWL STATUS    FILES SYNCD    FILES PENDING    BYTES PENDING    DELETES PENDING    FILES SKIPPED
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
gluster-ger-ber-09    ger-ber-01    /gluster-export    gluster-wien-05-int::aut-wien-01      Active         N/A                  Hybrid Crawl    0              8191             0                0                  0
gluster-ger-ber-11    ger-ber-01    /gluster-export    ssh://gluster-wien-02::aut-wien-01    Not Started    N/A                  N/A             N/A            N/A              N/A              N/A                N/A
gluster-ger-ber-10    ger-ber-01    /gluster-export    ssh://gluster-wien-02::aut-wien-01    Not Started    N/A                  N/A             N/A            N/A              N/A              N/A                N/A
gluster-ger-ber-12    ger-ber-01    /gluster-export    ssh://gluster-wien-02::aut-wien-01    Not Started    N/A                  N/A             N/A            N/A              N/A              N/A                N/A
gluster-ger-ber-07    ger-ber-01    /gluster-export    ssh://gluster-wien-02::aut-wien-01    Not Started    N/A                  N/A             N/A            N/A              N/A              N/A                N/A
gluster-ger-ber-08    ger-ber-01    /gluster-export    gluster-wien-04-int::aut-wien-01      Passive        N/A                  N/A             0              0                0                0                  0
[ 10:55:48 ] - root@gluster-ger-ber-09 ~ $
[ 10:56:56 ] - root@gluster-ger-ber-09 ~ $gluster volume
geo-replication ger-ber-01 ssh://gluster-wien-02::aut-wien-01 config
special_sync_mode: partial
state_socket_unencoded:
/var/lib/glusterd/geo-replication/ger-ber-01_gluster-wien-02_aut-wien-01/ssh%3A%2F%2Froot%4082.199.131.2%3Agluster%3A%2F%2F127.0.0.1%3Aaut-wien-01.socket
gluster_log_file:
/var/log/glusterfs/geo-replication/ger-ber-01/ssh%3A%2F%2Froot%4082.199.131.2%3Agluster%3A%2F%2F127.0.0.1%3Aaut-wien-01.gluster.log
ssh_command: ssh -p 2503 -oPasswordAuthentication=no
-oStrictHostKeyChecking=no -i
/var/lib/glusterd/geo-replication/secret.pem
ignore_deletes: true
change_detector: xsync
ssh_command_tar: ssh -p 2503 -oPasswordAuthentication=no
-oStrictHostKeyChecking=no -i
/var/lib/glusterd/geo-replication/tar_ssh.pem
working_dir:
/var/run/gluster/ger-ber-01/ssh%3A%2F%2Froot%4082.199.131.2%3Agluster%3A%2F%2F127.0.0.1%3Aaut-wien-01
remote_gsyncd: /nonexistent/gsyncd
log_file:
/var/log/glusterfs/geo-replication/ger-ber-01/ssh%3A%2F%2Froot%4082.199.131.2%3Agluster%3A%2F%2F127.0.0.1%3Aaut-wien-01.log
socketdir: /var/run
state_file:
/var/lib/glusterd/geo-replication/ger-ber-01_gluster-wien-02_aut-wien-01/ssh%3A%2F%2Froot%4082.199.131.2%3Agluster%3A%2F%2F127.0.0.1%3Aaut-wien-01.status
state_detail_file:
/var/lib/glusterd/geo-replication/ger-ber-01_gluster-wien-02_aut-wien-01/ssh%3A%2F%2Froot%4082.199.131.2%3Agluster%3A%2F%2F127.0.0.1%3Aaut-wien-01-detail.status
session_owner: 6a071cfa-b150-4f0b-b1ed-96ab5d4bd671
gluster_command_dir: /usr/sbin/
pid_file:
/var/lib/glusterd/geo-replication/ger-ber-01_gluster-wien-02_aut-wien-01/ssh%3A%2F%2Froot%4082.199.131.2%3Agluster%3A%2F%2F127.0.0.1%3Aaut-wien-01.pid
georep_session_working_dir:
/var/lib/glusterd/geo-replication/ger-ber-01_gluster-wien-02_aut-wien-01/
gluster_params: aux-gfid-mount
volume_id: 6a071cfa-b150-4f0b-b1ed-96ab5d4bd671
[ 11:10:01 ] - root@gluster-ger-ber-09 ~ $
[ 12:45:34 ] - root@gluster-wien-05
/var/log/glusterfs/geo-replication-slaves $tail -f
6a071cfa-b150-4f0b-b1ed-96ab5d4bd671\:gluster%3A%2F%2F127.0.0.1%3Aaut-wien-01.gluster.log
[2015-11-10 12:59:16.097932] W [fuse-bridge.c:1942:fuse_create_cbk]
0-glusterfs-fuse: 54267: /.gfid/1abb953b-aa9d-4fa3-9a72-415204057572
=> -1 (Operation not permitted)
[2015-11-10 12:59:16.098044] W [defaults.c:1381:default_release]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.5.6/xlator/mount/fuse.so(+0xfb4d)
[0x7fc9cd104b4d]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.5.6/xlator/mount/fuse.so(free_fuse_state+0x85)
[0x7fc9cd0fab95]
(-->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(fd_unref+0x10e)
[0x7fc9cf52ec9e]))) 0-fuse: xlator does not implement release_cbk
...
grep 1abb953b-aa9d-4fa3-9a72-415204057572 master_gfid_file.txt
1abb953b-aa9d-4fa3-9a72-415204057572 1050/hyve/364/14158.mp4
putz@sdn-de-gate-01:~/central$ ./mycommand.sh -H
gluster-ger,gluster-wien -c "getfattr -m . -d -e hex
/gluster-export/1050/hyve/364/14158.mp4"
...
master volume :
-----------------------------------------------------
Host : gluster-ger-ber-09-int
# file: gluster-export/1050/hyve/364/14158.mp4
trusted.afr.ger-ber-01-client-6=0x000000000000000000000000
trusted.afr.ger-ber-01-client-7=0x000000000000000000000000
trusted.gfid=0x1abb953baa9d4fa39a72415204057572
trusted.glusterfs.6a071cfa-b150-4f0b-b1ed-96ab5d4bd671.xtime=0x54bff5c40008dd7f
-----------------------------------------------------
Host : gluster-ger-ber-10-int
# file: gluster-export/1050/hyve/364/14158.mp4
trusted.afr.ger-ber-01-client-6=0x000000000000000000000000
trusted.afr.ger-ber-01-client-7=0x000000000000000000000000
trusted.gfid=0x1abb953baa9d4fa39a72415204057572
trusted.glusterfs.6a071cfa-b150-4f0b-b1ed-96ab5d4bd671.xtime=0x54bff5c40008dd7f
...
slave volume :
Host : gluster-wien-04
# file: gluster-export/1050/hyve/364/14158.mp4
trusted.afr.aut-wien-01-client-2=0x000000000000000000000000
trusted.afr.aut-wien-01-client-3=0x000000000000000000000000
trusted.gfid=0x129ba62c3d214b34beb366fb1e2c8e4b
trusted.glusterfs.6a071cfa-b150-4f0b-b1ed-96ab5d4bd671.xtime=0x54bff5c40008dd7f
-----------------------------------------------------
Host : gluster-wien-05
# file: gluster-export/1050/hyve/364/14158.mp4
trusted.afr.aut-wien-01-client-2=0x000000000000000000000000
trusted.afr.aut-wien-01-client-3=0x000000000000000000000000
trusted.gfid=0x129ba62c3d214b34beb366fb1e2c8e4b
trusted.glusterfs.6a071cfa-b150-4f0b-b1ed-96ab5d4bd671.xtime=0x54bff5c40008dd7f
-----------------------------------------------------
...
putz@sdn-de-gate-01:~/central$