Re: glusterfs under high load failing?

Roman,
Everything in the logs looks okay to me, except the following profile number:
      3.91 1255944.81 us     127.00 us 23397532.00 us            189       FSYNC

It seems that at least one of the fsyncs is taking more than 23 seconds to complete (the maximum latency in that line). Going by all the data you have given so far, this is the only thing I can see that could have caused the problem. To test this, could you turn off the following option and try again?

gluster volume set <volname> cluster.ensure-durability off
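
For reference, here is a minimal sketch of how the FSYNC profile line quoted above can be read. It assumes the columns follow the usual `gluster volume profile <volname> info` layout (%-latency, average, minimum and maximum latency in microseconds, number of calls, fop name); the `FopStat` and `parse_profile_line` names are made up for illustration.

```python
from typing import NamedTuple


class FopStat(NamedTuple):
    pct_latency: float  # share of total latency, in percent
    avg_us: float       # average latency, microseconds
    min_us: float       # minimum latency, microseconds
    max_us: float       # maximum latency, microseconds
    calls: int          # number of calls
    fop: str            # file operation name


def parse_profile_line(line: str) -> FopStat:
    """Parse one per-fop line of profile output (assumed column order)."""
    # Drop the "us" unit tokens so only the six values remain.
    fields = [tok for tok in line.split() if tok != "us"]
    if len(fields) != 6:
        raise ValueError(f"unexpected profile line: {line!r}")
    pct, avg, low, high, calls, fop = fields
    return FopStat(float(pct), float(avg), float(low), float(high), int(calls), fop)


line = "3.91 1255944.81 us     127.00 us 23397532.00 us            189       FSYNC"
stat = parse_profile_line(line)

# Convert microseconds to seconds for readability.
print(f"{stat.fop}: max {stat.max_us / 1e6:.2f} s, "
      f"avg {stat.avg_us / 1e6:.2f} s over {stat.calls} calls")
```

Read this way, the worst FSYNC took about 23.4 seconds and the average over 189 calls was about 1.26 seconds.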

Let me know what happens. I am extremely curious to hear about it.

Pranith

On 10/17/2014 12:04 PM, Roman wrote:
mount

[2014-10-13 17:36:56.758654] I [glusterfsd.c:1959:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.5.2 (/usr/sbin/glusterfs --direct-io-mode=enable --fuse-mountopts=default_permissions,allow_other,max_read=131072 --volfile-server=stor1 --volfile-server=stor2 --volfile-id=HA-WIN-TT-1T --fuse-mountopts=default_permissions,allow_other,max_read=131072 /srv/nfs/HA-WIN-TT-1T)
[2014-10-13 17:36:56.762162] I [socket.c:3561:socket_init] 0-glusterfs: SSL support is NOT enabled
[2014-10-13 17:36:56.762223] I [socket.c:3576:socket_init] 0-glusterfs: using system polling thread
[2014-10-13 17:36:56.766686] I [dht-shared.c:311:dht_init_regex] 0-HA-WIN-TT-1T-dht: using regex rsync-hash-regex = ^\.(.+)\.[^.]+$
[2014-10-13 17:36:56.768887] I [socket.c:3561:socket_init] 0-HA-WIN-TT-1T-client-1: SSL support is NOT enabled
[2014-10-13 17:36:56.768939] I [socket.c:3576:socket_init] 0-HA-WIN-TT-1T-client-1: using system polling thread
[2014-10-13 17:36:56.769280] I [socket.c:3561:socket_init] 0-HA-WIN-TT-1T-client-0: SSL support is NOT enabled
[2014-10-13 17:36:56.769294] I [socket.c:3576:socket_init] 0-HA-WIN-TT-1T-client-0: using system polling thread
[2014-10-13 17:36:56.769336] I [client.c:2294:notify] 0-HA-WIN-TT-1T-client-0: parent translators are ready, attempting connect on transport
[2014-10-13 17:36:56.769829] I [client.c:2294:notify] 0-HA-WIN-TT-1T-client-1: parent translators are ready, attempting connect on transport
Final graph:
+------------------------------------------------------------------------------+
  1: volume HA-WIN-TT-1T-client-0
  2:     type protocol/client
  3:     option remote-host stor1
  4:     option remote-subvolume /exports/NFS-WIN/1T
  5:     option transport-type socket
  6:     option ping-timeout 10
  7:     option send-gids true
  8: end-volume
  9:
 10: volume HA-WIN-TT-1T-client-1
 11:     type protocol/client
 12:     option remote-host stor2
 13:     option remote-subvolume /exports/NFS-WIN/1T
 14:     option transport-type socket
 15:     option ping-timeout 10
 16:     option send-gids true
 17: end-volume
 18:
 19: volume HA-WIN-TT-1T-replicate-0
 20:     type cluster/replicate
 21:     subvolumes HA-WIN-TT-1T-client-0 HA-WIN-TT-1T-client-1
 22: end-volume
 23:
 24: volume HA-WIN-TT-1T-dht
 25:     type cluster/distribute
 26:     subvolumes HA-WIN-TT-1T-replicate-0
 27: end-volume
 28:
 29: volume HA-WIN-TT-1T-write-behind
 30:     type performance/write-behind
 31:     subvolumes HA-WIN-TT-1T-dht
 32: end-volume
 33:
 34: volume HA-WIN-TT-1T-read-ahead
 35:     type performance/read-ahead
 36:     subvolumes HA-WIN-TT-1T-write-behind
 37: end-volume
 38:
 39: volume HA-WIN-TT-1T-io-cache
 40:     type performance/io-cache
 41:     subvolumes HA-WIN-TT-1T-read-ahead
 42: end-volume
 43:
 44: volume HA-WIN-TT-1T-quick-read
 45:     type performance/quick-read
 46:     subvolumes HA-WIN-TT-1T-io-cache
 47: end-volume
 48:
 49: volume HA-WIN-TT-1T-open-behind
 50:     type performance/open-behind
 51:     subvolumes HA-WIN-TT-1T-quick-read
 52: end-volume
 53:
 54: volume HA-WIN-TT-1T-md-cache
 55:     type performance/md-cache
 56:     subvolumes HA-WIN-TT-1T-open-behind
 57: end-volume
 58:
 59: volume HA-WIN-TT-1T
 60:     type debug/io-stats
 61:     option latency-measurement off
 62:     option count-fop-hits off
 63:     subvolumes HA-WIN-TT-1T-md-cache
 64: end-volume
 65:
+------------------------------------------------------------------------------+
[2014-10-13 17:36:56.770718] I [rpc-clnt.c:1729:rpc_clnt_reconfig] 0-HA-WIN-TT-1T-client-1: changing port to 49160 (from 0)
[2014-10-13 17:36:56.771378] I [rpc-clnt.c:1729:rpc_clnt_reconfig] 0-HA-WIN-TT-1T-client-0: changing port to 49160 (from 0)
[2014-10-13 17:36:56.772008] I [client-handshake.c:1677:select_server_supported_programs] 0-HA-WIN-TT-1T-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2014-10-13 17:36:56.772083] I [client-handshake.c:1677:select_server_supported_programs] 0-HA-WIN-TT-1T-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2014-10-13 17:36:56.772338] I [client-handshake.c:1462:client_setvolume_cbk] 0-HA-WIN-TT-1T-client-1: Connected to 10.250.0.2:49160, attached to remote volume '/exports/NFS-WIN/1T'.
[2014-10-13 17:36:56.772361] I [client-handshake.c:1474:client_setvolume_cbk] 0-HA-WIN-TT-1T-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2014-10-13 17:36:56.772424] I [afr-common.c:4131:afr_notify] 0-HA-WIN-TT-1T-replicate-0: Subvolume 'HA-WIN-TT-1T-client-1' came back up; going online.
[2014-10-13 17:36:56.772463] I [client-handshake.c:1462:client_setvolume_cbk] 0-HA-WIN-TT-1T-client-0: Connected to 10.250.0.1:49160, attached to remote volume '/exports/NFS-WIN/1T'.
[2014-10-13 17:36:56.772477] I [client-handshake.c:1474:client_setvolume_cbk] 0-HA-WIN-TT-1T-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2014-10-13 17:36:56.779099] I [fuse-bridge.c:4977:fuse_graph_setup] 0-fuse: switched to graph 0
[2014-10-13 17:36:56.779338] I [client-handshake.c:450:client_set_lk_version_cbk] 0-HA-WIN-TT-1T-client-0: Server lk version = 1
[2014-10-13 17:36:56.779367] I [fuse-bridge.c:3914:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.22 kernel 7.17
[2014-10-13 17:36:56.779438] I [client-handshake.c:450:client_set_lk_version_cbk] 0-HA-WIN-TT-1T-client-1: Server lk version = 1
[2014-10-13 17:37:02.010942] I [fuse-bridge.c:4818:fuse_thread_proc] 0-fuse: unmounting /srv/nfs/HA-WIN-TT-1T
[2014-10-13 17:37:02.011296] W [glusterfsd.c:1095:cleanup_and_exit] (-->/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7fc7b7672e6d] (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x6b50) [0x7fc7b7d20b50] (-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xd5) [0x7fc7b95add55]))) 0-: received signum (15), shutting down
[2014-10-13 17:37:02.011316] I [fuse-bridge.c:5475:fini] 0-fuse: Unmounting '/srv/nfs/HA-WIN-TT-1T'.
[2014-10-13 17:37:31.133036] W [socket.c:522:__socket_rwv] 0-HA-WIN-TT-1T-client-0: readv on 10.250.0.1:49160 failed (No data available)
[2014-10-13 17:37:31.133110] I [client.c:2229:client_rpc_notify] 0-HA-WIN-TT-1T-client-0: disconnected from 10.250.0.1:49160. Client process will keep trying to connect to glusterd until brick's port is available
[2014-10-13 17:37:33.317437] W [socket.c:522:__socket_rwv] 0-HA-WIN-TT-1T-client-1: readv on 10.250.0.2:49160 failed (No data available)
[2014-10-13 17:37:33.317478] I [client.c:2229:client_rpc_notify] 0-HA-WIN-TT-1T-client-1: disconnected from 10.250.0.2:49160. Client process will keep trying to connect to glusterd until brick's port is available
[2014-10-13 17:37:33.317496] E [afr-common.c:4168:afr_notify] 0-HA-WIN-TT-1T-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2014-10-13 17:37:42.045604] I [rpc-clnt.c:1729:rpc_clnt_reconfig] 0-HA-WIN-TT-1T-client-0: changing port to 49160 (from 0)
[2014-10-13 17:37:42.046177] I [client-handshake.c:1677:select_server_supported_programs] 0-HA-WIN-TT-1T-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2014-10-13 17:37:42.048863] I [client-handshake.c:1462:client_setvolume_cbk] 0-HA-WIN-TT-1T-client-0: Connected to 10.250.0.1:49160, attached to remote volume '/exports/NFS-WIN/1T'.
[2014-10-13 17:37:42.048883] I [client-handshake.c:1474:client_setvolume_cbk] 0-HA-WIN-TT-1T-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2014-10-13 17:37:42.048897] I [client-handshake.c:1314:client_post_handshake] 0-HA-WIN-TT-1T-client-0: 1 fds open - Delaying child_up until they are re-opened
[2014-10-13 17:37:42.049299] W [client-handshake.c:980:client3_3_reopen_cbk] 0-HA-WIN-TT-1T-client-0: reopen on <gfid:b00e322a-7bae-479f-91e0-1fd77c73692b> failed (Stale NFS file handle)
[2014-10-13 17:37:42.049328] I [client-handshake.c:936:client_child_up_reopen_done] 0-HA-WIN-TT-1T-client-0: last fd open'd/lock-self-heal'd - notifying CHILD-UP
[2014-10-13 17:37:42.049360] I [afr-common.c:4131:afr_notify] 0-HA-WIN-TT-1T-replicate-0: Subvolume 'HA-WIN-TT-1T-client-0' came back up; going online.
[2014-10-13 17:37:42.049446] I [client-handshake.c:450:client_set_lk_version_cbk] 0-HA-WIN-TT-1T-client-0: Server lk version = 1
[2014-10-13 17:37:45.087592] I [rpc-clnt.c:1729:rpc_clnt_reconfig] 0-HA-WIN-TT-1T-client-1: changing port to 49160 (from 0)
[2014-10-13 17:37:45.088132] I [client-handshake.c:1677:select_server_supported_programs] 0-HA-WIN-TT-1T-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2014-10-13 17:37:45.088343] I [client-handshake.c:1462:client_setvolume_cbk] 0-HA-WIN-TT-1T-client-1: Connected to 10.250.0.2:49160, attached to remote volume '/exports/NFS-WIN/1T'.
[2014-10-13 17:37:45.088360] I [client-handshake.c:1474:client_setvolume_cbk] 0-HA-WIN-TT-1T-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2014-10-13 17:37:45.088373] I [client-handshake.c:1314:client_post_handshake] 0-HA-WIN-TT-1T-client-1: 1 fds open - Delaying child_up until they are re-opened
[2014-10-13 17:37:45.088681] W [client-handshake.c:980:client3_3_reopen_cbk] 0-HA-WIN-TT-1T-client-1: reopen on <gfid:b00e322a-7bae-479f-91e0-1fd77c73692b> failed (Stale NFS file handle)
[2014-10-13 17:37:45.088697] I [client-handshake.c:936:client_child_up_reopen_done] 0-HA-WIN-TT-1T-client-1: last fd open'd/lock-self-heal'd - notifying CHILD-UP
[2014-10-13 17:37:45.088819] I [client-handshake.c:450:client_set_lk_version_cbk] 0-HA-WIN-TT-1T-client-1: Server lk version = 1
[2014-10-13 17:37:54.601822] I [glusterfsd.c:1959:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.5.2 (/usr/sbin/glusterfs --direct-io-mode=enable --fuse-mountopts=default_permissions,allow_other,max_read=131072 --volfile-server=stor1 --volfile-server=stor2 --volfile-id=HA-WIN-TT-1T --fuse-mountopts=default_permissions,allow_other,max_read=131072 /srv/nfs/HA-WIN-TT-1T)
[2014-10-13 17:37:54.604972] I [socket.c:3561:socket_init] 0-glusterfs: SSL support is NOT enabled
[2014-10-13 17:37:54.605034] I [socket.c:3576:socket_init] 0-glusterfs: using system polling thread
[2014-10-13 17:37:54.609219] I [dht-shared.c:311:dht_init_regex] 0-HA-WIN-TT-1T-dht: using regex rsync-hash-regex = ^\.(.+)\.[^.]+$
[2014-10-13 17:37:54.611421] I [socket.c:3561:socket_init] 0-HA-WIN-TT-1T-client-1: SSL support is NOT enabled
[2014-10-13 17:37:54.611466] I [socket.c:3576:socket_init] 0-HA-WIN-TT-1T-client-1: using system polling thread
[2014-10-13 17:37:54.611808] I [socket.c:3561:socket_init] 0-HA-WIN-TT-1T-client-0: SSL support is NOT enabled
[2014-10-13 17:37:54.611821] I [socket.c:3576:socket_init] 0-HA-WIN-TT-1T-client-0: using system polling thread
[2014-10-13 17:37:54.611862] I [client.c:2294:notify] 0-HA-WIN-TT-1T-client-0: parent translators are ready, attempting connect on transport
[2014-10-13 17:37:54.612354] I [client.c:2294:notify] 0-HA-WIN-TT-1T-client-1: parent translators are ready, attempting connect on transport
Final graph:
+------------------------------------------------------------------------------+
  1: volume HA-WIN-TT-1T-client-0
  2:     type protocol/client
  3:     option remote-host stor1
  4:     option remote-subvolume /exports/NFS-WIN/1T
  5:     option transport-type socket
  6:     option ping-timeout 10
  7:     option send-gids true
  8: end-volume
  9:
 10: volume HA-WIN-TT-1T-client-1
 11:     type protocol/client
 12:     option remote-host stor2
 13:     option remote-subvolume /exports/NFS-WIN/1T
 14:     option transport-type socket
 15:     option ping-timeout 10
 16:     option send-gids true
 17: end-volume
 18:
 19: volume HA-WIN-TT-1T-replicate-0
 20:     type cluster/replicate
 21:     subvolumes HA-WIN-TT-1T-client-0 HA-WIN-TT-1T-client-1
 22: end-volume
 23:
 24: volume HA-WIN-TT-1T-dht
 25:     type cluster/distribute
 26:     subvolumes HA-WIN-TT-1T-replicate-0
 27: end-volume
 28:
 29: volume HA-WIN-TT-1T-write-behind
 30:     type performance/write-behind
 31:     subvolumes HA-WIN-TT-1T-dht
 32: end-volume
 33:
 34: volume HA-WIN-TT-1T-read-ahead
 35:     type performance/read-ahead
 36:     subvolumes HA-WIN-TT-1T-write-behind
 37: end-volume
 38:
 39: volume HA-WIN-TT-1T-io-cache
 40:     type performance/io-cache
 41:     subvolumes HA-WIN-TT-1T-read-ahead
 42: end-volume
 43:
 44: volume HA-WIN-TT-1T-quick-read
 45:     type performance/quick-read
 46:     subvolumes HA-WIN-TT-1T-io-cache
 47: end-volume
 48:
 49: volume HA-WIN-TT-1T-open-behind
 50:     type performance/open-behind
 51:     subvolumes HA-WIN-TT-1T-quick-read
 52: end-volume
 53:
 54: volume HA-WIN-TT-1T-md-cache
 55:     type performance/md-cache
 56:     subvolumes HA-WIN-TT-1T-open-behind
 57: end-volume
 58:
 59: volume HA-WIN-TT-1T
 60:     type debug/io-stats
 61:     option latency-measurement off
 62:     option count-fop-hits off
 63:     subvolumes HA-WIN-TT-1T-md-cache
 64: end-volume
 65:
+------------------------------------------------------------------------------+
[2014-10-13 17:37:54.613137] I [rpc-clnt.c:1729:rpc_clnt_reconfig] 0-HA-WIN-TT-1T-client-0: changing port to 49160 (from 0)
[2014-10-13 17:37:54.613521] I [rpc-clnt.c:1729:rpc_clnt_reconfig] 0-HA-WIN-TT-1T-client-1: changing port to 49160 (from 0)
[2014-10-13 17:37:54.614228] I [client-handshake.c:1677:select_server_supported_programs] 0-HA-WIN-TT-1T-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2014-10-13 17:37:54.614399] I [client-handshake.c:1677:select_server_supported_programs] 0-HA-WIN-TT-1T-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2014-10-13 17:37:54.614483] I [client-handshake.c:1462:client_setvolume_cbk] 0-HA-WIN-TT-1T-client-0: Connected to 10.250.0.1:49160, attached to remote volume '/exports/NFS-WIN/1T'.
[2014-10-13 17:37:54.614499] I [client-handshake.c:1474:client_setvolume_cbk] 0-HA-WIN-TT-1T-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2014-10-13 17:37:54.614557] I [afr-common.c:4131:afr_notify] 0-HA-WIN-TT-1T-replicate-0: Subvolume 'HA-WIN-TT-1T-client-0' came back up; going online.
[2014-10-13 17:37:54.614625] I [client-handshake.c:450:client_set_lk_version_cbk] 0-HA-WIN-TT-1T-client-0: Server lk version = 1
[2014-10-13 17:37:54.614709] I [client-handshake.c:1462:client_setvolume_cbk] 0-HA-WIN-TT-1T-client-1: Connected to 10.250.0.2:49160, attached to remote volume '/exports/NFS-WIN/1T'.
[2014-10-13 17:37:54.614724] I [client-handshake.c:1474:client_setvolume_cbk] 0-HA-WIN-TT-1T-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2014-10-13 17:37:54.621318] I [fuse-bridge.c:4977:fuse_graph_setup] 0-fuse: switched to graph 0
[2014-10-13 17:37:54.621545] I [client-handshake.c:450:client_set_lk_version_cbk] 0-HA-WIN-TT-1T-client-1: Server lk version = 1
[2014-10-13 17:37:54.621617] I [fuse-bridge.c:3914:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.22 kernel 7.17
[2014-10-13 17:38:25.951778] W [client-rpc-fops.c:4235:client3_3_flush] 0-HA-WIN-TT-1T-client-0:  (b00e322a-7bae-479f-91e0-1fd77c73692b) remote_fd is -1. EBADFD
[2014-10-13 17:38:25.951827] W [client-rpc-fops.c:4235:client3_3_flush] 0-HA-WIN-TT-1T-client-1:  (b00e322a-7bae-479f-91e0-1fd77c73692b) remote_fd is -1. EBADFD
[2014-10-13 17:38:25.966963] I [fuse-bridge.c:4818:fuse_thread_proc] 0-fuse: unmounting /srv/nfs/HA-WIN-TT-1T
[2014-10-13 17:38:25.967174] W [glusterfsd.c:1095:cleanup_and_exit] (-->/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7ffec893de6d] (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x6b50) [0x7ffec8febb50] (-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xd5) [0x7ffeca878d55]))) 0-: received signum (15), shutting down
[2014-10-13 17:38:25.967194] I [fuse-bridge.c:5475:fini] 0-fuse: Unmounting '/srv/nfs/HA-WIN-TT-1T'.
[2014-10-13 17:40:21.500514] I [glusterfsd-mgmt.c:56:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2014-10-13 17:40:21.517782] I [glusterfsd-mgmt.c:56:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2014-10-13 17:40:21.524056] I [dht-shared.c:311:dht_init_regex] 0-HA-WIN-TT-1T-dht: using regex rsync-hash-regex = ^\.(.+)\.[^.]+$
[2014-10-13 17:40:21.528430] I [glusterfsd-mgmt.c:1307:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing

glustershd stor1

[2014-10-13 17:38:17.203360] I [glusterfsd.c:1959:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.5.2 (/usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/75bbc77a676bde0d0afe20f40dc9e3e1.socket --xlator-option *replicate*.node-uuid=e09cbbc2-08a3-4e5b-83b8-48eb11a1c7b3)
[2014-10-13 17:38:17.204958] I [socket.c:3561:socket_init] 0-socket.glusterfsd: SSL support is NOT enabled
[2014-10-13 17:38:17.205016] I [socket.c:3576:socket_init] 0-socket.glusterfsd: using system polling thread
[2014-10-13 17:38:17.205188] I [socket.c:3561:socket_init] 0-glusterfs: SSL support is NOT enabled
[2014-10-13 17:38:17.205209] I [socket.c:3576:socket_init] 0-glusterfs: using system polling thread
[2014-10-13 17:38:17.207840] I [graph.c:254:gf_add_cmdline_options] 0-HA-2TB-TT-Proxmox-cluster-replicate-0: adding option 'node-uuid' for volume 'HA-2TB-TT-Proxmox-cluster-replicate-0' with value 'e09cbbc2-08a3-4e5b-83b8-48eb11a1c7b3'
[2014-10-13 17:38:17.209433] I [socket.c:3561:socket_init] 0-HA-2TB-TT-Proxmox-cluster-client-1: SSL support is NOT enabled
[2014-10-13 17:38:17.209448] I [socket.c:3576:socket_init] 0-HA-2TB-TT-Proxmox-cluster-client-1: using system polling thread
[2014-10-13 17:38:17.209625] I [socket.c:3561:socket_init] 0-HA-2TB-TT-Proxmox-cluster-client-0: SSL support is NOT enabled
[2014-10-13 17:38:17.209634] I [socket.c:3576:socket_init] 0-HA-2TB-TT-Proxmox-cluster-client-0: using system polling thread
[2014-10-13 17:38:17.209652] I [client.c:2294:notify] 0-HA-2TB-TT-Proxmox-cluster-client-0: parent translators are ready, attempting connect on transport
[2014-10-13 17:38:17.210241] I [client.c:2294:notify] 0-HA-2TB-TT-Proxmox-cluster-client-1: parent translators are ready, attempting connect on transport
Final graph:
+------------------------------------------------------------------------------+
  1: volume HA-2TB-TT-Proxmox-cluster-client-0
  2:     type protocol/client
  3:     option remote-host stor1
  4:     option remote-subvolume /exports/HA-2TB-TT-Proxmox-cluster/2TB
  5:     option transport-type socket
  6:     option username 59c66122-55c1-4c28-956e-6189fcb1aff5
  7:     option password 34b79afb-a93c-431b-900a-b688e67cdbc9
  8:     option ping-timeout 10
  9: end-volume
 10:
 11: volume HA-2TB-TT-Proxmox-cluster-client-1
 12:     type protocol/client
 13:     option remote-host stor2
 14:     option remote-subvolume /exports/HA-2TB-TT-Proxmox-cluster/2TB
 15:     option transport-type socket
 16:     option username 59c66122-55c1-4c28-956e-6189fcb1aff5
 17:     option password 34b79afb-a93c-431b-900a-b688e67cdbc9
 18:     option ping-timeout 10
 19: end-volume
 20:
 21: volume HA-2TB-TT-Proxmox-cluster-replicate-0
 22:     type cluster/replicate
 23:     option node-uuid e09cbbc2-08a3-4e5b-83b8-48eb11a1c7b3
 24:     option background-self-heal-count 0
 25:     option metadata-self-heal on
 26:     option data-self-heal on
 27:     option entry-self-heal on
 28:     option self-heal-daemon on
 29:     option iam-self-heal-daemon yes
 30:     subvolumes HA-2TB-TT-Proxmox-cluster-client-0 HA-2TB-TT-Proxmox-cluster-client-1
 31: end-volume
 32:
 33: volume glustershd
 34:     type debug/io-stats
 35:     subvolumes HA-2TB-TT-Proxmox-cluster-replicate-0
 36: end-volume
 37:
+------------------------------------------------------------------------------+
[2014-10-13 17:38:17.210709] I [rpc-clnt.c:1729:rpc_clnt_reconfig] 0-HA-2TB-TT-Proxmox-cluster-client-0: changing port to 49159 (from 0)
[2014-10-13 17:38:17.211008] I [client-handshake.c:1677:select_server_supported_programs] 0-HA-2TB-TT-Proxmox-cluster-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2014-10-13 17:38:17.211170] I [client-handshake.c:1462:client_setvolume_cbk] 0-HA-2TB-TT-Proxmox-cluster-client-0: Connected to 10.250.0.1:49159, attached to remote volume '/exports/HA-2TB-TT-Proxmox-cluster/2TB'.
[2014-10-13 17:38:17.211195] I [client-handshake.c:1474:client_setvolume_cbk] 0-HA-2TB-TT-Proxmox-cluster-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2014-10-13 17:38:17.211250] I [afr-common.c:4131:afr_notify] 0-HA-2TB-TT-Proxmox-cluster-replicate-0: Subvolume 'HA-2TB-TT-Proxmox-cluster-client-0' came back up; going online.
[2014-10-13 17:38:17.211297] I [client-handshake.c:450:client_set_lk_version_cbk] 0-HA-2TB-TT-Proxmox-cluster-client-0: Server lk version = 1
[2014-10-13 17:38:17.211656] I [afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-HA-2TB-TT-Proxmox-cluster-replicate-0: Another crawl is in progress for HA-2TB-TT-Proxmox-cluster-client-0
[2014-10-13 17:38:17.211661] E [afr-self-heald.c:1479:afr_find_child_position] 0-HA-2TB-TT-Proxmox-cluster-replicate-0: getxattr failed on HA-2TB-TT-Proxmox-cluster-client-1 - (Transport endpoint is not connected)
[2014-10-13 17:38:17.216327] E [afr-self-heal-data.c:1611:afr_sh_data_open_cbk] 0-HA-2TB-TT-Proxmox-cluster-replicate-0: open of <gfid:65381af4-8e0b-4721-8214-71d29dcf5237> failed on child HA-2TB-TT-Proxmox-cluster-client-1 (Transport endpoint is not connected)
[2014-10-13 17:38:17.217372] E [afr-self-heal-data.c:1611:afr_sh_data_open_cbk] 0-HA-2TB-TT-Proxmox-cluster-replicate-0: open of <gfid:65381af4-8e0b-4721-8214-71d29dcf5237> failed on child HA-2TB-TT-Proxmox-cluster-client-1 (Transport endpoint is not connected)
[2014-10-13 17:38:19.226057] I [rpc-clnt.c:1729:rpc_clnt_reconfig] 0-HA-2TB-TT-Proxmox-cluster-client-1: changing port to 49159 (from 0)
[2014-10-13 17:38:19.226704] I [client-handshake.c:1677:select_server_supported_programs] 0-HA-2TB-TT-Proxmox-cluster-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2014-10-13 17:38:19.226896] I [client-handshake.c:1462:client_setvolume_cbk] 0-HA-2TB-TT-Proxmox-cluster-client-1: Connected to 10.250.0.2:49159, attached to remote volume '/exports/HA-2TB-TT-Proxmox-cluster/2TB'.
[2014-10-13 17:38:19.226916] I [client-handshake.c:1474:client_setvolume_cbk] 0-HA-2TB-TT-Proxmox-cluster-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2014-10-13 17:38:19.227031] I [client-handshake.c:450:client_set_lk_version_cbk] 0-HA-2TB-TT-Proxmox-cluster-client-1: Server lk version = 1
[2014-10-13 17:38:25.933950] W [glusterfsd.c:1095:cleanup_and_exit] (-->/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f1a7c03ce6d] (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x6b50) [0x7f1a7c6eab50] (-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xd5) [0x7f1a7df77d55]))) 0-: received signum (15), shutting down
[2014-10-13 17:38:26.942918] I [glusterfsd.c:1959:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.5.2 (/usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/75bbc77a676bde0d0afe20f40dc9e3e1.socket --xlator-option *replicate*.node-uuid=e09cbbc2-08a3-4e5b-83b8-48eb11a1c7b3)
[2014-10-13 17:38:26.944548] I [socket.c:3561:socket_init] 0-socket.glusterfsd: SSL support is NOT enabled
[2014-10-13 17:38:26.944584] I [socket.c:3576:socket_init] 0-socket.glusterfsd: using system polling thread
[2014-10-13 17:38:26.944689] I [socket.c:3561:socket_init] 0-glusterfs: SSL support is NOT enabled
[2014-10-13 17:38:26.944701] I [socket.c:3576:socket_init] 0-glusterfs: using system polling thread
[2014-10-13 17:38:26.946667] I [graph.c:254:gf_add_cmdline_options] 0-HA-2TB-TT-Proxmox-cluster-replicate-0: adding option 'node-uuid' for volume 'HA-2TB-TT-Proxmox-cluster-replicate-0' with value 'e09cbbc2-08a3-4e5b-83b8-48eb11a1c7b3'
[2014-10-13 17:38:26.946684] I [graph.c:254:gf_add_cmdline_options] 0-HA-WIN-TT-1T-replicate-0: adding option 'node-uuid' for volume 'HA-WIN-TT-1T-replicate-0' with value 'e09cbbc2-08a3-4e5b-83b8-48eb11a1c7b3'
[2014-10-13 17:38:26.948783] I [socket.c:3561:socket_init] 0-HA-2TB-TT-Proxmox-cluster-client-1: SSL support is NOT enabled
[2014-10-13 17:38:26.948809] I [socket.c:3576:socket_init] 0-HA-2TB-TT-Proxmox-cluster-client-1: using system polling thread
[2014-10-13 17:38:26.949118] I [socket.c:3561:socket_init] 0-HA-2TB-TT-Proxmox-cluster-client-0: SSL support is NOT enabled
[2014-10-13 17:38:26.949134] I [socket.c:3576:socket_init] 0-HA-2TB-TT-Proxmox-cluster-client-0: using system polling thread
[2014-10-13 17:38:26.951698] I [socket.c:3561:socket_init] 0-HA-WIN-TT-1T-client-1: SSL support is NOT enabled
[2014-10-13 17:38:26.951715] I [socket.c:3576:socket_init] 0-HA-WIN-TT-1T-client-1: using system polling thread
[2014-10-13 17:38:26.951921] I [socket.c:3561:socket_init] 0-HA-WIN-TT-1T-client-0: SSL support is NOT enabled
[2014-10-13 17:38:26.951932] I [socket.c:3576:socket_init] 0-HA-WIN-TT-1T-client-0: using system polling thread
[2014-10-13 17:38:26.951959] I [client.c:2294:notify] 0-HA-2TB-TT-Proxmox-cluster-client-0: parent translators are ready, attempting connect on transport
[2014-10-13 17:38:26.952612] I [client.c:2294:notify] 0-HA-2TB-TT-Proxmox-cluster-client-1: parent translators are ready, attempting connect on transport
[2014-10-13 17:38:26.952862] I [client.c:2294:notify] 0-HA-WIN-TT-1T-client-0: parent translators are ready, attempting connect on transport
[2014-10-13 17:38:26.953447] I [client.c:2294:notify] 0-HA-WIN-TT-1T-client-1: parent translators are ready, attempting connect on transport
Final graph:
+------------------------------------------------------------------------------+
  1: volume HA-2TB-TT-Proxmox-cluster-client-0
  2:     type protocol/client
  3:     option remote-host stor1
  4:     option remote-subvolume /exports/HA-2TB-TT-Proxmox-cluster/2TB
  5:     option transport-type socket
  6:     option username 59c66122-55c1-4c28-956e-6189fcb1aff5
  7:     option password 34b79afb-a93c-431b-900a-b688e67cdbc9
  8:     option ping-timeout 10
  9: end-volume
 10:
 11: volume HA-2TB-TT-Proxmox-cluster-client-1
 12:     type protocol/client
 13:     option remote-host stor2
 14:     option remote-subvolume /exports/HA-2TB-TT-Proxmox-cluster/2TB
 15:     option transport-type socket
 16:     option username 59c66122-55c1-4c28-956e-6189fcb1aff5
 17:     option password 34b79afb-a93c-431b-900a-b688e67cdbc9
 18:     option ping-timeout 10
 19: end-volume
 20:
 21: volume HA-2TB-TT-Proxmox-cluster-replicate-0
 22:     type cluster/replicate
 23:     option node-uuid e09cbbc2-08a3-4e5b-83b8-48eb11a1c7b3
 24:     option background-self-heal-count 0
 25:     option metadata-self-heal on
 26:     option data-self-heal on
 27:     option entry-self-heal on
 28:     option self-heal-daemon on
 29:     option iam-self-heal-daemon yes
 30:     subvolumes HA-2TB-TT-Proxmox-cluster-client-0 HA-2TB-TT-Proxmox-cluster-client-1
 31: end-volume
 32:
 33: volume HA-WIN-TT-1T-client-0
 34:     type protocol/client
 35:     option remote-host stor1
 36:     option remote-subvolume /exports/NFS-WIN/1T
 37:     option transport-type socket
 38:     option username 101b907c-ff21-47da-8ba6-37e2920691ce
 39:     option password f4f29094-891f-4241-8736-5e3302ed8bc8
 40:     option ping-timeout 10
 41: end-volume
 42:
 43: volume HA-WIN-TT-1T-client-1
 44:     type protocol/client
 45:     option remote-host stor2
 46:     option remote-subvolume /exports/NFS-WIN/1T
 47:     option transport-type socket
 48:     option username 101b907c-ff21-47da-8ba6-37e2920691ce
 49:     option password f4f29094-891f-4241-8736-5e3302ed8bc8
 50:     option ping-timeout 10
 51: end-volume
 52:
 53: volume HA-WIN-TT-1T-replicate-0
 54:     type cluster/replicate
 55:     option node-uuid e09cbbc2-08a3-4e5b-83b8-48eb11a1c7b3
 56:     option background-self-heal-count 0
 57:     option metadata-self-heal on
 58:     option data-self-heal on
 59:     option entry-self-heal on
 60:     option self-heal-daemon on
 61:     option iam-self-heal-daemon yes
 62:     subvolumes HA-WIN-TT-1T-client-0 HA-WIN-TT-1T-client-1
 63: end-volume
 64:
 65: volume glustershd
 66:     type debug/io-stats
 67:     subvolumes HA-2TB-TT-Proxmox-cluster-replicate-0 HA-WIN-TT-1T-replicate-0
 68: end-volume
 69:
+------------------------------------------------------------------------------+
[2014-10-13 17:38:26.954036] I [rpc-clnt.c:1729:rpc_clnt_reconfig] 0-HA-2TB-TT-Proxmox-cluster-client-0: changing port to 49159 (from 0)
[2014-10-13 17:38:26.954308] I [rpc-clnt.c:1729:rpc_clnt_reconfig] 0-HA-WIN-TT-1T-client-0: changing port to 49160 (from 0)
[2014-10-13 17:38:26.954741] I [client-handshake.c:1677:select_server_supported_programs] 0-HA-2TB-TT-Proxmox-cluster-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2014-10-13 17:38:26.954815] I [client-handshake.c:1677:select_server_supported_programs] 0-HA-WIN-TT-1T-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2014-10-13 17:38:26.954999] I [client-handshake.c:1462:client_setvolume_cbk] 0-HA-2TB-TT-Proxmox-cluster-client-0: Connected to 10.250.0.1:49159, attached to remote volume '/exports/HA-2TB-TT-Proxmox-cluster/2TB'.
[2014-10-13 17:38:26.955017] I [client-handshake.c:1474:client_setvolume_cbk] 0-HA-2TB-TT-Proxmox-cluster-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2014-10-13 17:38:26.955073] I [afr-common.c:4131:afr_notify] 0-HA-2TB-TT-Proxmox-cluster-replicate-0: Subvolume 'HA-2TB-TT-Proxmox-cluster-client-0' came back up; going online.
[2014-10-13 17:38:26.955127] I [client-handshake.c:450:client_set_lk_version_cbk] 0-HA-2TB-TT-Proxmox-cluster-client-0: Server lk version = 1
[2014-10-13 17:38:26.955151] I [client-handshake.c:1462:client_setvolume_cbk] 0-HA-WIN-TT-1T-client-0: Connected to 10.250.0.1:49160, attached to remote volume '/exports/NFS-WIN/1T'.
[2014-10-13 17:38:26.955161] I [client-handshake.c:1474:client_setvolume_cbk] 0-HA-WIN-TT-1T-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2014-10-13 17:38:26.955226] I [afr-common.c:4131:afr_notify] 0-HA-WIN-TT-1T-replicate-0: Subvolume 'HA-WIN-TT-1T-client-0' came back up; going online.
[2014-10-13 17:38:26.955297] I [client-handshake.c:450:client_set_lk_version_cbk] 0-HA-WIN-TT-1T-client-0: Server lk version = 1
[2014-10-13 17:38:26.955583] I [afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-HA-2TB-TT-Proxmox-cluster-replicate-0: Another crawl is in progress for HA-2TB-TT-Proxmox-cluster-client-0
[2014-10-13 17:38:26.955589] E [afr-self-heald.c:1479:afr_find_child_position] 0-HA-2TB-TT-Proxmox-cluster-replicate-0: getxattr failed on HA-2TB-TT-Proxmox-cluster-client-1 - (Transport endpoint is not connected)
[2014-10-13 17:38:26.955832] I [afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-HA-WIN-TT-1T-replicate-0: Another crawl is in progress for HA-WIN-TT-1T-client-0
[2014-10-13 17:38:26.955858] E [afr-self-heald.c:1479:afr_find_child_position] 0-HA-WIN-TT-1T-replicate-0: getxattr failed on HA-WIN-TT-1T-client-1 - (Transport endpoint is not connected)
[2014-10-13 17:38:26.964913] I [rpc-clnt.c:1729:rpc_clnt_reconfig] 0-HA-2TB-TT-Proxmox-cluster-client-1: changing port to 49159 (from 0)
[2014-10-13 17:38:26.965553] I [client-handshake.c:1677:select_server_supported_programs] 0-HA-2TB-TT-Proxmox-cluster-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2014-10-13 17:38:26.965794] I [client-handshake.c:1462:client_setvolume_cbk] 0-HA-2TB-TT-Proxmox-cluster-client-1: Connected to 10.250.0.2:49159, attached to remote volume '/exports/HA-2TB-TT-Proxmox-cluster/2TB'.
[2014-10-13 17:38:26.965815] I [client-handshake.c:1474:client_setvolume_cbk] 0-HA-2TB-TT-Proxmox-cluster-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2014-10-13 17:38:26.965968] I [client-handshake.c:450:client_set_lk_version_cbk] 0-HA-2TB-TT-Proxmox-cluster-client-1: Server lk version = 1
[2014-10-13 17:38:26.967510] I [afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-HA-2TB-TT-Proxmox-cluster-replicate-0: Another crawl is in progress for HA-2TB-TT-Proxmox-cluster-client-0
[2014-10-13 17:38:27.971374] I [rpc-clnt.c:1729:rpc_clnt_reconfig] 0-HA-WIN-TT-1T-client-1: changing port to 49160 (from 0)
[2014-10-13 17:38:27.971940] I [client-handshake.c:1677:select_server_supported_programs] 0-HA-WIN-TT-1T-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2014-10-13 17:38:27.975460] I [client-handshake.c:1462:client_setvolume_cbk] 0-HA-WIN-TT-1T-client-1: Connected to 10.250.0.2:49160, attached to remote volume '/exports/NFS-WIN/1T'.
[2014-10-13 17:38:27.975481] I [client-handshake.c:1474:client_setvolume_cbk] 0-HA-WIN-TT-1T-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2014-10-13 17:38:27.976656] I [client-handshake.c:450:client_set_lk_version_cbk] 0-HA-WIN-TT-1T-client-1: Server lk version = 1
[2014-10-13 17:41:05.390992] I [glusterfsd-mgmt.c:56:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2014-10-13 17:41:05.408292] I [glusterfsd-mgmt.c:56:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2014-10-13 17:41:05.412221] I [glusterfsd-mgmt.c:1307:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
[2014-10-13 17:41:05.417388] I [glusterfsd-mgmt.c:1307:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
root@stor1:~#
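
Most of the lines in these logs are informational (`I`); only a few carry `E` or `W`. A minimal sketch for pulling those out, assuming the `[timestamp] LEVEL [file:line:function] message` layout seen in this thread (the name `lines_at_level` is made up for illustration and is not part of gluster):

```python
import re

# Layout assumed from the log lines in this thread:
#   [timestamp] LEVEL [file:line:function] rest-of-message
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) "
    r"\[(?P<src>[^\]]+)\] (?P<rest>.*)$"
)

def lines_at_level(log_text, levels=("E", "W")):
    """Return (timestamp, level, message) for lines whose level is in `levels`."""
    out = []
    for line in log_text.splitlines():
        m = LOG_LINE.match(line)
        if m and m.group("level") in levels:
            out.append((m.group("ts"), m.group("level"), m.group("rest")))
    return out

# Two lines copied from the glustershd log above.
sample = (
    "[2014-10-13 17:38:26.955583] I [afr-self-heald.c:1690:afr_dir_exclusive_crawl] "
    "0-HA-2TB-TT-Proxmox-cluster-replicate-0: Another crawl is in progress for "
    "HA-2TB-TT-Proxmox-cluster-client-0\n"
    "[2014-10-13 17:38:26.955589] E [afr-self-heald.c:1479:afr_find_child_position] "
    "0-HA-2TB-TT-Proxmox-cluster-replicate-0: getxattr failed on "
    "HA-2TB-TT-Proxmox-cluster-client-1 - (Transport endpoint is not connected)\n"
)

for ts, level, rest in lines_at_level(sample):
    print(ts, level, rest)
```

Lines that do not match the layout, such as the "Final graph" dump, are simply skipped.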

glustershd stor2

[2014-10-13 17:38:28.992891] I [glusterfsd.c:1959:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.5.2 (/usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/b1494ca4d047df6e8590d7080131908f.socket --xlator-option *replicate*.node-uuid=abf9e3a7-eb91-4273-acdf-876cd6ba1fe3)
[2014-10-13 17:38:28.994439] I [socket.c:3561:socket_init] 0-socket.glusterfsd: SSL support is NOT enabled
[2014-10-13 17:38:28.994476] I [socket.c:3576:socket_init] 0-socket.glusterfsd: using system polling thread
[2014-10-13 17:38:28.994581] I [socket.c:3561:socket_init] 0-glusterfs: SSL support is NOT enabled
[2014-10-13 17:38:28.994594] I [socket.c:3576:socket_init] 0-glusterfs: using system polling thread
[2014-10-13 17:38:28.996569] I [graph.c:254:gf_add_cmdline_options] 0-HA-2TB-TT-Proxmox-cluster-replicate-0: adding option 'node-uuid' for volume 'HA-2TB-TT-Proxmox-cluster-replicate-0' with value 'abf9e3a7-eb91-4273-acdf-876cd6ba1fe3'
[2014-10-13 17:38:28.996585] I [graph.c:254:gf_add_cmdline_options] 0-HA-WIN-TT-1T-replicate-0: adding option 'node-uuid' for volume 'HA-WIN-TT-1T-replicate-0' with value 'abf9e3a7-eb91-4273-acdf-876cd6ba1fe3'
[2014-10-13 17:38:28.998463] I [socket.c:3561:socket_init] 0-HA-2TB-TT-Proxmox-cluster-client-1: SSL support is NOT enabled
[2014-10-13 17:38:28.998483] I [socket.c:3576:socket_init] 0-HA-2TB-TT-Proxmox-cluster-client-1: using system polling thread
[2014-10-13 17:38:28.998695] I [socket.c:3561:socket_init] 0-HA-2TB-TT-Proxmox-cluster-client-0: SSL support is NOT enabled
[2014-10-13 17:38:28.998707] I [socket.c:3576:socket_init] 0-HA-2TB-TT-Proxmox-cluster-client-0: using system polling thread
[2014-10-13 17:38:29.000506] I [socket.c:3561:socket_init] 0-HA-WIN-TT-1T-client-1: SSL support is NOT enabled
[2014-10-13 17:38:29.000520] I [socket.c:3576:socket_init] 0-HA-WIN-TT-1T-client-1: using system polling thread
[2014-10-13 17:38:29.000723] I [socket.c:3561:socket_init] 0-HA-WIN-TT-1T-client-0: SSL support is NOT enabled
[2014-10-13 17:38:29.000734] I [socket.c:3576:socket_init] 0-HA-WIN-TT-1T-client-0: using system polling thread
[2014-10-13 17:38:29.000762] I [client.c:2294:notify] 0-HA-2TB-TT-Proxmox-cluster-client-0: parent translators are ready, attempting connect on transport
[2014-10-13 17:38:29.001064] I [client.c:2294:notify] 0-HA-2TB-TT-Proxmox-cluster-client-1: parent translators are ready, attempting connect on transport
[2014-10-13 17:38:29.001639] I [client.c:2294:notify] 0-HA-WIN-TT-1T-client-0: parent translators are ready, attempting connect on transport
[2014-10-13 17:38:29.001877] I [client.c:2294:notify] 0-HA-WIN-TT-1T-client-1: parent translators are ready, attempting connect on transport
Final graph:
+------------------------------------------------------------------------------+
  1: volume HA-2TB-TT-Proxmox-cluster-client-0
  2:     type protocol/client
  3:     option remote-host stor1
  4:     option remote-subvolume /exports/HA-2TB-TT-Proxmox-cluster/2TB
  5:     option transport-type socket
  6:     option username 59c66122-55c1-4c28-956e-6189fcb1aff5
  7:     option password 34b79afb-a93c-431b-900a-b688e67cdbc9
  8:     option ping-timeout 10
  9: end-volume
 10:
 11: volume HA-2TB-TT-Proxmox-cluster-client-1
 12:     type protocol/client
 13:     option remote-host stor2
 14:     option remote-subvolume /exports/HA-2TB-TT-Proxmox-cluster/2TB
 15:     option transport-type socket
 16:     option username 59c66122-55c1-4c28-956e-6189fcb1aff5
 17:     option password 34b79afb-a93c-431b-900a-b688e67cdbc9
 18:     option ping-timeout 10
 19: end-volume
 20:
 21: volume HA-2TB-TT-Proxmox-cluster-replicate-0
 22:     type cluster/replicate
 23:     option node-uuid abf9e3a7-eb91-4273-acdf-876cd6ba1fe3
 24:     option background-self-heal-count 0
 25:     option metadata-self-heal on
 26:     option data-self-heal on
 27:     option entry-self-heal on
 28:     option self-heal-daemon on
 29:     option iam-self-heal-daemon yes
 30:     subvolumes HA-2TB-TT-Proxmox-cluster-client-0 HA-2TB-TT-Proxmox-cluster-client-1
 31: end-volume
 32:
 33: volume HA-WIN-TT-1T-client-0
 34:     type protocol/client
 35:     option remote-host stor1
 36:     option remote-subvolume /exports/NFS-WIN/1T
 37:     option transport-type socket
 38:     option username 101b907c-ff21-47da-8ba6-37e2920691ce
 39:     option password f4f29094-891f-4241-8736-5e3302ed8bc8
 40:     option ping-timeout 10
 41: end-volume
 42:
 43: volume HA-WIN-TT-1T-client-1
 44:     type protocol/client
 45:     option remote-host stor2
 46:     option remote-subvolume /exports/NFS-WIN/1T
 47:     option transport-type socket
 48:     option username 101b907c-ff21-47da-8ba6-37e2920691ce
 49:     option password f4f29094-891f-4241-8736-5e3302ed8bc8
 50:     option ping-timeout 10
 51: end-volume
 52:
 53: volume HA-WIN-TT-1T-replicate-0
 54:     type cluster/replicate
 55:     option node-uuid abf9e3a7-eb91-4273-acdf-876cd6ba1fe3
 56:     option background-self-heal-count 0
 57:     option metadata-self-heal on
 58:     option data-self-heal on
 59:     option entry-self-heal on
 60:     option self-heal-daemon on
 61:     option iam-self-heal-daemon yes
 62:     subvolumes HA-WIN-TT-1T-client-0 HA-WIN-TT-1T-client-1
 63: end-volume
 64:
 65: volume glustershd
 66:     type debug/io-stats
 67:     subvolumes HA-2TB-TT-Proxmox-cluster-replicate-0 HA-WIN-TT-1T-replicate-0
 68: end-volume
 69:
+------------------------------------------------------------------------------+
[2014-10-13 17:38:29.002743] I [rpc-clnt.c:1729:rpc_clnt_reconfig] 0-HA-2TB-TT-Proxmox-cluster-client-1: changing port to 49159 (from 0)
[2014-10-13 17:38:29.003027] I [rpc-clnt.c:1729:rpc_clnt_reconfig] 0-HA-WIN-TT-1T-client-1: changing port to 49160 (from 0)
[2014-10-13 17:38:29.003290] I [rpc-clnt.c:1729:rpc_clnt_reconfig] 0-HA-2TB-TT-Proxmox-cluster-client-0: changing port to 49159 (from 0)
[2014-10-13 17:38:29.003334] I [rpc-clnt.c:1729:rpc_clnt_reconfig] 0-HA-WIN-TT-1T-client-0: changing port to 49160 (from 0)
[2014-10-13 17:38:29.003922] I [client-handshake.c:1677:select_server_supported_programs] 0-HA-2TB-TT-Proxmox-cluster-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2014-10-13 17:38:29.004023] I [client-handshake.c:1677:select_server_supported_programs] 0-HA-WIN-TT-1T-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2014-10-13 17:38:29.004139] I [client-handshake.c:1677:select_server_supported_programs] 0-HA-2TB-TT-Proxmox-cluster-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2014-10-13 17:38:29.004202] I [client-handshake.c:1462:client_setvolume_cbk] 0-HA-2TB-TT-Proxmox-cluster-client-1: Connected to 10.250.0.2:49159, attached to remote volume '/exports/HA-2TB-TT-Proxmox-cluster/2TB'.
[2014-10-13 17:38:29.004217] I [client-handshake.c:1474:client_setvolume_cbk] 0-HA-2TB-TT-Proxmox-cluster-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2014-10-13 17:38:29.004266] I [afr-common.c:4131:afr_notify] 0-HA-2TB-TT-Proxmox-cluster-replicate-0: Subvolume 'HA-2TB-TT-Proxmox-cluster-client-1' came back up; going online.
[2014-10-13 17:38:29.004318] I [client-handshake.c:1677:select_server_supported_programs] 0-HA-WIN-TT-1T-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2014-10-13 17:38:29.004368] I [client-handshake.c:1462:client_setvolume_cbk] 0-HA-WIN-TT-1T-client-1: Connected to 10.250.0.2:49160, attached to remote volume '/exports/NFS-WIN/1T'.
[2014-10-13 17:38:29.004383] I [client-handshake.c:1474:client_setvolume_cbk] 0-HA-WIN-TT-1T-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2014-10-13 17:38:29.004429] I [afr-common.c:4131:afr_notify] 0-HA-WIN-TT-1T-replicate-0: Subvolume 'HA-WIN-TT-1T-client-1' came back up; going online.
[2014-10-13 17:38:29.004483] I [client-handshake.c:450:client_set_lk_version_cbk] 0-HA-2TB-TT-Proxmox-cluster-client-1: Server lk version = 1
[2014-10-13 17:38:29.004506] I [client-handshake.c:450:client_set_lk_version_cbk] 0-HA-WIN-TT-1T-client-1: Server lk version = 1
[2014-10-13 17:38:29.004526] I [client-handshake.c:1462:client_setvolume_cbk] 0-HA-2TB-TT-Proxmox-cluster-client-0: Connected to 10.250.0.1:49159, attached to remote volume '/exports/HA-2TB-TT-Proxmox-cluster/2TB'.
[2014-10-13 17:38:29.004535] I [client-handshake.c:1474:client_setvolume_cbk] 0-HA-2TB-TT-Proxmox-cluster-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2014-10-13 17:38:29.004613] I [client-handshake.c:1462:client_setvolume_cbk] 0-HA-WIN-TT-1T-client-0: Connected to 10.250.0.1:49160, attached to remote volume '/exports/NFS-WIN/1T'.
[2014-10-13 17:38:29.004626] I [client-handshake.c:1474:client_setvolume_cbk] 0-HA-WIN-TT-1T-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2014-10-13 17:38:29.004731] I [client-handshake.c:450:client_set_lk_version_cbk] 0-HA-2TB-TT-Proxmox-cluster-client-0: Server lk version = 1
[2014-10-13 17:38:29.004796] I [client-handshake.c:450:client_set_lk_version_cbk] 0-HA-WIN-TT-1T-client-0: Server lk version = 1
[2014-10-13 17:38:29.005291] I [afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-HA-WIN-TT-1T-replicate-0: Another crawl is in progress for HA-WIN-TT-1T-client-1
[2014-10-13 17:38:29.005303] I [afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-HA-2TB-TT-Proxmox-cluster-replicate-0: Another crawl is in progress for HA-2TB-TT-Proxmox-cluster-client-1
[2014-10-13 17:38:29.005443] I [afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-HA-2TB-TT-Proxmox-cluster-replicate-0: Another crawl is in progress for HA-2TB-TT-Proxmox-cluster-client-1
[2014-10-13 17:41:05.427867] I [glusterfsd-mgmt.c:56:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2014-10-13 17:41:05.443271] I [glusterfsd-mgmt.c:56:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2014-10-13 17:41:05.444111] I [glusterfsd-mgmt.c:1307:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
[2014-10-13 17:41:05.444807] I [glusterfsd-mgmt.c:1307:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing

brick stor2

[2014-10-13 17:38:17.213386] W [glusterfsd.c:1095:cleanup_and_exit] (-->/lib/x86_64-linux-gnu/libc.so.6(+0x462a0) [0x7f343271f2a0] (-->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(synctask_wrap+0x12) [0x7f343371db12] (-->/usr/sbin/glusterfsd(glusterfs_handle_terminate+0x15) [0x7f3434790dd5]))) 0-: received signum (15), shutting down
[2014-10-13 17:38:26.957312] I [glusterfsd.c:1959:main] 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 3.5.2 (/usr/sbin/glusterfsd -s stor2 --volfile-id HA-WIN-TT-1T.stor2.exports-NFS-WIN-1T -p /var/lib/glusterd/vols/HA-WIN-TT-1T/run/stor2-exports-NFS-WIN-1T.pid -S /var/run/91514691033d00e666bb151f9c771a26.socket --brick-name /exports/NFS-WIN/1T -l /var/log/glusterfs/bricks/exports-NFS-WIN-1T.log --xlator-option *-posix.glusterd-uuid=abf9e3a7-eb91-4273-acdf-876cd6ba1fe3 --brick-port 49160 --xlator-option HA-WIN-TT-1T-server.listen-port=49160)
[2014-10-13 17:38:26.958864] I [socket.c:3561:socket_init] 0-socket.glusterfsd: SSL support is NOT enabled
[2014-10-13 17:38:26.958899] I [socket.c:3576:socket_init] 0-socket.glusterfsd: using system polling thread
[2014-10-13 17:38:26.959003] I [socket.c:3561:socket_init] 0-glusterfs: SSL support is NOT enabled
[2014-10-13 17:38:26.959015] I [socket.c:3576:socket_init] 0-glusterfs: using system polling thread
[2014-10-13 17:38:26.961860] I [graph.c:254:gf_add_cmdline_options] 0-HA-WIN-TT-1T-server: adding option 'listen-port' for volume 'HA-WIN-TT-1T-server' with value '49160'
[2014-10-13 17:38:26.961878] I [graph.c:254:gf_add_cmdline_options] 0-HA-WIN-TT-1T-posix: adding option 'glusterd-uuid' for volume 'HA-WIN-TT-1T-posix' with value 'abf9e3a7-eb91-4273-acdf-876cd6ba1fe3'
[2014-10-13 17:38:26.965032] I [rpcsvc.c:2127:rpcsvc_set_outstanding_rpc_limit] 0-rpc-service: Configured rpc.outstanding-rpc-limit with value 64
[2014-10-13 17:38:26.965075] W [options.c:888:xl_opt_validate] 0-HA-WIN-TT-1T-server: option 'listen-port' is deprecated, preferred is 'transport.socket.listen-port', continuing with correction
[2014-10-13 17:38:26.965097] I [socket.c:3561:socket_init] 0-tcp.HA-WIN-TT-1T-server: SSL support is NOT enabled
[2014-10-13 17:38:26.965105] I [socket.c:3576:socket_init] 0-tcp.HA-WIN-TT-1T-server: using system polling thread
[2014-10-13 17:38:26.965602] W [graph.c:329:_log_if_unknown_option] 0-HA-WIN-TT-1T-quota: option 'timeout' is not recognized
Final graph:
+------------------------------------------------------------------------------+
  1: volume HA-WIN-TT-1T-posix
  2:     type storage/posix
  3:     option glusterd-uuid abf9e3a7-eb91-4273-acdf-876cd6ba1fe3
  4:     option directory /exports/NFS-WIN/1T
  5:     option volume-id 2937ac01-4cba-44a8-8ff8-0161b67f8ee4
  6: end-volume
  7:
  8: volume HA-WIN-TT-1T-changelog
  9:     type features/changelog
 10:     option changelog-brick /exports/NFS-WIN/1T
 11:     option changelog-dir /exports/NFS-WIN/1T/.glusterfs/changelogs
 12:     subvolumes HA-WIN-TT-1T-posix
 13: end-volume
 14:
 15: volume HA-WIN-TT-1T-access-control
 16:     type features/access-control
 17:     subvolumes HA-WIN-TT-1T-changelog
 18: end-volume
 19:
 20: volume HA-WIN-TT-1T-locks
 21:     type features/locks
 22:     subvolumes HA-WIN-TT-1T-access-control
 23: end-volume
 24:
 25: volume HA-WIN-TT-1T-io-threads
 26:     type performance/io-threads
 27:     subvolumes HA-WIN-TT-1T-locks
 28: end-volume
 29:
 30: volume HA-WIN-TT-1T-index
 31:     type features/index
 32:     option index-base /exports/NFS-WIN/1T/.glusterfs/indices
 33:     subvolumes HA-WIN-TT-1T-io-threads
 34: end-volume
 35:
 36: volume HA-WIN-TT-1T-marker
 37:     type features/marker
 38:     option volume-uuid 2937ac01-4cba-44a8-8ff8-0161b67f8ee4
 39:     option timestamp-file /var/lib/glusterd/vols/HA-WIN-TT-1T/marker.tstamp
 40:     option xtime off
 41:     option gsync-force-xtime off
 42:     option quota off
 43:     subvolumes HA-WIN-TT-1T-index
 44: end-volume
 45:
 46: volume HA-WIN-TT-1T-quota
 47:     type features/quota
 48:     option volume-uuid HA-WIN-TT-1T
 49:     option server-quota off
 50:     option timeout 0
 51:     option deem-statfs off
 52:     subvolumes HA-WIN-TT-1T-marker
 53: end-volume
 54:
 55: volume /exports/NFS-WIN/1T
 56:     type debug/io-stats
 57:     option latency-measurement off
 58:     option count-fop-hits off
 59:     subvolumes HA-WIN-TT-1T-quota
 60: end-volume
 61:
 62: volume HA-WIN-TT-1T-server
 63:     type protocol/server
 64:     option transport.socket.listen-port 49160
 65:     option rpc-auth.auth-glusterfs on
 66:     option rpc-auth.auth-unix on
 67:     option rpc-auth.auth-null on
 68:     option transport-type tcp
 69:     option auth.login./exports/NFS-WIN/1T.allow 101b907c-ff21-47da-8ba6-37e2920691ce
 70:     option auth.login.101b907c-ff21-47da-8ba6-37e2920691ce.password f4f29094-891f-4241-8736-5e3302ed8bc8
 71:     option auth.addr./exports/NFS-WIN/1T.allow *
 72:     subvolumes /exports/NFS-WIN/1T
 73: end-volume
 74:
+------------------------------------------------------------------------------+
[2014-10-13 17:38:27.985048] I [server-handshake.c:575:server_setvolume] 0-HA-WIN-TT-1T-server: accepted client from stor1-14362-2014/10/13-17:38:26:938194-HA-WIN-TT-1T-client-1-0-0 (version: 3.5.2)
[2014-10-13 17:38:28.988700] I [server-handshake.c:575:server_setvolume] 0-HA-WIN-TT-1T-server: accepted client from glstor-cli-20753-2014/10/13-11:50:40:959211-HA-WIN-TT-1T-client-1-0-1 (version: 3.5.2)
[2014-10-13 17:38:29.004121] I [server-handshake.c:575:server_setvolume] 0-HA-WIN-TT-1T-server: accepted client from stor2-15494-2014/10/13-17:38:28:989227-HA-WIN-TT-1T-client-1-0-0 (version: 3.5.2)
[2014-10-13 17:38:38.515315] I [server-handshake.c:575:server_setvolume] 0-HA-WIN-TT-1T-server: accepted client from glstor-cli-23823-2014/10/13-17:37:54:595571-HA-WIN-TT-1T-client-1-0-0 (version: 3.5.2)
[2014-10-13 17:39:09.872223] I [server.c:520:server_rpc_notify] 0-HA-WIN-TT-1T-server: disconnecting connectionfrom glstor-cli-20753-2014/10/13-11:50:40:959211-HA-WIN-TT-1T-client-1-0-1
[2014-10-13 17:39:09.872299] I [client_t.c:417:gf_client_unref] 0-HA-WIN-TT-1T-server: Shutting down connection glstor-cli-20753-2014/10/13-11:50:40:959211-HA-WIN-TT-1T-client-1-0-1
[2014-10-13 17:41:05.427810] I [glusterfsd-mgmt.c:56:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2014-10-13 17:41:05.443234] I [glusterfsd-mgmt.c:56:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2014-10-13 17:41:05.445049] I [glusterfsd-mgmt.c:1307:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
root@stor2:~#

brick stor1

[2014-10-13 17:38:24.900066] I [glusterfsd.c:1959:main] 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 3.5.2 (/usr/sbin/glusterfsd -s stor1 --volfile-id HA-WIN-TT-1T.stor1.exports-NFS-WIN-1T -p /var/lib/glusterd/vols/HA-WIN-TT-1T/run/stor1-exports-NFS-WIN-1T.pid -S /var/run/02580c93278849804f3f34f7ed8314b2.socket --brick-name /exports/NFS-WIN/1T -l /var/log/glusterfs/bricks/exports-NFS-WIN-1T.log --xlator-option *-posix.glusterd-uuid=e09cbbc2-08a3-4e5b-83b8-48eb11a1c7b3 --brick-port 49160 --xlator-option HA-WIN-TT-1T-server.listen-port=49160)
[2014-10-13 17:38:24.902022] I [socket.c:3561:socket_init] 0-socket.glusterfsd: SSL support is NOT enabled
[2014-10-13 17:38:24.902077] I [socket.c:3576:socket_init] 0-socket.glusterfsd: using system polling thread
[2014-10-13 17:38:24.902214] I [socket.c:3561:socket_init] 0-glusterfs: SSL support is NOT enabled
[2014-10-13 17:38:24.902239] I [socket.c:3576:socket_init] 0-glusterfs: using system polling thread
[2014-10-13 17:38:24.906698] I [graph.c:254:gf_add_cmdline_options] 0-HA-WIN-TT-1T-server: adding option 'listen-port' for volume 'HA-WIN-TT-1T-server' with value '49160'
[2014-10-13 17:38:24.906731] I [graph.c:254:gf_add_cmdline_options] 0-HA-WIN-TT-1T-posix: adding option 'glusterd-uuid' for volume 'HA-WIN-TT-1T-posix' with value 'e09cbbc2-08a3-4e5b-83b8-48eb11a1c7b3'
[2014-10-13 17:38:24.908378] I [rpcsvc.c:2127:rpcsvc_set_outstanding_rpc_limit] 0-rpc-service: Configured rpc.outstanding-rpc-limit with value 64
[2014-10-13 17:38:24.908435] W [options.c:888:xl_opt_validate] 0-HA-WIN-TT-1T-server: option 'listen-port' is deprecated, preferred is 'transport.socket.listen-port', continuing with correction
[2014-10-13 17:38:24.908472] I [socket.c:3561:socket_init] 0-tcp.HA-WIN-TT-1T-server: SSL support is NOT enabled
[2014-10-13 17:38:24.908485] I [socket.c:3576:socket_init] 0-tcp.HA-WIN-TT-1T-server: using system polling thread
[2014-10-13 17:38:24.909105] W [graph.c:329:_log_if_unknown_option] 0-HA-WIN-TT-1T-quota: option 'timeout' is not recognized
Final graph:
+------------------------------------------------------------------------------+
  1: volume HA-WIN-TT-1T-posix
  2:     type storage/posix
  3:     option glusterd-uuid e09cbbc2-08a3-4e5b-83b8-48eb11a1c7b3
  4:     option directory /exports/NFS-WIN/1T
  5:     option volume-id 2937ac01-4cba-44a8-8ff8-0161b67f8ee4
  6: end-volume
  7:
  8: volume HA-WIN-TT-1T-changelog
  9:     type features/changelog
 10:     option changelog-brick /exports/NFS-WIN/1T
 11:     option changelog-dir /exports/NFS-WIN/1T/.glusterfs/changelogs
 12:     subvolumes HA-WIN-TT-1T-posix
 13: end-volume
 14:
 15: volume HA-WIN-TT-1T-access-control
 16:     type features/access-control
 17:     subvolumes HA-WIN-TT-1T-changelog
 18: end-volume
 19:
 20: volume HA-WIN-TT-1T-locks
 21:     type features/locks
 22:     subvolumes HA-WIN-TT-1T-access-control
 23: end-volume
 24:
 25: volume HA-WIN-TT-1T-io-threads
 26:     type performance/io-threads
 27:     subvolumes HA-WIN-TT-1T-locks
 28: end-volume
 29:
 30: volume HA-WIN-TT-1T-index
 31:     type features/index
 32:     option index-base /exports/NFS-WIN/1T/.glusterfs/indices
 33:     subvolumes HA-WIN-TT-1T-io-threads
 34: end-volume
 35:
 36: volume HA-WIN-TT-1T-marker
 37:     type features/marker
 38:     option volume-uuid 2937ac01-4cba-44a8-8ff8-0161b67f8ee4
 39:     option timestamp-file /var/lib/glusterd/vols/HA-WIN-TT-1T/marker.tstamp
 40:     option xtime off
 41:     option gsync-force-xtime off
 42:     option quota off
 43:     subvolumes HA-WIN-TT-1T-index
 44: end-volume
 45:
 46: volume HA-WIN-TT-1T-quota
 47:     type features/quota
 48:     option volume-uuid HA-WIN-TT-1T
 49:     option server-quota off
 50:     option timeout 0
 51:     option deem-statfs off
 52:     subvolumes HA-WIN-TT-1T-marker
 53: end-volume
 54:
 55: volume /exports/NFS-WIN/1T
 56:     type debug/io-stats
 57:     option latency-measurement off
 58:     option count-fop-hits off
 59:     subvolumes HA-WIN-TT-1T-quota
 60: end-volume
 61:
 62: volume HA-WIN-TT-1T-server
 63:     type protocol/server
 64:     option transport.socket.listen-port 49160
 65:     option rpc-auth.auth-glusterfs on
 66:     option rpc-auth.auth-unix on
 67:     option rpc-auth.auth-null on
 68:     option transport-type tcp
 69:     option auth.login./exports/NFS-WIN/1T.allow 101b907c-ff21-47da-8ba6-37e2920691ce
 70:     option auth.login.101b907c-ff21-47da-8ba6-37e2920691ce.password f4f29094-891f-4241-8736-5e3302ed8bc8
 71:     option auth.addr./exports/NFS-WIN/1T.allow *
 72:     subvolumes /exports/NFS-WIN/1T
 73: end-volume
 74:
+------------------------------------------------------------------------------+
[2014-10-13 17:38:25.933796] I [server-handshake.c:575:server_setvolume] 0-HA-WIN-TT-1T-server: accepted client from glstor-cli-20753-2014/10/13-11:50:40:959211-HA-WIN-TT-1T-client-0-0-1 (version: 3.5.2)
[2014-10-13 17:38:26.954924] I [server-handshake.c:575:server_setvolume] 0-HA-WIN-TT-1T-server: accepted client from stor1-14362-2014/10/13-17:38:26:938194-HA-WIN-TT-1T-client-0-0-0 (version: 3.5.2)
[2014-10-13 17:38:28.991488] I [server-handshake.c:575:server_setvolume] 0-HA-WIN-TT-1T-server: accepted client from stor2-15494-2014/10/13-17:38:28:989227-HA-WIN-TT-1T-client-0-0-0 (version: 3.5.2)
[2014-10-13 17:38:38.502056] I [server-handshake.c:575:server_setvolume] 0-HA-WIN-TT-1T-server: accepted client from glstor-cli-23823-2014/10/13-17:37:54:595571-HA-WIN-TT-1T-client-0-0-0 (version: 3.5.2)
[2014-10-13 17:39:09.858784] I [server.c:520:server_rpc_notify] 0-HA-WIN-TT-1T-server: disconnecting connectionfrom glstor-cli-20753-2014/10/13-11:50:40:959211-HA-WIN-TT-1T-client-0-0-1
[2014-10-13 17:39:09.858863] I [client_t.c:417:gf_client_unref] 0-HA-WIN-TT-1T-server: Shutting down connection glstor-cli-20753-2014/10/13-11:50:40:959211-HA-WIN-TT-1T-client-0-0-1
[2014-10-13 17:41:05.390918] I [glusterfsd-mgmt.c:56:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2014-10-13 17:41:05.408236] I [glusterfsd-mgmt.c:56:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2014-10-13 17:41:05.414813] I [glusterfsd-mgmt.c:1307:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing


This seems to be the right part of the logs :)


2014-10-15 18:24 GMT+03:00 Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>:

On 10/14/2014 01:20 AM, Roman wrote:
ok. done.
This time there were no disconnects, or at least all of the VMs are working, but I got some mails from the VM about IO writes again.

WARNINGs: Read IO Wait time is 1.45 (outside range [0:1]).
This warning says 'Read IO wait', yet not a single READ operation came to gluster. Wondering why that is :-/. Any clue? According to the stats, at least one write took 3 seconds, and at least one synchronization operation (FINODELK) took 23 seconds. Could you give the logs of this run for the mount, glustershd, and the bricks?

Pranith
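
Rows like the ones Pranith points to can be picked out of the `gluster volume profile <volname> info` output mechanically. The following is only a sketch, assuming the per-fop row layout quoted in this thread; `slow_fops` and the one-second threshold are illustrative choices, not anything gluster provides:

```python
import re

# One row of the per-fop table, as quoted in this thread:
#   %-latency  Avg-latency  Min-Latency  Max-Latency  No. of calls  Fop
ROW = re.compile(
    r"^\s*(?P<pct>\d+\.\d+)\s+"
    r"(?P<avg>\d+\.\d+) us\s+"
    r"(?P<min>\d+\.\d+) us\s+"
    r"(?P<max>\d+\.\d+) us\s+"
    r"(?P<calls>\d+)\s+"
    r"(?P<fop>[A-Z]+)\s*$"
)

def slow_fops(profile_text, max_latency_s=1.0):
    """Return (fop, max latency in seconds, calls) for rows above the threshold."""
    hits = []
    for line in profile_text.splitlines():
        m = ROW.match(line)
        if m is None:
            continue  # header, separator, or summary line
        max_s = float(m.group("max")) / 1_000_000  # microseconds -> seconds
        if max_s > max_latency_s:
            hits.append((m.group("fop"), max_s, int(m.group("calls"))))
    return hits

# Rows copied from the stor1 brick statistics quoted in this thread.
sample = """\
      0.00      89.49 us      20.00 us     656.00 us            324    FXATTROP
      3.91 1255944.81 us     127.00 us 23397532.00 us            189       FSYNC
      7.40 3406275.50 us      17.00 us 23398013.00 us            132     INODELK
     34.96   94598.02 us       8.00 us 23398705.00 us          22445    FINODELK
     53.73     442.66 us      79.00 us 3116494.00 us        7372799       WRITE
"""

for fop, max_s, calls in slow_fops(sample):
    print(f"{fop:10s} max {max_s:6.2f} s over {calls} calls")
```

On the sample rows this flags FSYNC, INODELK, FINODELK and WRITE and leaves FXATTROP out, since its worst case is well under a second.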


here is the output

root@stor1:~# gluster volume profile HA-WIN-TT-1T info
Brick: stor1:/exports/NFS-WIN/1T
--------------------------------
Cumulative Stats:
   Block Size:             131072b+              262144b+
 No. of Reads:                    0                     0
No. of Writes:              7372798                     1
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us             25     RELEASE
      0.00       0.00 us       0.00 us       0.00 us             16  RELEASEDIR
      0.00      64.00 us      52.00 us      76.00 us              2     ENTRYLK
      0.00      73.50 us      51.00 us      96.00 us              2       FLUSH
      0.00      68.43 us      30.00 us     135.00 us              7      STATFS
      0.00      54.31 us      44.00 us     109.00 us             16     OPENDIR
      0.00      50.75 us      16.00 us      74.00 us             24       FSTAT
      0.00      47.77 us      19.00 us     119.00 us             26    GETXATTR
      0.00      59.21 us      21.00 us      89.00 us             24        OPEN
      0.00      59.39 us      22.00 us     296.00 us             28     READDIR
      0.00    4972.00 us    4972.00 us    4972.00 us              1      CREATE
      0.00      97.42 us      19.00 us     184.00 us             62      LOOKUP
      0.00      89.49 us      20.00 us     656.00 us            324    FXATTROP
      3.91 1255944.81 us     127.00 us 23397532.00 us            189       FSYNC
      7.40 3406275.50 us      17.00 us 23398013.00 us            132     INODELK
     34.96   94598.02 us       8.00 us 23398705.00 us          22445    FINODELK
     53.73     442.66 us      79.00 us 3116494.00 us        7372799       WRITE

    Duration: 7813 seconds
   Data Read: 0 bytes
Data Written: 966367641600 bytes

Interval 0 Stats:
   Block Size:             131072b+              262144b+
 No. of Reads:                    0                     0
No. of Writes:              7372798                     1
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us             25     RELEASE
      0.00       0.00 us       0.00 us       0.00 us             16  RELEASEDIR
      0.00      64.00 us      52.00 us      76.00 us              2     ENTRYLK
      0.00      73.50 us      51.00 us      96.00 us              2       FLUSH
      0.00      68.43 us      30.00 us     135.00 us              7      STATFS
      0.00      54.31 us      44.00 us     109.00 us             16     OPENDIR
      0.00      50.75 us      16.00 us      74.00 us             24       FSTAT
      0.00      47.77 us      19.00 us     119.00 us             26    GETXATTR
      0.00      59.21 us      21.00 us      89.00 us             24        OPEN
      0.00      59.39 us      22.00 us     296.00 us             28     READDIR
      0.00    4972.00 us    4972.00 us    4972.00 us              1      CREATE
      0.00      97.42 us      19.00 us     184.00 us             62      LOOKUP
      0.00      89.49 us      20.00 us     656.00 us            324    FXATTROP
      3.91 1255944.81 us     127.00 us 23397532.00 us            189       FSYNC
      7.40 3406275.50 us      17.00 us 23398013.00 us            132     INODELK
     34.96   94598.02 us       8.00 us 23398705.00 us          22445    FINODELK
     53.73     442.66 us      79.00 us 3116494.00 us        7372799       WRITE

    Duration: 7813 seconds
   Data Read: 0 bytes
Data Written: 966367641600 bytes

Brick: stor2:/exports/NFS-WIN/1T
--------------------------------
Cumulative Stats:
   Block Size:             131072b+              262144b+
 No. of Reads:                    0                     0
No. of Writes:              7372798                     1
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us             25     RELEASE
      0.00       0.00 us       0.00 us       0.00 us             16  RELEASEDIR
      0.00      61.50 us      46.00 us      77.00 us              2     ENTRYLK
      0.00      82.00 us      67.00 us      97.00 us              2       FLUSH
      0.00     265.00 us     265.00 us     265.00 us              1      CREATE
      0.00      57.43 us      30.00 us      85.00 us              7      STATFS
      0.00      61.12 us      37.00 us     107.00 us             16     OPENDIR
      0.00      44.04 us      12.00 us      86.00 us             24       FSTAT
      0.00      41.42 us      24.00 us      96.00 us             26    GETXATTR
      0.00      45.93 us      24.00 us     133.00 us             28     READDIR
      0.00      57.17 us      25.00 us     147.00 us             24        OPEN
      0.00     145.28 us      31.00 us     288.00 us             32    READDIRP
      0.00      39.50 us      10.00 us     152.00 us            132     INODELK
      0.00     330.97 us      20.00 us   14280.00 us             62      LOOKUP
      0.00      79.06 us      19.00 us     851.00 us            430    FXATTROP
      0.02      29.32 us       7.00 us   28154.00 us          22568    FINODELK
      7.80 1313096.68 us     125.00 us 23281862.00 us            189       FSYNC
     92.18     397.92 us      76.00 us 1838343.00 us        7372799       WRITE

    Duration: 7811 seconds
   Data Read: 0 bytes
Data Written: 966367641600 bytes

Interval 0 Stats:
   Block Size:             131072b+              262144b+
 No. of Reads:                    0                     0
No. of Writes:              7372798                     1
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us             25     RELEASE
      0.00       0.00 us       0.00 us       0.00 us             16  RELEASEDIR
      0.00      61.50 us      46.00 us      77.00 us              2     ENTRYLK
      0.00      82.00 us      67.00 us      97.00 us              2       FLUSH
      0.00     265.00 us     265.00 us     265.00 us              1      CREATE
      0.00      57.43 us      30.00 us      85.00 us              7      STATFS
      0.00      61.12 us      37.00 us     107.00 us             16     OPENDIR
      0.00      44.04 us      12.00 us      86.00 us             24       FSTAT
      0.00      41.42 us      24.00 us      96.00 us             26    GETXATTR
      0.00      45.93 us      24.00 us     133.00 us             28     READDIR
      0.00      57.17 us      25.00 us     147.00 us             24        OPEN
      0.00     145.28 us      31.00 us     288.00 us             32    READDIRP
      0.00      39.50 us      10.00 us     152.00 us            132     INODELK
      0.00     330.97 us      20.00 us   14280.00 us             62      LOOKUP
      0.00      79.06 us      19.00 us     851.00 us            430    FXATTROP
      0.02      29.32 us       7.00 us   28154.00 us          22568    FINODELK
      7.80 1313096.68 us     125.00 us 23281862.00 us            189       FSYNC
     92.18     397.92 us      76.00 us 1838343.00 us        7372799       WRITE

    Duration: 7811 seconds
   Data Read: 0 bytes
Data Written: 966367641600 bytes
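
As a rough cross-check of the numbers above, the profile output can be turned into total time per fop and an average write throughput. This is only a sketch: the values are copied by hand from the profile, and the variable names are made up for illustration.

```python
# Values copied by hand from the 'gluster volume profile ... info' output above.
duration_s = 7811
bytes_written = 966367641600

# (average latency in microseconds, number of calls) for the two dominant fops
fops = {
    "FSYNC": (1313096.68, 189),
    "WRITE": (397.92, 7372799),
}

# Total time spent in each fop = average latency * number of calls.
total_s = {name: avg_us * calls / 1e6 for name, (avg_us, calls) in fops.items()}
share = {name: 100 * t / sum(total_s.values()) for name, t in total_s.items()}

throughput_mb_s = bytes_written / duration_s / 1e6   # decimal megabytes per second
size_gib = bytes_written / 2**30

for name in fops:
    print(f"{name}: {total_s[name]:.0f} s total, {share[name]:.1f}% of FSYNC+WRITE time")
print(f"{size_gib:.0f} GiB in {duration_s} s = {throughput_mb_s:.1f} MB/s")
```

With these inputs the average comes out at roughly 124 MB/s, which is about what a single 1 Gbps link can carry, and the FSYNC share agrees with the 7.80 reported in the %-latency column.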

Does this make anything clearer?

2014-10-13 20:40 GMT+03:00 Roman <romeo.r@xxxxxxxxx>:
I think I may know what the issue was. There was an iscsitarget service running that was exporting this generated block device, so maybe my colleague's Windows server picked it up and mounted it :) I'll see if it happens again.

2014-10-13 20:27 GMT+03:00 Roman <romeo.r@xxxxxxxxx>:
So may I restart the volume and start the test, or do you need something else for this issue?

2014-10-13 19:49 GMT+03:00 Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>:

On 10/13/2014 10:03 PM, Roman wrote:
hmm,
This seems like another strange issue. I've seen it before and had to restart the volume to get my free space back.
root@glstor-cli:/srv/nfs/HA-WIN-TT-1T# ls -l
total 943718400
-rw-r--r-- 1 root root 966367641600 Oct 13 16:55 disk
root@glstor-cli:/srv/nfs/HA-WIN-TT-1T# rm disk
root@glstor-cli:/srv/nfs/HA-WIN-TT-1T# df -h
Filesystem                                              Size  Used Avail Use% Mounted on
rootfs                                                  282G  1.1G  266G   1% /
udev                                                     10M     0   10M   0% /dev
tmpfs                                                   1.4G  228K  1.4G   1% /run
/dev/disk/by-uuid/c62ee3c0-c0e5-44af-b0cd-7cb3fbcc0fba  282G  1.1G  266G   1% /
tmpfs                                                   5.0M     0  5.0M   0% /run/lock
tmpfs                                                   5.2G     0  5.2G   0% /run/shm
stor1:HA-WIN-TT-1T                                     1008G  901G   57G  95% /srv/nfs/HA-WIN-TT-1T

No file, but the used space is still 901G.
Both servers show the same.
Do I really have to restart the volume to fix that?
IMO this can happen if there is an fd leak. open-fd is the only variable that can change with volume restart. How do you re-create the bug?
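
For illustration, the effect described here can be reproduced on a local Linux filesystem: a file that is removed while something still holds an open fd on it keeps its data until that fd is closed. A minimal sketch, not GlusterFS-specific; the temporary directory and file name are made up.

```python
import os
import tempfile

# Create a scratch file and keep a file descriptor open on it.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "disk")
fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)
os.write(fd, b"\0" * 1024 * 1024)    # 1 MiB of zeros, like a tiny dd

os.unlink(path)                      # the equivalent of 'rm disk'

# The name is gone, but the inode and its data live on behind the open fd.
name_exists = os.path.exists(path)
st = os.fstat(fd)
links_left = st.st_nlink
size_still_held = st.st_size

print(name_exists, links_left, size_still_held)

os.close(fd)                         # only now can the space be freed
os.rmdir(workdir)
```

If a brick process leaked an fd in the same way, restarting the volume would close it, which would fit the observation that the space only comes back after a restart.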

Pranith


2014-10-13 19:30 GMT+03:00 Roman <romeo.r@xxxxxxxxx>:
Sure.
I'll let it run tonight.

2014-10-13 19:19 GMT+03:00 Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>:
hi Roman,
     Do you think we can run this test again? This time, could you run 'gluster volume profile <volname> start' first, then do the same test, and provide the output of 'gluster volume profile <volname> info' and the logs after the test?

Pranith

On 10/13/2014 09:45 PM, Roman wrote:
Sure !

root@stor1:~# gluster volume info

Volume Name: HA-2TB-TT-Proxmox-cluster
Type: Replicate
Volume ID: 66e38bde-c5fa-4ce2-be6e-6b2adeaa16c2
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: stor1:/exports/HA-2TB-TT-Proxmox-cluster/2TB
Brick2: stor2:/exports/HA-2TB-TT-Proxmox-cluster/2TB
Options Reconfigured:
nfs.disable: 0
network.ping-timeout: 10

Volume Name: HA-WIN-TT-1T
Type: Replicate
Volume ID: 2937ac01-4cba-44a8-8ff8-0161b67f8ee4
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: stor1:/exports/NFS-WIN/1T
Brick2: stor2:/exports/NFS-WIN/1T
Options Reconfigured:
nfs.disable: 1
network.ping-timeout: 10



2014-10-13 19:09 GMT+03:00 Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>:
Could you give your 'gluster volume info' output?

Pranith

On 10/13/2014 09:36 PM, Roman wrote:
Hi,

I've got this kind of setup (servers run replica)


@ 10G backend
gluster storage1
gluster storage2
gluster client1

@1g backend
other gluster clients

The servers have HW RAID5 with SAS disks.

So today I decided to create a 900 GB file for an iSCSI target, located on a separate GlusterFS volume, using dd (just a dummy file filled with zeros, bs=1G count=900).
First of all, the process took quite a long time; the write speed was 130 MB/sec (the client port was 2 Gbps, the server ports were running at 1 Gbps).
Then it reported something like "endpoint is not connected", and all of my VMs on the other volume started to give me I/O errors.
Server load was around 4,6 (12 cores total).

Maybe it was due to the timeout of 2 seconds, so I've made it a bit higher: 10 seconds.

Also, while dd was creating the image, the VMs very often reported that their disks were slow, like this:

WARNINGs: Read IO Wait time is -0.02 (outside range [0:1]).

Is 130 MB/sec the maximum bandwidth for all of the volumes in total? If so, why would we need 10G backends?

Local HW RAID speed is 300 MB/sec, so that should not be an issue. Any ideas or advice?
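
A back-of-the-envelope estimate may help with the bandwidth question. The sketch below assumes that the native GlusterFS client sends every write to both replicas itself, so the client link is shared between the two bricks, and it ignores protocol overhead. The link speeds are the ones quoted above; the function name is made up.

```python
def write_ceiling_mb_s(client_gbps: float, server_gbps: float, replicas: int) -> float:
    """Upper bound on client write throughput in MB/s (decimal), ignoring overhead.

    Assumes client-side replication: each byte leaves the client once per replica,
    and each server receives one copy over its own link.
    """
    gbps_to_mb_s = 1000 / 8                      # 1 Gbps = 125 MB/s
    client_limit = client_gbps * gbps_to_mb_s / replicas
    server_limit = server_gbps * gbps_to_mb_s
    return min(client_limit, server_limit)

# Link speeds as reported in this message: 2 Gbps client, 1 Gbps servers, replica 2.
as_reported = write_ceiling_mb_s(client_gbps=2, server_gbps=1, replicas=2)
# The same volume with 10 Gbps links everywhere.
all_10g = write_ceiling_mb_s(client_gbps=10, server_gbps=10, replicas=2)

print(f"reported links: {as_reported:.0f} MB/s, all 10G: {all_10g:.0f} MB/s")
```

Under these assumptions the reported links cap out near 125 MB/s, which is in the same range as the observed 130 MB/sec, so the network rather than the RAID would be the first limit.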


Maybe someone has an optimized sysctl.conf for a 10G backend?

Mine is pretty simple, the kind that can be found by googling.


Just to mention: those VMs were connected using a separate 1 Gbps interface, which means they should not be affected by the client with the 10G backend.


The logs are pretty useless; they just say this during the outage:


[2014-10-13 12:09:18.392910] W [client-handshake.c:276:client_ping_cbk] 0-HA-2TB-TT-Proxmox-cluster-client-0: timer must have expired

[2014-10-13 12:10:08.389708] C [client-handshake.c:127:rpc_client_ping_timer_expired] 0-HA-2TB-TT-Proxmox-cluster-client-0: server 10.250.0.1:49159 has not responded in the last 2 seconds, disconnecting.

[2014-10-13 12:10:08.390312] W [client-handshake.c:276:client_ping_cbk] 0-HA-2TB-TT-Proxmox-cluster-client-0: timer must have expired

So I decided to set the timeout a bit higher.
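
To see how often the ping timer fired and against which brick, the client log can be filtered for these messages. A small sketch, assuming the log line format shown above; the regular expression is derived only from these lines and the sample is one of them.

```python
import re

# Sample line copied from the client log above.
log = (
    "[2014-10-13 12:10:08.389708] C [client-handshake.c:127:rpc_client_ping_timer_expired] "
    "0-HA-2TB-TT-Proxmox-cluster-client-0: server 10.250.0.1:49159 has not responded "
    "in the last 2 seconds, disconnecting.\n"
)

# Timestamp, translator name, server address and the timeout that expired.
pattern = re.compile(
    r"^\[(?P<ts>[^\]]+)\] C \[[^\]]*rpc_client_ping_timer_expired\] "
    r"(?P<xlator>\S+): server (?P<server>\S+) has not responded "
    r"in the last (?P<timeout>\d+) seconds"
)

events = []
for line in log.splitlines():
    m = pattern.match(line)
    if m:
        events.append((m["ts"], m["xlator"], m["server"], int(m["timeout"])))

for ts, xlator, server, timeout in events:
    print(f"{ts}: {xlator} lost {server} after {timeout} s")
```

Running this over the whole client log instead of the single sample line would show whether the disconnects cluster around the time of the dd run.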

So it seems to me that GlusterFS is not usable under high load? 130 MB/s is not so much that it should cause timeouts or make the system so slow that the VMs start misbehaving.

Of course, after the disconnection the healing process started, but as the VMs had lost connection to both servers, it was pretty useless; they could not run anymore. And BTW, when you load the server with such a huge job (dd of 900 GB), the healing process goes soooooo slow :)



--
Best regards,
Roman.


_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users



