Here are my test results from my CentOS 5.4 test environment. The configuration was generated by glusterfs-volgen, as you can see in the log. The server configuration hasn't been changed; the client configuration has been trimmed, as you can also see. The dataset is quite small (<1G), mostly small files (pictures, CSS files). I've used 3 nodes (2 servers, 1 client), all physical servers, no virtualization.

I set up the system using the glusterfs-*-3.0.2-1 RPM packages from the glusterfs website, then stopped one of the servers, copied some files over to the glusterfs mount, and then started the server again. I expected that a background process would re-sync the content between the two server nodes. This hasn't happened (check the timestamps in the log below).

I'm not sure whether I misunderstand something and there is no such automatic background self-healing service, whether I need to use special fuse modules to make it work, or whether I've hit a bug in glusterfs.

Thanks,
miloska

* Startup:

# glusterfs --volfile=/etc/glusterfs/glusterfs.vol --volfile-check --debug /gtest/
[2010-02-10 10:45:19] D [glusterfsd.c:424:_get_specfp] glusterfs: loading volume file /etc/glusterfs/glusterfs.vol
================================================================================
Version : glusterfs 3.0.2 built on Feb 7 2010 00:15:44
git: v3.0.2
Starting Time: 2010-02-10 10:45:19
Command line : glusterfs --volfile=/etc/glusterfs/glusterfs.vol --volfile-check --debug /gtest/
PID : 5770
System name : Linux
Nodename : phySERVER-177
Kernel Release : 2.6.18-164.el5
Hardware Identifier: x86_64

Given volfile:
+------------------------------------------------------------------------------+
1: ## file auto generated by /usr/bin/glusterfs-volgen (mount.vol)
2: # Cmd line:
3: # $ /usr/bin/glusterfs-volgen -name test1 --raid 1 XX.YY.ZZ.179:/data XX.YY.ZZ.181:/data
4:
5: # RAID 1
6: # TRANSPORT-TYPE tcp
7: volume XX.YY.ZZ.179-1
8: type protocol/client
9: option transport-type tcp
10: option remote-host XX.YY.ZZ.179
11: option transport.socket.nodelay on
12: option transport.remote-port 6996
13: option remote-subvolume brick1
14: end-volume
15:
16: volume XX.YY.ZZ.181-1
17: type protocol/client
18: option transport-type tcp
19: option remote-host XX.YY.ZZ.181
20: option transport.socket.nodelay on
21: option transport.remote-port 6996
22: option remote-subvolume brick1
23: end-volume
24:
25: volume mirror-0
26: type cluster/replicate
27: subvolumes XX.YY.ZZ.179-1 XX.YY.ZZ.181-1
28: end-volume
29:
+------------------------------------------------------------------------------+
[2010-02-10 10:45:19] D [glusterfsd.c:1370:main] glusterfs: running in pid 5770
[2010-02-10 10:45:19] D [client-protocol.c:6585:init] XX.YY.ZZ.181-1: defaulting frame-timeout to 30mins
[2010-02-10 10:45:19] D [client-protocol.c:6596:init] XX.YY.ZZ.181-1: defaulting ping-timeout to 42
[2010-02-10 10:45:19] D [transport.c:145:transport_load] transport: attempt to load file /usr/lib64/glusterfs/3.0.2/transport/socket.so
[2010-02-10 10:45:19] W [xlator.c:655:validate_xlator_volume_options] XX.YY.ZZ.181-1: option 'transport.remote-port' is deprecated, preferred is 'remote-port', continuing with correction
[2010-02-10 10:45:19] D [xlator.c:284:_volume_option_value_validate] XX.YY.ZZ.181-1: no range check required for 'option remote-port 6996'
[2010-02-10 10:45:19] D [transport.c:145:transport_load] transport: attempt to load file /usr/lib64/glusterfs/3.0.2/transport/socket.so
[2010-02-10 10:45:19] D [xlator.c:284:_volume_option_value_validate] XX.YY.ZZ.181-1: no range check required for 'option remote-port 6996'
[2010-02-10 10:45:19] D [client-protocol.c:6585:init] XX.YY.ZZ.179-1: defaulting frame-timeout to 30mins
[2010-02-10 10:45:19] D [client-protocol.c:6596:init] XX.YY.ZZ.179-1: defaulting ping-timeout to 42
[2010-02-10 10:45:19] D [transport.c:145:transport_load] transport: attempt to load file /usr/lib64/glusterfs/3.0.2/transport/socket.so
[2010-02-10 10:45:19] W [xlator.c:655:validate_xlator_volume_options] XX.YY.ZZ.179-1: option 'transport.remote-port' is deprecated, preferred is 'remote-port', continuing with correction
[2010-02-10 10:45:19] D [xlator.c:284:_volume_option_value_validate] XX.YY.ZZ.179-1: no range check required for 'option remote-port 6996'
[2010-02-10 10:45:19] D [transport.c:145:transport_load] transport: attempt to load file /usr/lib64/glusterfs/3.0.2/transport/socket.so
[2010-02-10 10:45:19] D [xlator.c:284:_volume_option_value_validate] XX.YY.ZZ.179-1: no range check required for 'option remote-port 6996'
[2010-02-10 10:45:19] D [client-protocol.c:7009:notify] XX.YY.ZZ.179-1: got GF_EVENT_PARENT_UP, attempting connect on transport
[2010-02-10 10:45:19] D [client-protocol.c:7009:notify] XX.YY.ZZ.179-1: got GF_EVENT_PARENT_UP, attempting connect on transport
[2010-02-10 10:45:19] D [client-protocol.c:7009:notify] XX.YY.ZZ.181-1: got GF_EVENT_PARENT_UP, attempting connect on transport
[2010-02-10 10:45:19] D [client-protocol.c:7009:notify] XX.YY.ZZ.181-1: got GF_EVENT_PARENT_UP, attempting connect on transport
[2010-02-10 10:45:19] D [client-protocol.c:7009:notify] XX.YY.ZZ.179-1: got GF_EVENT_PARENT_UP, attempting connect on transport
[2010-02-10 10:45:19] D [client-protocol.c:7009:notify] XX.YY.ZZ.179-1: got GF_EVENT_PARENT_UP, attempting connect on transport
[2010-02-10 10:45:19] D [client-protocol.c:7009:notify] XX.YY.ZZ.181-1: got GF_EVENT_PARENT_UP, attempting connect on transport
[2010-02-10 10:45:19] D [client-protocol.c:7009:notify] XX.YY.ZZ.181-1: got GF_EVENT_PARENT_UP, attempting connect on transport
[2010-02-10 10:45:19] N [glusterfsd.c:1396:main] glusterfs: Successfully started
[2010-02-10 10:45:19] E [socket.c:760:socket_connect_finish] XX.YY.ZZ.181-1: connection to failed (Network is unreachable)
[2010-02-10 10:45:19] E [socket.c:760:socket_connect_finish] XX.YY.ZZ.181-1: connection to failed (Network is unreachable)
[2010-02-10 10:45:19] E [socket.c:760:socket_connect_finish] XX.YY.ZZ.179-1: connection to failed (Network is unreachable)
[2010-02-10 10:45:19] E [socket.c:760:socket_connect_finish] XX.YY.ZZ.179-1: connection to failed (Network is unreachable)
[2010-02-10 10:45:19] D [fuse-bridge.c:3092:fuse_thread_proc] fuse: pthread_cond_timedout returned non zero value ret: 0 errno: 0
[2010-02-10 10:45:19] N [fuse-bridge.c:2942:fuse_init] glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.10
[2010-02-10 10:45:30] D [client-protocol.c:7023:notify] XX.YY.ZZ.179-1: got GF_EVENT_CHILD_UP
[2010-02-10 10:45:30] N [client-protocol.c:6228:client_setvolume_cbk] XX.YY.ZZ.179-1: Connected to XX.YY.ZZ.179:6996, attached to remote volume 'brick1'.
[2010-02-10 10:45:30] N [afr.c:2625:notify] mirror-0: Subvolume 'XX.YY.ZZ.179-1' came back up; going online.
[2010-02-10 10:45:30] D [client-protocol.c:7023:notify] XX.YY.ZZ.179-1: got GF_EVENT_CHILD_UP
[2010-02-10 10:45:30] N [client-protocol.c:6228:client_setvolume_cbk] XX.YY.ZZ.179-1: Connected to XX.YY.ZZ.179:6996, attached to remote volume 'brick1'.
[2010-02-10 10:45:30] N [afr.c:2625:notify] mirror-0: Subvolume 'XX.YY.ZZ.179-1' came back up; going online.
[2010-02-10 10:45:30] D [client-protocol.c:7023:notify] XX.YY.ZZ.181-1: got GF_EVENT_CHILD_UP
[2010-02-10 10:45:30] N [client-protocol.c:6228:client_setvolume_cbk] XX.YY.ZZ.181-1: Connected to XX.YY.ZZ.181:6996, attached to remote volume 'brick1'.
[2010-02-10 10:45:30] D [client-protocol.c:7023:notify] XX.YY.ZZ.181-1: got GF_EVENT_CHILD_UP
[2010-02-10 10:45:30] N [client-protocol.c:6228:client_setvolume_cbk] XX.YY.ZZ.181-1: Connected to XX.YY.ZZ.181:6996, attached to remote volume 'brick1'.

* Shut down server 181

[2010-02-10 10:47:17] N [client-protocol.c:6976:notify] XX.YY.ZZ.181-1: disconnected
[2010-02-10 10:47:17] E [socket.c:760:socket_connect_finish] XX.YY.ZZ.181-1: connection to XX.YY.ZZ.181:6996 failed (Connection refused)
[2010-02-10 10:47:17] E [socket.c:760:socket_connect_finish] XX.YY.ZZ.181-1: connection to XX.YY.ZZ.181:6996 failed (Connection refused)

* Start up server 181

[2010-02-10 10:47:35] D [afr-self-heal-entry.c:108:afr_sh_entry_unlck_cbk] mirror-0: unlocking inode of / on child 1 failed: Transport endpoint is not connected
[2010-02-10 10:47:35] D [client-protocol.c:3143:client_entrylk] XX.YY.ZZ.181-1: ENTRYLK 107283318 (/3static): failed to get remote inode number
[2010-02-10 10:47:35] D [afr-self-heal-entry.c:108:afr_sh_entry_unlck_cbk] mirror-0: unlocking inode of /3static on child 1 failed: Transport endpoint is not connected
[2010-02-10 10:47:35] D [client-protocol.c:3143:client_entrylk] XX.YY.ZZ.181-1: ENTRYLK 107284016 (/3static/5051): failed to get remote inode number
[2010-02-10 10:47:35] D [afr-self-heal-entry.c:108:afr_sh_entry_unlck_cbk] mirror-0: unlocking inode of /3static/5051 on child 1 failed: Transport endpoint is not connected
[2010-02-10 10:47:35] D [client-protocol.c:3143:client_entrylk] XX.YY.ZZ.181-1: ENTRYLK 107284072 (/3static/5051/7471): failed to get remote inode number
[2010-02-10 10:47:35] D [afr-self-heal-entry.c:108:afr_sh_entry_unlck_cbk] mirror-0: unlocking inode of /3static/5051/7471 on child 1 failed: Transport endpoint is not connected
[2010-02-10 10:47:36] D [afr-self-heal-entry.c:108:afr_sh_entry_unlck_cbk] mirror-0: unlocking inode of / on child 1 failed: Transport endpoint is not connected
[2010-02-10 10:47:36] D [client-protocol.c:3143:client_entrylk] XX.YY.ZZ.181-1: ENTRYLK 107283318 (/3static): failed to get remote inode number
[2010-02-10 10:47:36] D [afr-self-heal-entry.c:108:afr_sh_entry_unlck_cbk] mirror-0: unlocking inode of /3static on child 1 failed: Transport endpoint is not connected
[2010-02-10 10:47:36] D [client-protocol.c:3143:client_entrylk] XX.YY.ZZ.181-1: ENTRYLK 107284016 (/3static/5051): failed to get remote inode number
[2010-02-10 10:47:36] D [afr-self-heal-entry.c:108:afr_sh_entry_unlck_cbk] mirror-0: unlocking inode of /3static/5051 on child 1 failed: Transport endpoint is not connected
[2010-02-10 10:47:36] D [client-protocol.c:3143:client_entrylk] XX.YY.ZZ.181-1: ENTRYLK 107284072 (/3static/5051/7471): failed to get remote inode number
[2010-02-10 10:47:36] D [afr-self-heal-entry.c:108:afr_sh_entry_unlck_cbk] mirror-0: unlocking inode of /3static/5051/7471 on child 1 failed: Transport endpoint is not connected
[2010-02-10 10:47:37] D [afr-self-heal-entry.c:108:afr_sh_entry_unlck_cbk] mirror-0: unlocking inode of / on child 1 failed: Transport endpoint is not connected
[2010-02-10 10:47:37] D [client-protocol.c:3143:client_entrylk] XX.YY.ZZ.181-1: ENTRYLK 107283318 (/3static): failed to get remote inode number
[2010-02-10 10:47:37] D [afr-self-heal-entry.c:108:afr_sh_entry_unlck_cbk] mirror-0: unlocking inode of /3static on child 1 failed: Transport endpoint is not connected
[2010-02-10 10:47:38] D [afr-self-heal-entry.c:108:afr_sh_entry_unlck_cbk] mirror-0: unlocking inode of / on child 1 failed: Transport endpoint is not connected
[2010-02-10 10:47:38] D [client-protocol.c:3143:client_entrylk] XX.YY.ZZ.181-1: ENTRYLK 107283318 (/3static): failed to get remote inode number
[2010-02-10 10:47:38] D [afr-self-heal-entry.c:108:afr_sh_entry_unlck_cbk] mirror-0: unlocking inode of /3static on child 1 failed: Transport endpoint is not connected
[2010-02-10 10:47:38] D [client-protocol.c:3143:client_entrylk] XX.YY.ZZ.181-1: ENTRYLK 107284660 (/3static/16331): failed to get remote inode number
[2010-02-10 10:47:38] D [afr-self-heal-entry.c:108:afr_sh_entry_unlck_cbk] mirror-0: unlocking inode of /3static/16331 on child 1 failed: Transport endpoint is not connected
[2010-02-10 10:47:38] D [client-protocol.c:3143:client_entrylk] XX.YY.ZZ.181-1: ENTRYLK 107284662 (/3static/16331/30351): failed to get remote inode number
[2010-02-10 10:47:38] D [afr-self-heal-entry.c:108:afr_sh_entry_unlck_cbk] mirror-0: unlocking inode of /3static/16331/30351 on child 1 failed: Transport endpoint is not connected
[2010-02-10 10:47:38] D [client-protocol.c:3143:client_entrylk] XX.YY.ZZ.181-1: ENTRYLK 107284696 (/3static/16331/30351/361491): failed to get remote inode number
[2010-02-10 10:47:38] D [afr-self-heal-entry.c:108:afr_sh_entry_unlck_cbk] mirror-0: unlocking inode of /3static/16331/30351/361491 on child 1 failed: Transport endpoint is not connected
[2010-02-10 10:47:39] D [afr-self-heal-entry.c:108:afr_sh_entry_unlck_cbk] mirror-0: unlocking inode of / on child 1 failed: Transport endpoint is not connected
[2010-02-10 10:47:39] D [client-protocol.c:3143:client_entrylk] XX.YY.ZZ.181-1: ENTRYLK 107283318 (/3static): failed to get remote inode number
[2010-02-10 10:47:39] D [afr-self-heal-entry.c:108:afr_sh_entry_unlck_cbk] mirror-0: unlocking inode of /3static on child 1 failed: Transport endpoint is not connected
[2010-02-10 10:47:39] D [client-protocol.c:3143:client_entrylk] XX.YY.ZZ.181-1: ENTRYLK 107284660 (/3static/16331): failed to get remote inode number
[2010-02-10 10:47:39] D [afr-self-heal-entry.c:108:afr_sh_entry_unlck_cbk] mirror-0: unlocking inode of /3static/16331 on child 1 failed: Transport endpoint is not connected
[2010-02-10 10:47:40] D [afr-self-heal-entry.c:108:afr_sh_entry_unlck_cbk] mirror-0: unlocking inode of / on child 1 failed: Transport endpoint is not connected
[2010-02-10 10:47:40] D [client-protocol.c:3143:client_entrylk] XX.YY.ZZ.181-1: ENTRYLK 107283318 (/3static): failed to get remote inode number
[2010-02-10 10:47:40] D [afr-self-heal-entry.c:108:afr_sh_entry_unlck_cbk] mirror-0: unlocking inode of /3static on child 1 failed: Transport endpoint is not connected
[2010-02-10 10:47:41] D [afr-self-heal-entry.c:108:afr_sh_entry_unlck_cbk] mirror-0: unlocking inode of / on child 1 failed: Transport endpoint is not connected
[2010-02-10 10:47:41] D [client-protocol.c:3143:client_entrylk] XX.YY.ZZ.181-1: ENTRYLK 107283318 (/3static): failed to get remote inode number
[2010-02-10 10:47:41] D [afr-self-heal-entry.c:108:afr_sh_entry_unlck_cbk] mirror-0: unlocking inode of /3static on child 1 failed: Transport endpoint is not connected
[2010-02-10 10:47:41] D [client-protocol.c:3143:client_entrylk] XX.YY.ZZ.181-1: ENTRYLK 107348204 (/3static/2501): failed to get remote inode number
[2010-02-10 10:47:41] D [afr-self-heal-entry.c:108:afr_sh_entry_unlck_cbk] mirror-0: unlocking inode of /3static/2501 on child 1 failed: Transport endpoint is not connected
[2010-02-10 10:47:41] D [client-protocol.c:3143:client_entrylk] XX.YY.ZZ.181-1: ENTRYLK 107348206 (/3static/2501/3271): failed to get remote inode number
[2010-02-10 10:47:41] D [afr-self-heal-entry.c:108:afr_sh_entry_unlck_cbk] mirror-0: unlocking inode of /3static/2501/3271 on child 1 failed: Transport endpoint is not connected
[2010-02-10 10:47:57] D [client-protocol.c:7023:notify] XX.YY.ZZ.181-1: got GF_EVENT_CHILD_UP
[2010-02-10 10:47:57] N [client-protocol.c:6228:client_setvolume_cbk] XX.YY.ZZ.181-1: Connected to XX.YY.ZZ.181:6996, attached to remote volume 'brick1'.
[2010-02-10 10:47:57] D [client-protocol.c:7023:notify] XX.YY.ZZ.181-1: got GF_EVENT_CHILD_UP
[2010-02-10 10:47:57] N [client-protocol.c:6228:client_setvolume_cbk] XX.YY.ZZ.181-1: Connected to XX.YY.ZZ.181:6996, attached to remote volume 'brick1'.
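
To put a number on how long the client saw server 181 as down, the timestamps above can be compared mechanically. This is only a minimal Python sketch of how I would do it: the function names (parse_line, outage_seconds) are my own and not part of glusterfs, and the line layout "[timestamp] LEVEL [file:line:function] translator: message" is assumed from the entries in this log rather than taken from any format specification.

```python
import re
from datetime import datetime

# Layout assumed from the log above:
# [YYYY-MM-DD HH:MM:SS] LEVEL [file.c:line:function] translator: message
LINE_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] "
    r"(?P<level>[A-Z]) "
    r"\[(?P<src>[^\]]+)\] "
    r"(?P<xlator>[^:]+): "
    r"(?P<msg>.*)$"
)


def parse_line(line):
    """Split one log line into its fields; return None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return {
        "time": datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S"),
        "level": m.group("level"),
        "source": m.group("src"),
        "xlator": m.group("xlator"),
        "message": m.group("msg"),
    }


def outage_seconds(lines, xlator):
    """Seconds between the first 'disconnected' and the next GF_EVENT_CHILD_UP
    logged by the given client translator, or None if either is missing."""
    down = None
    for line in lines:
        entry = parse_line(line)
        if entry is None or entry["xlator"] != xlator:
            continue
        if down is None and entry["message"] == "disconnected":
            down = entry["time"]
        elif down is not None and "GF_EVENT_CHILD_UP" in entry["message"]:
            return (entry["time"] - down).total_seconds()
    return None


# Two entries copied from the log above.
SAMPLE = [
    "[2010-02-10 10:47:17] N [client-protocol.c:6976:notify] XX.YY.ZZ.181-1: disconnected",
    "[2010-02-10 10:47:57] D [client-protocol.c:7023:notify] XX.YY.ZZ.181-1: got GF_EVENT_CHILD_UP",
]

if __name__ == "__main__":
    print(outage_seconds(SAMPLE, "XX.YY.ZZ.181-1"))  # 40.0
```

So the client considered 181 unreachable for about 40 seconds, and the entries logged between 10:47:35 and 10:47:41 all fall inside that window.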

* Checking the content of the directory

[2010-02-10 10:49:16] D [afr-self-heal-entry.c:2293:afr_sh_entry_sync_prepare] mirror-0: self-healing directory / from subvolume XX.YY.ZZ.179-1 to 1 other
[2010-02-10 10:49:16] D [afr-self-heal-entry.c:1427:afr_sh_entry_impunge_mkdir] mirror-0: creating missing directory /3static on XX.YY.ZZ.181-1
[2010-02-10 10:49:17] D [afr-self-heal-metadata.c:404:afr_sh_metadata_sync] mirror-0: self-healing metadata of /3static from XX.YY.ZZ.179-1 to XX.YY.ZZ.181-1
[2010-02-10 10:49:17] D [afr-self-heal-entry.c:2293:afr_sh_entry_sync_prepare] mirror-0: self-healing directory /3static from subvolume XX.YY.ZZ.179-1 to 1 other
[2010-02-10 10:49:17] D [afr-self-heal-entry.c:1427:afr_sh_entry_impunge_mkdir] mirror-0: creating missing directory /3static/5051 on XX.YY.ZZ.181-1
[2010-02-10 10:49:17] D [afr-self-heal-entry.c:1427:afr_sh_entry_impunge_mkdir] mirror-0: creating missing directory /3static/10181 on XX.YY.ZZ.181-1
[2010-02-10 10:49:17] D [afr-self-heal-entry.c:1427:afr_sh_entry_impunge_mkdir] mirror-0: creating missing directory /3static/9301 on XX.YY.ZZ.181-1
[2010-02-10 10:49:17] D [afr-self-heal-entry.c:1427:afr_sh_entry_impunge_mkdir] mirror-0: creating missing directory /3static/16331 on XX.YY.ZZ.181-1
[2010-02-10 10:49:17] D [afr-self-heal-entry.c:1427:afr_sh_entry_impunge_mkdir] mirror-0: creating missing directory /3static/2501 on XX.YY.ZZ.181-1

* After half an hour there is still no automatic background sync; healing happens only when I access the files (ls -l)

[2010-02-10 11:17:37] D [afr-self-heal-metadata.c:404:afr_sh_metadata_sync] mirror-0: self-healing metadata of /3static/10181 from XX.YY.ZZ.179-1 to XX.YY.ZZ.181-1
[2010-02-10 11:17:37] D [afr-self-heal-entry.c:2293:afr_sh_entry_sync_prepare] mirror-0: self-healing directory /3static/10181 from subvolume XX.YY.ZZ.179-1 to 1 other
[2010-02-10 11:17:37] D [afr-self-heal-entry.c:1427:afr_sh_entry_impunge_mkdir] mirror-0: creating missing directory /3static/10181/16611 on XX.YY.ZZ.181-1
[2010-02-10 11:17:37] D [afr-self-heal-metadata.c:404:afr_sh_metadata_sync] mirror-0: self-healing metadata of /3static/5051 from XX.YY.ZZ.179-1 to XX.YY.ZZ.181-1
[2010-02-10 11:17:37] D [afr-self-heal-entry.c:2293:afr_sh_entry_sync_prepare] mirror-0: self-healing directory /3static/5051 from subvolume XX.YY.ZZ.179-1 to 1 other
[2010-02-10 11:17:37] D [afr-self-heal-entry.c:1427:afr_sh_entry_impunge_mkdir] mirror-0: creating missing directory /3static/5051/24191 on XX.YY.ZZ.181-1
[2010-02-10 11:17:37] D [afr-self-heal-entry.c:1427:afr_sh_entry_impunge_mkdir] mirror-0: creating missing directory /3static/5051/7471 on XX.YY.ZZ.181-1
[2010-02-10 11:17:37] D [afr-self-heal-metadata.c:404:afr_sh_metadata_sync] mirror-0: self-healing metadata of /3static/9301 from XX.YY.ZZ.179-1 to XX.YY.ZZ.181-1
[2010-02-10 11:17:37] D [afr-self-heal-entry.c:2293:afr_sh_entry_sync_prepare] mirror-0: self-healing directory /3static/9301 from subvolume XX.YY.ZZ.179-1 to 1 other
[2010-02-10 11:17:37] D [afr-self-heal-entry.c:1427:afr_sh_entry_impunge_mkdir] mirror-0: creating missing directory /3static/9301/17461 on XX.YY.ZZ.181-1
[2010-02-10 11:17:37] D [afr-self-heal-entry.c:1427:afr_sh_entry_impunge_mkdir] mirror-0: creating missing directory /3static/9301/15291 on XX.YY.ZZ.181-1
[2010-02-10 11:17:37] D [afr-self-heal-metadata.c:404:afr_sh_metadata_sync] mirror-0: self-healing metadata of /3static/16331 from XX.YY.ZZ.179-1 to XX.YY.ZZ.181-1
[2010-02-10 11:17:37] D [afr-self-heal-entry.c:2293:afr_sh_entry_sync_prepare] mirror-0: self-healing directory /3static/16331 from subvolume XX.YY.ZZ.179-1 to 1 other
[2010-02-10 11:17:37] D [afr-self-heal-entry.c:1427:afr_sh_entry_impunge_mkdir] mirror-0: creating missing directory /3static/16331/30931 on XX.YY.ZZ.181-1
[2010-02-10 11:17:37] D [afr-self-heal-entry.c:1427:afr_sh_entry_impunge_mkdir] mirror-0: creating missing directory /3static/16331/30351 on XX.YY.ZZ.181-1
[2010-02-10 11:17:37] D [afr-self-heal-metadata.c:404:afr_sh_metadata_sync] mirror-0: self-healing metadata of /3static/2501 from XX.YY.ZZ.179-1 to XX.YY.ZZ.181-1
[2010-02-10 11:17:37] D [afr-self-heal-entry.c:2293:afr_sh_entry_sync_prepare] mirror-0: self-healing directory /3static/2501 from subvolume XX.YY.ZZ.179-1 to 1 other
[2010-02-10 11:17:37] D [afr-self-heal-entry.c:1427:afr_sh_entry_impunge_mkdir] mirror-0: creating missing directory /3static/2501/3271 on XX.YY.ZZ.181-1
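
Since the log only shows healing when something looks at an entry through the mount, one workaround I can think of is to walk the whole mount and stat everything, which is roughly what a recursive ls -l does. Below is a minimal Python sketch of that idea. The function name touch_all is my own, /gtest is the mount point from my startup command, and I am assuming that a stat through the client is enough to make replicate heal an entry. The log confirms this for directories and metadata; whether file contents are also healed by a stat alone on 3.0.2 is something I have not verified.

```python
import os
import sys


def touch_all(mount_point):
    """lstat every directory and file under mount_point.

    Returns the number of entries that were stat'ed successfully.
    The stat itself is the point: each one goes through the glusterfs
    client, which is assumed here to trigger self-heal for that entry.
    """
    visited = 0
    for dirpath, dirnames, filenames in os.walk(mount_point):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                os.lstat(path)
                visited += 1
            except OSError as err:
                # Keep going; one unreadable entry should not stop the walk.
                print(f"stat failed for {path}: {err}", file=sys.stderr)
    return visited


if __name__ == "__main__":
    # /gtest is the mount point used in the startup command above.
    target = sys.argv[1] if len(sys.argv) > 1 else "/gtest"
    print(f"visited {touch_all(target)} entries under {target}")
```

This has to be run on the client against the glusterfs mount, not against the /data export directory on one of the servers, because only access through the client goes via mirror-0.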