I'm trying to create a new geo-replication session for about 3 TB of data currently stored in a 2-brick mirror config; the geo-rep destination is, obviously, a third server. This is my 150th attempt. Okay, maybe not that many, but it's pretty darn bad. Replication works fine until I hit around 1 TB of data sync'd, then it just stalls. For the past two days it hasn't gone past 1050156672 bytes sync'd to the destination server.

I did some digging in the logs, and it looks like the brick that's running the geo-rep process thinks it's still syncing:

[2013-09-05 09:45:37.354831] D [master:386:crawl] GMaster: syncing ./evds3/Sky_Main_66/00/00/08/00000863.enc ...
[2013-09-05 09:45:37.358669] D [master:386:crawl] GMaster: syncing ./evds3/Sky_Main_66/00/00/08/0000083b.enc ...
[2013-09-05 09:45:37.362251] D [master:386:crawl] GMaster: syncing ./evds3/Sky_Main_66/00/00/08/0000087b.enc ...
[2013-09-05 09:45:37.366027] D [master:386:crawl] GMaster: syncing ./evds3/Sky_Main_66/00/00/08/00000834.enc ...
[2013-09-05 09:45:37.369752] D [master:386:crawl] GMaster: syncing ./evds3/Sky_Main_66/00/00/08/00000845.enc ...
[2013-09-05 09:45:37.373528] D [master:386:crawl] GMaster: syncing ./evds3/Sky_Main_66/00/00/08/00000864.enc ...
[2013-09-05 09:45:37.377037] D [master:386:crawl] GMaster: syncing ./evds3/Sky_Main_66/00/00/08/0000087f.enc ...
[2013-09-05 09:45:37.391432] D [master:386:crawl] GMaster: syncing ./evds3/Sky_Main_66/00/00/08/00000897.enc ...
[2013-09-05 09:45:37.395059] D [master:386:crawl] GMaster: syncing ./evds3/Sky_Main_66/00/00/08/00000829.enc ...
[2013-09-05 09:45:37.398725] D [master:386:crawl] GMaster: syncing ./evds3/Sky_Main_66/00/00/08/00000816.enc ...
[2013-09-05 09:45:37.402559] D [master:386:crawl] GMaster: syncing ./evds3/Sky_Main_66/00/00/08/000008cc.enc ...
[2013-09-05 09:45:37.406450] D [master:386:crawl] GMaster: syncing ./evds3/Sky_Main_66/00/00/08/000008d2.enc ...
[2013-09-05 09:45:37.410310] D [master:386:crawl] GMaster: syncing ./evds3/Sky_Main_66/00/00/08/000008df.enc ...
[2013-09-05 09:45:37.414344] D [master:386:crawl] GMaster: syncing ./evds3/Sky_Main_66/00/00/08/000008bd.enc ...
[2013-09-05 09:45:37.438173] D [master:386:crawl] GMaster: syncing ./evds3/Sky_Main_66/volume.info ...
[2013-09-05 09:45:37.441675] D [master:386:crawl] GMaster: syncing ./evds3/Sky_Main_66/volume.enc ...

But *those files never appear on the destination server*; the containing folders are there, just empty.
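If it would help to know whether the worker is actually doing anything while the log says "syncing", this is roughly what I can run on the master brick that hosts the geo-rep process (just a sketch; process names may differ slightly with the Ubuntu packaging):

  # is the gsyncd worker still running, and does it have any rsync children?
  ps axf | grep -E '[g]syncd|[r]sync'
  # if an rsync child exists, attach to it and see whether it's actually doing
  # I/O or just sitting in one syscall (pid taken from the ps output above)
  strace -p <rsync-pid>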
Also, the other log file (...gluster.log) in the geo-replication log folder that matches the destination apparently stopped updating when the syncing stopped. Its last timestamp is from the 2nd, which is the last time data transferred. The last bit of that log file is:

+------------------------------------------------------------------------------+
[2013-09-02 06:37:50.109730] I [rpc-clnt.c:1654:rpc_clnt_reconfig] 0-docstore1-client-1: changing port to 24009 (from 0)
[2013-09-02 06:37:50.109857] I [rpc-clnt.c:1654:rpc_clnt_reconfig] 0-docstore1-client-0: changing port to 24009 (from 0)
[2013-09-02 06:37:54.097468] I [client-handshake.c:1614:select_server_supported_programs] 0-docstore1-client-1: Using Program GlusterFS 3.3.2, Num (1298437), Version (330)
[2013-09-02 06:37:54.097973] I [client-handshake.c:1411:client_setvolume_cbk] 0-docstore1-client-1: Connected to 10.200.1.6:24009, attached to remote volume '/data/docstore1'.
[2013-09-02 06:37:54.098005] I [client-handshake.c:1423:client_setvolume_cbk] 0-docstore1-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2013-09-02 06:37:54.098094] I [afr-common.c:3685:afr_notify] 0-docstore1-replicate-0: Subvolume 'docstore1-client-1' came back up; going online.
[2013-09-02 06:37:54.098274] I [client-handshake.c:453:client_set_lk_version_cbk] 0-docstore1-client-1: Server lk version = 1
[2013-09-02 06:37:54.098619] I [client-handshake.c:1614:select_server_supported_programs] 0-docstore1-client-0: Using Program GlusterFS 3.3.2, Num (1298437), Version (330)
[2013-09-02 06:37:54.099191] I [client-handshake.c:1411:client_setvolume_cbk] 0-docstore1-client-0: Connected to 10.200.1.5:24009, attached to remote volume '/data/docstore1'.
[2013-09-02 06:37:54.099222] I [client-handshake.c:1423:client_setvolume_cbk] 0-docstore1-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2013-09-02 06:37:54.105891] I [fuse-bridge.c:4191:fuse_graph_setup] 0-fuse: switched to graph 0
[2013-09-02 06:37:54.106039] I [client-handshake.c:453:client_set_lk_version_cbk] 0-docstore1-client-0: Server lk version = 1
[2013-09-02 06:37:54.106179] I [fuse-bridge.c:3376:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.17
[2013-09-02 06:37:54.108766] I [afr-common.c:2022:afr_set_root_inode_on_first_lookup] 0-docstore1-replicate-0: added root inode

This is driving me nuts; I've been working on getting geo-replication going for over two months now without any success. Status on the geo-rep shows OK:

root@gfs6:~# gluster volume geo-replication docstore1 ssh://root@backup-ds2.gluster:/data/docstore1 status
MASTER       SLAVE                                              STATUS
--------------------------------------------------------------------------------
docstore1    ssh://root@backup-ds2.gluster:/data/docstore1      OK
root@gfs6:~#

Here's the config:

root@gfs6:~# gluster volume geo-replication docstore1 ssh://root@backup-ds2.gluster:/data/docstore1 config
log_level: DEBUG
gluster_log_file: /var/log/glusterfs/geo-replication/docstore1/ssh%3A%2F%2Froot%4010.200.1.12%3Afile%3A%2F%2F%2Fdata%2Fdocstore1.gluster.log
ssh_command: ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem
session_owner: 24f8c92d-723e-4513-9593-40ef4b7e766a
remote_gsyncd: /usr/lib/glusterfs/glusterfs/gsyncd
state_file: /var/lib/glusterd/geo-replication/docstore1/ssh%3A%2F%2Froot%4010.200.1.12%3Afile%3A%2F%2F%2Fdata%2Fdocstore1.status
gluster_command_dir: /usr/sbin/
pid_file: /var/lib/glusterd/geo-replication/docstore1/ssh%3A%2F%2Froot%4010.200.1.12%3Afile%3A%2F%2F%2Fdata%2Fdocstore1.pid
log_file: /var/log/glusterfs/geo-replication/docstore1/ssh%3A%2F%2Froot%4010.200.1.12%3Afile%3A%2F%2F%2Fdata%2Fdocstore1.log
gluster_params: xlator-option=*-dht.assert-no-child-down=true
root@gfs6:~#

I'm running the 3.3.2-ubuntu1-precise2 packages from the Ubuntu PPA. Any ideas why it's stalling?
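If I'm reading it right, that ...gluster.log is the client log for the auxiliary glusterfs mount gsyncd crawls on the master (it shows a FUSE mount connecting to both of my bricks), so one more data point I can gather is whether that mount has hung. A rough sketch of the check (the mount point below is a placeholder; I'd use whatever /proc/mounts actually reports for the aux mount):

  # find the auxiliary glusterfs mount gsyncd is using
  grep glusterfs /proc/mounts
  # try to list through it with a timeout so a hung FUSE mount doesn't wedge the shell
  timeout 30 ls -l /path/from/proc/mounts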
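And in case it turns out to be a connectivity problem: I can exercise the same ssh path the geo-rep config uses by hand, something along these lines (a sketch using the ssh_command, remote_gsyncd and slave path from the config above; it only proves the key works and the remote paths exist, nothing more):

  ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no \
      -i /var/lib/glusterd/geo-replication/secret.pem \
      root@backup-ds2.gluster 'hostname; ls -l /usr/lib/glusterfs/glusterfs/gsyncd; df -h /data/docstore1'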