Looks like the slave stopped logging things just before the sending brick did:

[2013-09-01 14:56:45.13972] I [gsyncd(slave):354:main_i] <top>: syncing: file:///data/docstore1
[2013-09-01 14:56:45.15433] I [resource(slave):453:service_loop] FILE: slave listening
[2013-09-01 15:57:34.592938] I [repce(slave):78:service_loop] RepceServer: terminating on reaching EOF.
[2013-09-01 15:57:34.593383] I [syncdutils(slave):142:finalize] <top>: exiting.
[2013-09-01 15:57:45.374301] I [gsyncd(slave):354:main_i] <top>: syncing: file:///data/docstore1
[2013-09-01 15:57:45.375871] I [resource(slave):453:service_loop] FILE: slave listening
[2013-09-01 16:02:09.115976] I [repce(slave):78:service_loop] RepceServer: terminating on reaching EOF.
[2013-09-01 16:02:09.116446] I [syncdutils(slave):142:finalize] <top>: exiting.
[2013-09-01 16:02:19.869340] I [gsyncd(slave):354:main_i] <top>: syncing: file:///data/docstore1
[2013-09-01 16:02:19.870598] I [resource(slave):453:service_loop] FILE: slave listening
[2013-09-01 16:32:58.701617] I [repce(slave):78:service_loop] RepceServer: terminating on reaching EOF.
[2013-09-01 16:32:58.702126] I [syncdutils(slave):142:finalize] <top>: exiting.
[2013-09-01 16:33:09.456556] I [gsyncd(slave):354:main_i] <top>: syncing: file:///data/docstore1
[2013-09-01 16:33:09.458009] I [resource(slave):453:service_loop] FILE: slave listening
[2013-09-01 21:47:13.442514] I [repce(slave):78:service_loop] RepceServer: terminating on reaching EOF.
[2013-09-01 21:47:13.442899] I [syncdutils(slave):142:finalize] <top>: exiting.
[2013-09-01 21:47:24.240978] I [gsyncd(slave):354:main_i] <top>: syncing: file:///data/docstore1
[2013-09-01 21:47:24.242424] I [resource(slave):453:service_loop] FILE: slave listening
[2013-09-02 02:12:14.26339] I [repce(slave):78:service_loop] RepceServer: terminating on reaching EOF.
[2013-09-02 02:12:14.26809] I [syncdutils(slave):142:finalize] <top>: exiting.
[2013-09-02 02:12:24.818355] I [gsyncd(slave):354:main_i] <top>: syncing: file:///data/docstore1
[2013-09-02 02:12:24.820008] I [resource(slave):453:service_loop] FILE: slave listening
[2013-09-02 02:16:14.525187] I [repce(slave):78:service_loop] RepceServer: terminating on reaching EOF.
[2013-09-02 02:16:14.525675] I [syncdutils(slave):142:finalize] <top>: exiting.
[2013-09-02 02:16:25.263712] I [gsyncd(slave):354:main_i] <top>: syncing: file:///data/docstore1
[2013-09-02 02:16:25.265168] I [resource(slave):453:service_loop] FILE: slave listening
[2013-09-02 02:37:39.315608] I [repce(slave):78:service_loop] RepceServer: terminating on reaching EOF.
[2013-09-02 02:37:39.316071] I [syncdutils(slave):142:finalize] <top>: exiting.
[2013-09-02 02:37:50.78136] I [gsyncd(slave):354:main_i] <top>: syncing: file:///data/docstore1
[2013-09-02 02:37:50.79577] I [resource(slave):453:service_loop] FILE: slave listening
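For anyone following along, a rough sketch of how to sanity-check the slave side while the master claims to be syncing (assuming the default slave log location Venky mentioned and the ssh_command/secret.pem shown in the geo-rep config quoted further down; hostnames are the ones from this session):

  # On the slave: watch the gsyncd slave log for the next EOF / restart cycle
  tail -f /var/log/glusterfs/geo-replication-slaves/*.log

  # On the slave: confirm a gsyncd worker is actually alive between the
  # "slave listening" and "terminating on reaching EOF" messages
  ps -ef | grep '[g]syncd'

  # From the master: confirm the ssh control channel itself still works,
  # using the same key gsyncd uses (key path taken from the config below)
  ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no \
      -i /var/lib/glusterd/geo-replication/secret.pem root@backup-ds2.gluster 'echo ok'

If the ssh test hangs, or the slave has no gsyncd process while the master still reports crawling, that would point at the transport rather than the crawl itself.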
On Thu, Sep 5, 2013 at 10:09 AM, Venky Shankar <yknev.shankar at gmail.com> wrote:

> Could you also provide the slave logs? (log location on the
> slave: /var/log/glusterfs/geo-replication-slaves)
>
> Thanks,
> -venky
>
>
> On Thu, Sep 5, 2013 at 7:29 PM, Tony Maro <tonym at evrichart.com> wrote:
>
>> I'm trying to create a new Geo-Rep of about 3 TB of data currently stored
>> in a 2-brick mirror config. Obviously the geo-rep destination is a third
>> server.
>>
>> This is my 150th attempt. Okay, maybe not that far, but it's pretty darn bad.
>>
>> Replication works fine until I hit around 1 TB of data sync'd, then it
>> just stalls. For the past two days it hasn't gone past 1050156672 bytes
>> sync'd to the destination server.
>>
>> I did some digging in the logs, and it looks like the brick that's running
>> the geo-rep process thinks it's syncing:
>>
>> [2013-09-05 09:45:37.354831] D [master:386:crawl] GMaster: syncing ./evds3/Sky_Main_66/00/00/08/00000863.enc ...
>> [2013-09-05 09:45:37.358669] D [master:386:crawl] GMaster: syncing ./evds3/Sky_Main_66/00/00/08/0000083b.enc ...
>> [2013-09-05 09:45:37.362251] D [master:386:crawl] GMaster: syncing ./evds3/Sky_Main_66/00/00/08/0000087b.enc ...
>> [2013-09-05 09:45:37.366027] D [master:386:crawl] GMaster: syncing ./evds3/Sky_Main_66/00/00/08/00000834.enc ...
>> [2013-09-05 09:45:37.369752] D [master:386:crawl] GMaster: syncing ./evds3/Sky_Main_66/00/00/08/00000845.enc ...
>> [2013-09-05 09:45:37.373528] D [master:386:crawl] GMaster: syncing ./evds3/Sky_Main_66/00/00/08/00000864.enc ...
>> [2013-09-05 09:45:37.377037] D [master:386:crawl] GMaster: syncing ./evds3/Sky_Main_66/00/00/08/0000087f.enc ...
>> [2013-09-05 09:45:37.391432] D [master:386:crawl] GMaster: syncing ./evds3/Sky_Main_66/00/00/08/00000897.enc ...
>> [2013-09-05 09:45:37.395059] D [master:386:crawl] GMaster: syncing ./evds3/Sky_Main_66/00/00/08/00000829.enc ...
>> [2013-09-05 09:45:37.398725] D [master:386:crawl] GMaster: syncing ./evds3/Sky_Main_66/00/00/08/00000816.enc ...
>> [2013-09-05 09:45:37.402559] D [master:386:crawl] GMaster: syncing ./evds3/Sky_Main_66/00/00/08/000008cc.enc ...
>> [2013-09-05 09:45:37.406450] D [master:386:crawl] GMaster: syncing ./evds3/Sky_Main_66/00/00/08/000008d2.enc ...
>> [2013-09-05 09:45:37.410310] D [master:386:crawl] GMaster: syncing ./evds3/Sky_Main_66/00/00/08/000008df.enc ...
>> [2013-09-05 09:45:37.414344] D [master:386:crawl] GMaster: syncing ./evds3/Sky_Main_66/00/00/08/000008bd.enc ...
>> [2013-09-05 09:45:37.438173] D [master:386:crawl] GMaster: syncing ./evds3/Sky_Main_66/volume.info ...
>> [2013-09-05 09:45:37.441675] D [master:386:crawl] GMaster: syncing ./evds3/Sky_Main_66/volume.enc ...
>>
>> But those files never appear on the destination server; the containing
>> folders are there, just empty.
>>
>> Also, the other log file (...gluster.log) in the geo-replication log
>> folder that matches the destination apparently stopped updating when the
>> syncing stopped. Its last timestamp is from the 2nd, which is the last
>> time data transferred.
>>
>> The last bit from that log file is as such:
>>
>> +------------------------------------------------------------------------------+
>> [2013-09-02 06:37:50.109730] I [rpc-clnt.c:1654:rpc_clnt_reconfig] 0-docstore1-client-1: changing port to 24009 (from 0)
>> [2013-09-02 06:37:50.109857] I [rpc-clnt.c:1654:rpc_clnt_reconfig] 0-docstore1-client-0: changing port to 24009 (from 0)
>> [2013-09-02 06:37:54.097468] I [client-handshake.c:1614:select_server_supported_programs] 0-docstore1-client-1: Using Program GlusterFS 3.3.2, Num (1298437), Version (330)
>> [2013-09-02 06:37:54.097973] I [client-handshake.c:1411:client_setvolume_cbk] 0-docstore1-client-1: Connected to 10.200.1.6:24009, attached to remote volume '/data/docstore1'.
>> [2013-09-02 06:37:54.098005] I [client-handshake.c:1423:client_setvolume_cbk] 0-docstore1-client-1: Server and Client lk-version numbers are not same, reopening the fds
>> [2013-09-02 06:37:54.098094] I [afr-common.c:3685:afr_notify] 0-docstore1-replicate-0: Subvolume 'docstore1-client-1' came back up; going online.
>> [2013-09-02 06:37:54.098274] I [client-handshake.c:453:client_set_lk_version_cbk] 0-docstore1-client-1: Server lk version = 1
>> [2013-09-02 06:37:54.098619] I [client-handshake.c:1614:select_server_supported_programs] 0-docstore1-client-0: Using Program GlusterFS 3.3.2, Num (1298437), Version (330)
>> [2013-09-02 06:37:54.099191] I [client-handshake.c:1411:client_setvolume_cbk] 0-docstore1-client-0: Connected to 10.200.1.5:24009, attached to remote volume '/data/docstore1'.
>> [2013-09-02 06:37:54.099222] I [client-handshake.c:1423:client_setvolume_cbk] 0-docstore1-client-0: Server and Client lk-version numbers are not same, reopening the fds
>> [2013-09-02 06:37:54.105891] I [fuse-bridge.c:4191:fuse_graph_setup] 0-fuse: switched to graph 0
>> [2013-09-02 06:37:54.106039] I [client-handshake.c:453:client_set_lk_version_cbk] 0-docstore1-client-0: Server lk version = 1
>> [2013-09-02 06:37:54.106179] I [fuse-bridge.c:3376:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.17
>> [2013-09-02 06:37:54.108766] I [afr-common.c:2022:afr_set_root_inode_on_first_lookup] 0-docstore1-replicate-0: added root inode
>>
>> This is driving me nuts - I've been working on getting Geo-Replication
>> working for over 2 months now without any success.
>>
>> Status on the geo-rep shows OK:
>>
>> root@gfs6:~# gluster volume geo-replication docstore1 ssh://root@backup-ds2.gluster:/data/docstore1 status
>> MASTER        SLAVE                                            STATUS
>> --------------------------------------------------------------------------------
>> docstore1     ssh://root@backup-ds2.gluster:/data/docstore1    OK
>> root@gfs6:~#
>>
>> Here's the config:
>>
>> root@gfs6:~# gluster volume geo-replication docstore1 ssh://root@backup-ds2.gluster:/data/docstore1 config
>> log_level: DEBUG
>> gluster_log_file: /var/log/glusterfs/geo-replication/docstore1/ssh%3A%2F%2Froot%4010.200.1.12%3Afile%3A%2F%2F%2Fdata%2Fdocstore1.gluster.log
>> ssh_command: ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem
>> session_owner: 24f8c92d-723e-4513-9593-40ef4b7e766a
>> remote_gsyncd: /usr/lib/glusterfs/glusterfs/gsyncd
>> state_file: /var/lib/glusterd/geo-replication/docstore1/ssh%3A%2F%2Froot%4010.200.1.12%3Afile%3A%2F%2F%2Fdata%2Fdocstore1.status
>> gluster_command_dir: /usr/sbin/
>> pid_file: /var/lib/glusterd/geo-replication/docstore1/ssh%3A%2F%2Froot%4010.200.1.12%3Afile%3A%2F%2F%2Fdata%2Fdocstore1.pid
>> log_file: /var/log/glusterfs/geo-replication/docstore1/ssh%3A%2F%2Froot%4010.200.1.12%3Afile%3A%2F%2F%2Fdata%2Fdocstore1.log
>> gluster_params: xlator-option=*-dht.assert-no-child-down=true
>> root@gfs6:~#
>>
>> I'm running Ubuntu packages 3.3.2-ubuntu1-precise2 from the PPA. Any
>> ideas why it's stalling?
>>
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>>
>
>
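A note on the crawl log quoted above: since GMaster keeps reporting files as "syncing" but they never land under the slave brick, a couple of quick checks are possible (sketch only, reusing one of the paths from the crawl output and the brick path from this session):

  # On the master (gfs6): is gsyncd actually driving an rsync right now?
  ps -ef | grep '[r]sync'

  # On the master: is the auxiliary glusterfs mount gsyncd crawls through
  # still mounted? (the ...gluster.log excerpt above looks like that mount's log)
  mount | grep glusterfs

  # On the slave (backup-ds2): did any of the crawled files ever arrive?
  ls -l /data/docstore1/evds3/Sky_Main_66/00/00/08/00000863.enc

Empty directories on the slave combined with no rsync activity on the master would fit a crawl that is walking the tree but never handing files off.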
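If nothing obvious turns up, one fallback (again just a sketch, same master/slave pair as above, and with log_level already at DEBUG per the config) is to stop and restart the session and watch the master-side log from a clean start:

  gluster volume geo-replication docstore1 ssh://root@backup-ds2.gluster:/data/docstore1 stop
  gluster volume geo-replication docstore1 ssh://root@backup-ds2.gluster:/data/docstore1 start
  gluster volume geo-replication docstore1 ssh://root@backup-ds2.gluster:/data/docstore1 status

  # log_file path copied from the config output above
  tail -f '/var/log/glusterfs/geo-replication/docstore1/ssh%3A%2F%2Froot%4010.200.1.12%3Afile%3A%2F%2F%2Fdata%2Fdocstore1.log'

Whether the first crawl after a restart gets past the same ~1 TB point would at least show if the stall is reproducible at a specific spot in the tree.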
--
Thanks,

Tony Maro
Chief Information Officer
EvriChart | www.evrichart.com
Advanced Records Management
Office | 888.801.2020 | 304.536.1290