The only other thing I can add is the following log entries from the SSH destination:

[2013-07-30 08:51:15.41] I [gsyncd(slave):289:main_i] <top>: syncing: file:///data/docstore1
[2013-07-30 08:51:15.1106] I [resource(slave):200:service_loop] FILE: slave listening
[2013-07-30 08:51:20.81000] I [repce(slave):60:service_loop] RepceServer: terminating on reaching EOF.
[2013-07-30 08:55:15.154587] I [resource(slave):206:service_loop] FILE: connection inactive for 120 seconds, stopping
[2013-07-30 08:55:15.154911] I [gsyncd(slave):301:main_i] <top>: exiting.

Which makes sense: the slave shows itself connected and listening, then gets an EOF from the source at the same moment the source server crashes out with the xattr issue.

And I must apologize to everyone - I hadn't realized Google was adding my signature to the bottom every time I hit reply. I'll have to see how to turn that off.
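
For what it's worth, one way to double-check Joe's question below about xattrs on symlinks is to walk the brick and try listing xattrs on anything that turns out to be a symlink. Something like this rough Python 3 sketch should do it, run as root on each brick - BRICK is only a placeholder, not my real brick path:

    #!/usr/bin/env python3
    # Rough check of the "ZFS won't do xattrs on symlinks" theory:
    # walk the brick, find any symlinks, and try to list their xattrs
    # without following the link (llistxattr), the same call family
    # gsyncd's libcxattr uses.
    import os

    BRICK = "/path/to/brick"   # placeholder - substitute the real brick directory

    found = 0
    for root, dirs, files in os.walk(BRICK):
        for name in dirs + files:
            path = os.path.join(root, name)
            if not os.path.islink(path):
                continue
            found += 1
            try:
                print(path, os.listxattr(path, follow_symlinks=False))
            except OSError as e:
                print(path, "FAILED:", e)
    print("symlinks found:", found)

If that reports zero symlinks, the symlink theory is off the table for this volume.
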
On Tue, Jul 30, 2013 at 11:09 AM, Tony Maro <tonym at evrichart.com> wrote:

> I'm using the Ubuntu repositories for Precise ( ppa:zfs-native/stable ), so not sure, but I can guarantee there are no symlinks anywhere within the volume. The data is all created and maintained by one app that I wrote, and symlinks aren't ever used.
>
>
> On Tue, Jul 30, 2013 at 10:03 AM, Joe Julian <joe at julianfamily.org> wrote:
>
>> Are you using the zfs that doesn't allow setting extended attributes on symlinks?
>>
>> Tony Maro <tonym at evrichart.com> wrote:
>>>
>>> Well I guess I'm carrying on a conversation with myself here, but I've turned on Debug and gsyncd appears to be crashing in _query_xattr - which is odd because as mentioned before I was previously able to get this volume to sync the first 1TB of data before this started, but now it won't even do that.
>>>
>>> To recap, I'm trying to set up geo-rep over SSH. The Gluster volume is a mirror setup with two bricks. The underlying filesystem is ZFS on both source and destination. The SSH session appears to be started by the client, as the auth log on the destination server does log the following:
>>>
>>> Jul 30 08:21:37 backup-ds2 sshd[4364]: Accepted publickey for root from 10.200.1.6 port 38865 ssh2
>>> Jul 30 08:21:37 backup-ds2 sshd[4364]: pam_unix(sshd:session): session opened for user root by (uid=0)
>>> Jul 30 08:21:51 backup-ds2 sshd[4364]: Received disconnect from 10.200.1.6: 11: disconnected by user
>>> Jul 30 08:21:51 backup-ds2 sshd[4364]: pam_unix(sshd:session): session closed for user root
>>>
>>> I begin the geo-rep with the following command:
>>>
>>> gluster volume geo-replication docstore1 root@backup-ds2.gluster:/data/docstore1 start
>>>
>>> Checking the status will show "starting..." for about 7 seconds and then it goes "faulty".
>>>
>>> The debug gluster.log file on the brick I run the command from shows:
>>>
>>> [2013-07-30 08:21:37.224934] I [monitor(monitor):21:set_state] Monitor: new state: starting...
>>> [2013-07-30 08:21:37.235110] I [monitor(monitor):80:monitor] Monitor: ------------------------------------------------------------
>>> [2013-07-30 08:21:37.235295] I [monitor(monitor):81:monitor] Monitor: starting gsyncd worker
>>> [2013-07-30 08:21:37.298254] I [gsyncd:354:main_i] <top>: syncing: gluster://localhost:docstore1 -> ssh://root@backup-ds2.gluster:/data/docstore1
>>> [2013-07-30 08:21:37.302464] D [repce:175:push] RepceClient: call 21246:139871057643264:1375186897.3 __repce_version__() ...
>>> [2013-07-30 08:21:39.376665] D [repce:190:__call__] RepceClient: call 21246:139871057643264:1375186897.3 __repce_version__ -> 1.0
>>> [2013-07-30 08:21:39.376894] D [repce:175:push] RepceClient: call 21246:139871057643264:1375186899.38 version() ...
>>> [2013-07-30 08:21:39.378207] D [repce:190:__call__] RepceClient: call 21246:139871057643264:1375186899.38 version -> 1.0
>>> [2013-07-30 08:21:39.393198] D [resource:701:inhibit] DirectMounter: auxiliary glusterfs mount in place
>>> [2013-07-30 08:21:43.408195] D [resource:747:inhibit] DirectMounter: auxiliary glusterfs mount prepared
>>> [2013-07-30 08:21:43.408740] D [monitor(monitor):96:monitor] Monitor: worker seems to be connected (?? racy check)
>>> [2013-07-30 08:21:43.410413] D [repce:175:push] RepceClient: call 21246:139870643156736:1375186903.41 keep_alive(None,) ...
>>> [2013-07-30 08:21:43.411798] D [repce:190:__call__] RepceClient: call 21246:139870643156736:1375186903.41 keep_alive -> 1
>>> [2013-07-30 08:21:44.449774] D [master:220:volinfo_state_machine] <top>: (None, None) << (None, 24f8c92d) -> (None, 24f8c92d)
>>> [2013-07-30 08:21:44.450082] I [master:284:crawl] GMaster: new master is 24f8c92d-723e-4513-9593-40ef4b7e766a
>>> [2013-07-30 08:21:44.450254] I [master:288:crawl] GMaster: primary master with volume id 24f8c92d-723e-4513-9593-40ef4b7e766a ...
>>> [2013-07-30 08:21:44.450398] D [master:302:crawl] GMaster: entering .
>>> [2013-07-30 08:21:44.451534] E [syncdutils:178:log_raise_exception] <top>: glusterfs session went down [ENOTCONN]
>>> [2013-07-30 08:21:44.451721] E [syncdutils:184:log_raise_exception] <top>: FULL EXCEPTION TRACE:
>>> Traceback (most recent call last):
>>>   File "/usr/lib/glusterfs/glusterfs/python/syncdaemon/gsyncd.py", line 115, in main
>>>     main_i()
>>>   File "/usr/lib/glusterfs/glusterfs/python/syncdaemon/gsyncd.py", line 365, in main_i
>>>     local.service_loop(*[r for r in [remote] if r])
>>>   File "/usr/lib/glusterfs/glusterfs/python/syncdaemon/resource.py", line 827, in service_loop
>>>     GMaster(self, args[0]).crawl_loop()
>>>   File "/usr/lib/glusterfs/glusterfs/python/syncdaemon/master.py", line 143, in crawl_loop
>>>     self.crawl()
>>>   File "/usr/lib/glusterfs/glusterfs/python/syncdaemon/master.py", line 304, in crawl
>>>     xtl = self.xtime(path)
>>>   File "/usr/lib/glusterfs/glusterfs/python/syncdaemon/master.py", line 74, in xtime
>>>     xt = rsc.server.xtime(path, self.uuid)
>>>   File "/usr/lib/glusterfs/glusterfs/python/syncdaemon/resource.py", line 270, in ff
>>>     return f(*a)
>>>   File "/usr/lib/glusterfs/glusterfs/python/syncdaemon/resource.py", line 365, in xtime
>>>     return struct.unpack('!II', Xattr.lgetxattr(path, '.'.join([cls.GX_NSPACE, uuid, 'xtime']), 8))
>>>   File "/usr/lib/glusterfs/glusterfs/python/syncdaemon/libcxattr.py", line 43, in lgetxattr
>>>     return cls._query_xattr( path, siz, 'lgetxattr', attr)
>>>   File "/usr/lib/glusterfs/glusterfs/python/syncdaemon/libcxattr.py", line 35, in _query_xattr
>>>     cls.raise_oserr()
>>>   File "/usr/lib/glusterfs/glusterfs/python/syncdaemon/libcxattr.py", line 25, in raise_oserr
>>>     raise OSError(errn, os.strerror(errn))
>>> OSError: [Errno 107] Transport endpoint is not connected
>>> [2013-07-30 08:21:44.453290] I [syncdutils:142:finalize] <top>: exiting.
>>> [2013-07-30 08:21:45.411412] D [monitor(monitor):100:monitor] Monitor: worker died in startup phase
>>> [2013-07-30 08:21:45.411653] I [monitor(monitor):21:set_state] Monitor: new state: faulty
>>> [2013-07-30 08:21:51.165136] I [syncdutils(monitor):142:finalize] <top>: exiting.
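
What strikes me about the trace above is that the error is ENOTCONN ("Transport endpoint is not connected") coming out of lgetxattr on the auxiliary glusterfs mount - that usually means the fuse mount itself has gone away, not that the xattr is missing. If I'm reading resource.py right, GX_NSPACE ends up as "trusted.glusterfs" when running as root, so the attribute being asked for should be trusted.glusterfs.<volume-id>.xtime. A quick way to poke at that by hand, outside of gsyncd, is something like this (untested sketch - the mount point is a placeholder, and I'm not certain a plain client mount exposes the marker xtime exactly the way the aux mount does):

    #!/usr/bin/env python3
    # Try the same xtime xattr read the worker dies on, against a
    # hand-made fuse mount of the master volume, to see whether we get
    # the data, a different error, or the same ENOTCONN (the mount dying).
    import errno
    import os

    MOUNT = "/mnt/docstore1-test"                     # placeholder mount point
    VOLID = "24f8c92d-723e-4513-9593-40ef4b7e766a"    # volume id from the GMaster log line
    ATTR = "trusted.glusterfs.%s.xtime" % VOLID       # assuming GX_NSPACE == "trusted.glusterfs"

    try:
        val = os.getxattr(MOUNT, ATTR)
        print("xtime readable, %d bytes" % len(val))
    except OSError as e:
        if e.errno == errno.ENOTCONN:
            print("same ENOTCONN - the mount itself is dropping, check the client log")
        else:
            print("different failure:", e)

If that reproduces the ENOTCONN, the next place to look is probably the aux mount's glusterfs client log rather than the xattr layer.
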
>>>
>>> On Fri, Jul 26, 2013 at 10:42 AM, Tony Maro <tonym at evrichart.com> wrote:
>>>
>>>> Correction: Manually running the command after creating the temp directory actually doesn't work - it doesn't error out, it just hangs and never connects to the remote server. Dunno if this is something within gsyncd or what...
>>>>
>>>>
>>>> On Fri, Jul 26, 2013 at 10:38 AM, Tony Maro <tonym at evrichart.com> wrote:
>>>>
>>>>> Setting up Geo-replication with an existing 3 TB of data is turning out to be a huge pain.
>>>>>
>>>>> It was working for a bit but would go faulty by the time it hit 1TB synced. Multiple attempts resulted in the same thing.
>>>>>
>>>>> Now, I don't know what's changed, but it never actually tries to log into the remote server anymore. Checking "last" logs on the destination shows that it never attempts to make the SSH connection. The geo-replication command is as follows:
>>>>>
>>>>> gluster volume geo-replication docstore1 root@backup-ds2.gluster:/data/docstore1 start
>>>>>
>>>>> From the log:
>>>>>
>>>>> [2013-07-26 10:26:04.317667] I [gsyncd:354:main_i] <top>: syncing: gluster://localhost:docstore1 -> ssh://root@backup-ds2.gluster:/data/docstore1
>>>>> [2013-07-26 10:26:08.258853] I [syncdutils(monitor):142:finalize] <top>: exiting.
>>>>> [2013-07-26 10:26:08.259452] E [syncdutils:173:log_raise_exception] <top>: connection to peer is broken
>>>>> [2013-07-26 10:26:08.260386] E [resource:191:errlog] Popen: command "ssh -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-WlTfNb/gsycnd-ssh-%r@%h:%p root@backup-ds2.gluster /usr/lib/glusterfs/glusterfs/gsyncd --session-owner 24f8c92d-723e-4513-9593-40ef4b7e766a -N --listen --timeout 120 file:///data/docstore1" returned with 143
>>>>>
>>>>> When I attempt to run the SSH command from the logs directly in the console, ssh replies with:
>>>>>
>>>>> muxserver_listen bind(): No such file or directory
>>>>>
>>>>> And there's no gsyncd temp directory where specified. If I manually create that directory and re-run the same command, it works. The problem, of course, is that the tmp directory is randomly named, and starting Gluster geo-rep again will result in a new directory it tries to use.
>>>>>
>>>>> Running Gluster 3.3.1-ubuntu1~precise9.
>>>>>
>>>>> Any ideas why this would be happening? I did find that my Ubuntu packages were trying to access gsyncd in the wrong path, so I corrected that. I've also got automatic SSH login as root working, so I changed my ssh command (and my global ssh config) to make sure the options would work. Here are the important geo-rep configs:
>>>>>
>>>>> ssh_command: ssh
>>>>> remote_gsyncd: /usr/lib/glusterfs/glusterfs/gsyncd
>>>>> gluster_command_dir: /usr/sbin/
>>>>> gluster_params: xlator-option=*-dht.assert-no-child-down=true
>>>>>
>>>>> Thanks,
>>>>> Tony
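
One more note on the muxserver_listen error quoted above: as far as I can tell, the -S control socket lives inside a temp directory that gsyncd creates for itself when it builds that ssh command, so replaying the command by hand fails simply because the directory is already gone. If anyone wants to replay that leg manually, something along these lines should be roughly equivalent (sketch only - it just recreates a private control-path directory and reruns the exact command from my log; the hang I mentioned in the correction is, I suspect, just gsyncd sitting in --listen mode waiting for the master to talk to it over stdin, so no output is expected):

    #!/usr/bin/env python3
    # Recreate the missing ControlMaster socket directory, then rerun the
    # same ssh command gsyncd logged, so the -S path actually exists.
    # The remote host, gsyncd path, session owner and slave path are the
    # ones from the Popen error above.
    import subprocess
    import tempfile

    ctldir = tempfile.mkdtemp(prefix="gsyncd-aux-ssh-")
    cmd = [
        "ssh", "-oControlMaster=auto",
        "-S", "%s/gsycnd-ssh-%%r@%%h:%%p" % ctldir,   # same socket name as in the log
        "root@backup-ds2.gluster",
        "/usr/lib/glusterfs/glusterfs/gsyncd",
        "--session-owner", "24f8c92d-723e-4513-9593-40ef4b7e766a",
        "-N", "--listen", "--timeout", "120",
        "file:///data/docstore1",
    ]
    print("control dir:", ctldir)
    # Expect it to sit there silently in --listen mode; Ctrl-C to stop.
    subprocess.call(cmd)
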
--
Thanks,

*Tony Maro*
Chief Information Officer
EvriChart • www.evrichart.com
Advanced Records Management
Office | 888.801.2020 • 304.536.1290