I'm using the Ubuntu repositories for Precise ( ppa:zfs-native/stable ), so not sure, but I can guarantee there are no symlinks anywhere within the volume. The data is all created and maintained by one app that I wrote, and symlinks aren't ever used. On Tue, Jul 30, 2013 at 10:03 AM, Joe Julian <joe at julianfamily.org> wrote: > Are you using the zfs that doesn't allow setting extended attributes on > symlinks? > > Tony Maro <tonym at evrichart.com> wrote: >> >> Well I guess I'm carrying on a conversation with myself here, but I've >> turned on Debug and gsyncd appears to be crashing in _query_xattr - which >> is odd because as mentioned before I was previously able to get this volume >> to sync the first 1TB of data before this started, but now it won't even do >> that. >> >> To recap, I'm trying to set up geo-rep over SSH. The Gluster volume is a >> mirror setup with two bricks. The underlying filesystem is ZFS on both >> source and destination. The SSH session appears to be started by the >> client, as the auth log on the destination server does log the following: >> >> Jul 30 08:21:37 backup-ds2 sshd[4364]: Accepted publickey for root from >> 10.200.1.6 port 38865 ssh2 >> Jul 30 08:21:37 backup-ds2 sshd[4364]: pam_unix(sshd:session): session >> opened for user root by (uid=0) >> Jul 30 08:21:51 backup-ds2 sshd[4364]: Received disconnect from >> 10.200.1.6: 11: disconnected by user >> Jul 30 08:21:51 backup-ds2 sshd[4364]: pam_unix(sshd:session): session >> closed for user root >> >> I begin the geo-rep with the following command: >> >> gluster volume geo-replication docstore1 root at backup-ds2.gluster:/data/docstore1 >> start >> >> Checking the status will show "starting..." for about 7 seconds and then >> it goes "faulty". >> >> The debug gluster.log file on the brick I run the command from shows: >> >> [2013-07-30 08:21:37.224934] I [monitor(monitor):21:set_state] Monitor: >> new state: starting... 
>> [2013-07-30 08:21:37.235110] I [monitor(monitor):80:monitor] Monitor: >> ------------------------------------------------------------ >> [2013-07-30 08:21:37.235295] I [monitor(monitor):81:monitor] Monitor: >> starting gsyncd worker >> [2013-07-30 08:21:37.298254] I [gsyncd:354:main_i] <top>: syncing: >> gluster://localhost:docstore1 -> ssh://root at backup-ds2.gluster >> :/data/docstore1 >> [2013-07-30 08:21:37.302464] D [repce:175:push] RepceClient: call >> 21246:139871057643264:1375186897.3 __repce_version__() ... >> [2013-07-30 08:21:39.376665] D [repce:190:__call__] RepceClient: call >> 21246:139871057643264:1375186897.3 __repce_version__ -> 1.0 >> [2013-07-30 08:21:39.376894] D [repce:175:push] RepceClient: call >> 21246:139871057643264:1375186899.38 version() ... >> [2013-07-30 08:21:39.378207] D [repce:190:__call__] RepceClient: call >> 21246:139871057643264:1375186899.38 version -> 1.0 >> [2013-07-30 08:21:39.393198] D [resource:701:inhibit] DirectMounter: >> auxiliary glusterfs mount in place >> [2013-07-30 08:21:43.408195] D [resource:747:inhibit] DirectMounter: >> auxiliary glusterfs mount prepared >> [2013-07-30 08:21:43.408740] D [monitor(monitor):96:monitor] Monitor: >> worker seems to be connected (?? racy check) >> [2013-07-30 08:21:43.410413] D [repce:175:push] RepceClient: call >> 21246:139870643156736:1375186903.41 keep_alive(None,) ... >> [2013-07-30 08:21:43.411798] D [repce:190:__call__] RepceClient: call >> 21246:139870643156736:1375186903.41 keep_alive -> 1 >> [2013-07-30 08:21:44.449774] D [master:220:volinfo_state_machine] <top>: >> (None, None) << (None, 24f8c92d) -> (None, 24f8c92d) >> [2013-07-30 08:21:44.450082] I [master:284:crawl] GMaster: new master is >> 24f8c92d-723e-4513-9593-40ef4b7e766a >> [2013-07-30 08:21:44.450254] I [master:288:crawl] GMaster: primary master >> with volume id 24f8c92d-723e-4513-9593-40ef4b7e766a ... >> [2013-07-30 08:21:44.450398] D [master:302:crawl] GMaster: entering . 
>> [2013-07-30 08:21:44.451534] E [syncdutils:178:log_raise_exception] >> <top>: glusterfs session went down [ENOTCONN] >> [2013-07-30 08:21:44.451721] E [syncdutils:184:log_raise_exception] >> <top>: FULL EXCEPTION TRACE: >> Traceback (most recent call last): >> File "/usr/lib/glusterfs/glusterfs/python/syncdaemon/gsyncd.py", line >> 115, in main >> main_i() >> File "/usr/lib/glusterfs/glusterfs/python/syncdaemon/gsyncd.py", line >> 365, in main_i >> local.service_loop(*[r for r in [remote] if r]) >> File "/usr/lib/glusterfs/glusterfs/python/syncdaemon/resource.py", line >> 827, in service_loop >> GMaster(self, args[0]).crawl_loop() >> File "/usr/lib/glusterfs/glusterfs/python/syncdaemon/master.py", line >> 143, in crawl_loop >> self.crawl() >> File "/usr/lib/glusterfs/glusterfs/python/syncdaemon/master.py", line >> 304, in crawl >> xtl = self.xtime(path) >> File "/usr/lib/glusterfs/glusterfs/python/syncdaemon/master.py", line >> 74, in xtime >> xt = rsc.server.xtime(path, self.uuid) >> File "/usr/lib/glusterfs/glusterfs/python/syncdaemon/resource.py", line >> 270, in ff >> return f(*a) >> File "/usr/lib/glusterfs/glusterfs/python/syncdaemon/resource.py", line >> 365, in xtime >> return struct.unpack('!II', Xattr.lgetxattr(path, >> '.'.join([cls.GX_NSPACE, uuid, 'xtime']), 8)) >> File "/usr/lib/glusterfs/glusterfs/python/syncdaemon/libcxattr.py", >> line 43, in lgetxattr >> return cls._query_xattr( path, siz, 'lgetxattr', attr) >> File "/usr/lib/glusterfs/glusterfs/python/syncdaemon/libcxattr.py", >> line 35, in _query_xattr >> cls.raise_oserr() >> File "/usr/lib/glusterfs/glusterfs/python/syncdaemon/libcxattr.py", >> line 25, in raise_oserr >> raise OSError(errn, os.strerror(errn)) >> OSError: [Errno 107] Transport endpoint is not connected >> [2013-07-30 08:21:44.453290] I [syncdutils:142:finalize] <top>: exiting. 
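[Editor's note] The failing call in the trace above is gsyncd reading the per-volume xtime extended attribute via lgetxattr on the auxiliary glusterfs mount. A minimal standalone sketch of that query, for probing the mount by hand — the helper name is mine, not gsyncd's, and the "trusted.glusterfs" namespace is an assumption (gsyncd's GX_NSPACE; it may be "system.glusterfs" when running unprivileged):

```python
import ctypes
import errno
import os
import struct

libc = ctypes.CDLL("libc.so.6", use_errno=True)

def read_xtime(path, volume_uuid, namespace="trusted.glusterfs"):
    """Read the 8-byte <namespace>.<uuid>.xtime xattr, unpacked as two
    network-order uint32s ('!II'), mirroring the query in resource.py."""
    attr = ".".join([namespace, volume_uuid, "xtime"]).encode()
    buf = ctypes.create_string_buffer(8)
    ret = libc.lgetxattr(path.encode(), attr, buf, 8)
    if ret == -1:
        errn = ctypes.get_errno()
        # ENOTCONN (errno 107) here means the auxiliary glusterfs mount
        # itself has gone away, not that the xattr is merely missing.
        raise OSError(errn, os.strerror(errn))
    return struct.unpack("!II", buf.raw)
```

Running this against a path on the auxiliary mount while a worker is up should either return the xtime pair or reproduce the ENOTCONN, which would point at the brick/client connection dropping rather than at the xattr query itself.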
>> [2013-07-30 08:21:45.411412] D [monitor(monitor):100:monitor] Monitor: >> worker died in startup phase >> [2013-07-30 08:21:45.411653] I [monitor(monitor):21:set_state] Monitor: >> new state: faulty >> [2013-07-30 08:21:51.165136] I [syncdutils(monitor):142:finalize] <top>: >> exiting. >> >> >> >> On Fri, Jul 26, 2013 at 10:42 AM, Tony Maro <tonym at evrichart.com> wrote: >> >>> Correction: Manually running the command after creating the temp >>> directory actually doesn't work, but it doesn't error out it just hangs and >>> never connects to the remote server. Dunno if this is something within >>> gsyncd or what... >>> >>> >>> On Fri, Jul 26, 2013 at 10:38 AM, Tony Maro <tonym at evrichart.com> wrote: >>> >>>> Setting up Geo-replication with an existing 3 TB of data is turning out >>>> to be a huge pain. >>>> >>>> It was working for a bit but would go faulty by the time it hit 1TB >>>> synced. Multiple attempts resulted in the same thing. >>>> >>>> Now, I don't know what's changed, but it never actually tries to log >>>> into the remote server anymore. Checking "last" logs on the destination >>>> shows that it never actually attempts to make the SSH connection. The >>>> geo-replication command is as such: >>>> >>>> gluster volume geo-replication docstore1 root at backup-ds2.gluster:/data/docstore1 >>>> start >>>> >>>> From the log: >>>> >>>> [2013-07-26 10:26:04.317667] I [gsyncd:354:main_i] <top>: syncing: >>>> gluster://localhost:docstore1 -> ssh://root at backup-ds2.gluster >>>> :/data/docstore1 >>>> [2013-07-26 10:26:08.258853] I [syncdutils(monitor):142:finalize] >>>> <top>: exiting. 
>>>> [2013-07-26 10:26:08.259452] E [syncdutils:173:log_raise_exception] >>>> <top>: connection to peer is broken >>>> *[2013-07-26 10:26:08.260386] E [resource:191:errlog] Popen: command >>>> "ssh -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-WlTfNb/gsycnd-ssh-%r@%h:%p >>>> root at backup-ds2.gluster /usr/lib/glusterfs/glusterfs/gsyncd >>>> --session-owner 24f8c92d-723e-4513-9593-40ef4b7e766a -N --listen --timeout >>>> 120 file:///data/docstore1" returned with 143* >>>> >>>> When I attempt to run the SSH command from the logs directly in the >>>> console, ssh replies with: >>>> >>>> muxserver_listen bind(): No such file or directory >>>> >>>> And there's no gsyncd temp directory where specified. If I manually >>>> create that directory and re-run the same command, it works. The problem, of >>>> course, is that the tmp directory is randomly named, and starting Gluster >>>> geo-rep again will result in a new directory it tries to use. >>>> >>>> Running Gluster 3.3.1-ubuntu1~precise9 >>>> >>>> Any ideas why this would be happening? I did find that my Ubuntu >>>> packages were trying to access gsyncd in the wrong path, so I corrected >>>> things. I've also set up passwordless (key-based) SSH login as root, so I changed my ssh >>>> command (and my global ssh config) to make sure the options would work. >>>> Here are the important geo-rep configs: >>>> >>>> ssh_command: ssh >>>> remote_gsyncd: /usr/lib/glusterfs/glusterfs/gsyncd >>>> gluster_command_dir: /usr/sbin/ >>>> gluster_params: xlator-option=*-dht.assert-no-child-down=true >>>> >>>> Thanks, >>>> Tony >>> >>> >>> >>> -- >>> Thanks, >>> >>> *Tony Maro* >>> Chief Information Officer >>> EvriChart • www.evrichart.com >>> Advanced Records Management >>> Office | 888.801.2020 • 304.536.1290 >>> >>> >> >> >> -- >> Thanks, >> >> *Tony Maro* >> Chief Information Officer >> EvriChart • www.evrichart.com >> Advanced Records Management >> Office | 888.801.2020 • 
304.536.1290 >> >> ------------------------------ >> >> Gluster-users mailing list >> Gluster-users at gluster.org >> http://supercolony.gluster.org/mailman/listinfo/gluster-users >> >> -- Thanks, *Tony Maro* Chief Information Officer EvriChart • www.evrichart.com Advanced Records Management Office | 888.801.2020 • 304.536.1290
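[Editor's note] On the "muxserver_listen bind(): No such file or directory" error: ssh's ControlMaster creates its control socket with a plain bind() at the ControlPath, and bind() fails with ENOENT when the parent directory is missing — which matches the observation that pre-creating the gsyncd temp directory makes the same command work. A small sketch demonstrating just that failure mode, independent of gluster and sshd (paths are illustrative):

```python
import errno
import os
import socket
import tempfile

# muxserver_listen is essentially bind() on a Unix socket at the
# ControlPath; if the parent directory is missing, bind fails with
# ENOENT, which ssh reports as "No such file or directory".
base = tempfile.mkdtemp(prefix="gsyncd-aux-ssh-")
missing = os.path.join(base, "no-such-subdir", "mux-sock")

s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
try:
    s.bind(missing)
    raise AssertionError("bind unexpectedly succeeded")
except OSError as e:
    assert e.errno == errno.ENOENT  # the same failure ssh hits
finally:
    s.close()

# Creating the directory first (what manually pre-creating the gsyncd
# temp dir does) lets the same bind succeed:
os.makedirs(os.path.dirname(missing))
s2 = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
s2.bind(missing)
print("bind ok once the directory exists")
s2.close()
```

This suggests the real question is why gsyncd's monitor stopped creating its /tmp/gsyncd-aux-ssh-* directory before spawning ssh, since ssh itself will never create the ControlPath directory on its own.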