Are you using a ZFS implementation that doesn't allow setting extended attributes on symlinks?
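If you're not sure, you can test it directly on either brick's filesystem. A rough sketch -- the brick path and xattr name below are made-up examples:

  # as root, somewhere on the brick's ZFS filesystem (path is hypothetical)
  cd /data/brick-docstore1
  touch testfile
  ln -s testfile testlink
  setfattr -h -n trusted.testattr -v 1 testlink   # -h acts on the link itself
  getfattr -h -n trusted.testattr testlink

If the setfattr fails with "Operation not supported", the backend can't store the trusted.glusterfs.<volume-id>.xtime markers gsyncd keeps on the entries it crawls, symlinks included.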
Tony Maro <tonym at evrichart.com> wrote:

>Well, I guess I'm carrying on a conversation with myself here, but I've
>turned on debug logging and gsyncd appears to be crashing in _query_xattr,
>which is odd because, as mentioned before, I was previously able to get
>this volume to sync the first 1 TB of data before this started. Now it
>won't even do that.
>
>To recap: I'm trying to set up geo-replication over SSH. The Gluster volume
>is a two-brick mirror, and the underlying filesystem is ZFS on both source
>and destination. The SSH session does appear to be started by the client,
>as the auth log on the destination server shows:
>
>Jul 30 08:21:37 backup-ds2 sshd[4364]: Accepted publickey for root from 10.200.1.6 port 38865 ssh2
>Jul 30 08:21:37 backup-ds2 sshd[4364]: pam_unix(sshd:session): session opened for user root by (uid=0)
>Jul 30 08:21:51 backup-ds2 sshd[4364]: Received disconnect from 10.200.1.6: 11: disconnected by user
>Jul 30 08:21:51 backup-ds2 sshd[4364]: pam_unix(sshd:session): session closed for user root
>
>I start the geo-replication with the following command:
>
>gluster volume geo-replication docstore1 root@backup-ds2.gluster:/data/docstore1 start
>
>Checking the status shows "starting..." for about 7 seconds and then it
>goes "faulty".
>
>The debug gluster.log file on the brick I run the command from shows:
>
>[2013-07-30 08:21:37.224934] I [monitor(monitor):21:set_state] Monitor: new state: starting...
>[2013-07-30 08:21:37.235110] I [monitor(monitor):80:monitor] Monitor: ------------------------------------------------------------
>[2013-07-30 08:21:37.235295] I [monitor(monitor):81:monitor] Monitor: starting gsyncd worker
>[2013-07-30 08:21:37.298254] I [gsyncd:354:main_i] <top>: syncing: gluster://localhost:docstore1 -> ssh://root@backup-ds2.gluster:/data/docstore1
>[2013-07-30 08:21:37.302464] D [repce:175:push] RepceClient: call 21246:139871057643264:1375186897.3 __repce_version__() ...
>[2013-07-30 08:21:39.376665] D [repce:190:__call__] RepceClient: call 21246:139871057643264:1375186897.3 __repce_version__ -> 1.0
>[2013-07-30 08:21:39.376894] D [repce:175:push] RepceClient: call 21246:139871057643264:1375186899.38 version() ...
>[2013-07-30 08:21:39.378207] D [repce:190:__call__] RepceClient: call 21246:139871057643264:1375186899.38 version -> 1.0
>[2013-07-30 08:21:39.393198] D [resource:701:inhibit] DirectMounter: auxiliary glusterfs mount in place
>[2013-07-30 08:21:43.408195] D [resource:747:inhibit] DirectMounter: auxiliary glusterfs mount prepared
>[2013-07-30 08:21:43.408740] D [monitor(monitor):96:monitor] Monitor: worker seems to be connected (?? racy check)
>[2013-07-30 08:21:43.410413] D [repce:175:push] RepceClient: call 21246:139870643156736:1375186903.41 keep_alive(None,) ...
>[2013-07-30 08:21:43.411798] D [repce:190:__call__] RepceClient: call 21246:139870643156736:1375186903.41 keep_alive -> 1
>[2013-07-30 08:21:44.449774] D [master:220:volinfo_state_machine] <top>: (None, None) << (None, 24f8c92d) -> (None, 24f8c92d)
>[2013-07-30 08:21:44.450082] I [master:284:crawl] GMaster: new master is 24f8c92d-723e-4513-9593-40ef4b7e766a
>[2013-07-30 08:21:44.450254] I [master:288:crawl] GMaster: primary master with volume id 24f8c92d-723e-4513-9593-40ef4b7e766a ...
>[2013-07-30 08:21:44.450398] D [master:302:crawl] GMaster: entering .
>[2013-07-30 08:21:44.451534] E [syncdutils:178:log_raise_exception] <top>: glusterfs session went down [ENOTCONN]
>[2013-07-30 08:21:44.451721] E [syncdutils:184:log_raise_exception] <top>: FULL EXCEPTION TRACE:
>Traceback (most recent call last):
>  File "/usr/lib/glusterfs/glusterfs/python/syncdaemon/gsyncd.py", line 115, in main
>    main_i()
>  File "/usr/lib/glusterfs/glusterfs/python/syncdaemon/gsyncd.py", line 365, in main_i
>    local.service_loop(*[r for r in [remote] if r])
>  File "/usr/lib/glusterfs/glusterfs/python/syncdaemon/resource.py", line 827, in service_loop
>    GMaster(self, args[0]).crawl_loop()
>  File "/usr/lib/glusterfs/glusterfs/python/syncdaemon/master.py", line 143, in crawl_loop
>    self.crawl()
>  File "/usr/lib/glusterfs/glusterfs/python/syncdaemon/master.py", line 304, in crawl
>    xtl = self.xtime(path)
>  File "/usr/lib/glusterfs/glusterfs/python/syncdaemon/master.py", line 74, in xtime
>    xt = rsc.server.xtime(path, self.uuid)
>  File "/usr/lib/glusterfs/glusterfs/python/syncdaemon/resource.py", line 270, in ff
>    return f(*a)
>  File "/usr/lib/glusterfs/glusterfs/python/syncdaemon/resource.py", line 365, in xtime
>    return struct.unpack('!II', Xattr.lgetxattr(path, '.'.join([cls.GX_NSPACE, uuid, 'xtime']), 8))
>  File "/usr/lib/glusterfs/glusterfs/python/syncdaemon/libcxattr.py", line 43, in lgetxattr
>    return cls._query_xattr( path, siz, 'lgetxattr', attr)
>  File "/usr/lib/glusterfs/glusterfs/python/syncdaemon/libcxattr.py", line 35, in _query_xattr
>    cls.raise_oserr()
>  File "/usr/lib/glusterfs/glusterfs/python/syncdaemon/libcxattr.py", line 25, in raise_oserr
>    raise OSError(errn, os.strerror(errn))
>OSError: [Errno 107] Transport endpoint is not connected
>[2013-07-30 08:21:44.453290] I [syncdutils:142:finalize] <top>: exiting.
>[2013-07-30 08:21:45.411412] D [monitor(monitor):100:monitor] Monitor: worker died in startup phase
>[2013-07-30 08:21:45.411653] I [monitor(monitor):21:set_state] Monitor: new state: faulty
>[2013-07-30 08:21:51.165136] I [syncdutils(monitor):142:finalize] <top>: exiting.
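The traceback above shows where it dies: the crawl is reading the trusted.glusterfs.<volume-id>.xtime xattr through gsyncd's auxiliary glusterfs mount, and lgetxattr comes back with ENOTCONN, i.e. that client mount has lost its connection to the bricks. The glusterfs client log for the aux mount should say why. You can also replay the query gsyncd makes by hand; a sketch, with a hypothetical mountpoint and the volume id taken from your log:

  # mount the master volume and read the same xtime marker gsyncd reads
  mkdir -p /mnt/docstore1
  mount -t glusterfs localhost:/docstore1 /mnt/docstore1
  getfattr -h -e hex -n trusted.glusterfs.24f8c92d-723e-4513-9593-40ef4b7e766a.xtime /mnt/docstore1

If that also fails with "Transport endpoint is not connected", the problem is between the client and the bricks (or in the ZFS backend refusing the xattr), not in geo-replication itself.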
>On Fri, Jul 26, 2013 at 10:42 AM, Tony Maro <tonym at evrichart.com> wrote:
>
>> Correction: Manually running the command after creating the temp
>> directory actually doesn't work. It doesn't error out; it just hangs and
>> never connects to the remote server. I don't know if this is something
>> within gsyncd or not...
>>
>> On Fri, Jul 26, 2013 at 10:38 AM, Tony Maro <tonym at evrichart.com> wrote:
>>
>>> Setting up geo-replication with an existing 3 TB of data is turning out
>>> to be a huge pain.
>>>
>>> It was working for a bit but would go faulty by the time it hit 1 TB
>>> synced. Multiple attempts resulted in the same thing.
>>>
>>> Now I don't know what's changed, but it never actually tries to log into
>>> the remote server anymore. Checking the "last" logs on the destination
>>> shows that it never attempts to make the SSH connection. The
>>> geo-replication command is:
>>>
>>> gluster volume geo-replication docstore1 root@backup-ds2.gluster:/data/docstore1 start
>>>
>>> From the log:
>>>
>>> [2013-07-26 10:26:04.317667] I [gsyncd:354:main_i] <top>: syncing: gluster://localhost:docstore1 -> ssh://root@backup-ds2.gluster:/data/docstore1
>>> [2013-07-26 10:26:08.258853] I [syncdutils(monitor):142:finalize] <top>: exiting.
>>> [2013-07-26 10:26:08.259452] E [syncdutils:173:log_raise_exception] <top>: connection to peer is broken
>>> [2013-07-26 10:26:08.260386] E [resource:191:errlog] Popen: command "ssh -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-WlTfNb/gsycnd-ssh-%r@%h:%p root@backup-ds2.gluster /usr/lib/glusterfs/glusterfs/gsyncd --session-owner 24f8c92d-723e-4513-9593-40ef4b7e766a -N --listen --timeout 120 file:///data/docstore1" returned with 143
>>>
>>> When I attempt to run the SSH command from the logs directly in the
>>> console, ssh replies with:
>>>
>>> muxserver_listen bind(): No such file or directory
>>>
>>> And there's no gsyncd temp directory where specified. If I manually
>>> create that directory and re-run the same command, it works. The
>>> problem, of course, is that the tmp directory is randomly named, and
>>> starting Gluster geo-rep again will result in a new directory it tries
>>> to use.
>>>
>>> Running Gluster 3.3.1-ubuntu1~precise9.
>>>
>>> Any ideas why this would be happening? I did find that my Ubuntu
>>> packages were trying to access gsyncd in the wrong path, so I corrected
>>> that. I've also set up automatic SSH login as root, so I changed my ssh
>>> command (and my global ssh config) to make sure the options would work.
>>> Here are the important geo-rep configs:
>>>
>>> ssh_command: ssh
>>> remote_gsyncd: /usr/lib/glusterfs/glusterfs/gsyncd
>>> gluster_command_dir: /usr/sbin/
>>> gluster_params: xlator-option=*-dht.assert-no-child-down=true
>>>
>>> Thanks,
>>> Tony
>>>
>>
>> --
>> Thanks,
>>
>> Tony Maro
>> Chief Information Officer
>> EvriChart | www.evrichart.com
>> Advanced Records Management
>> Office | 888.801.2020 | 304.536.1290
>
>
>--
>Thanks,
>
>Tony Maro
>Chief Information Officer
>EvriChart | www.evrichart.com
>Advanced Records Management
>Office | 888.801.2020 | 304.536.1290
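On the "muxserver_listen bind(): No such file or directory" part: ssh creates the -S control socket itself, but not the socket's parent directory, and it bails out when that directory is missing. As far as I can tell from the behavior you describe, gsyncd sets up that /tmp/gsyncd-aux-ssh-* directory itself for the lifetime of the session and cleans it up on exit, which is why replaying the command from the log later fails. A minimal sketch of the same failure, with made-up paths:

  # parent directory of the control socket is missing -> bind() error
  ssh -oControlMaster=auto -S /tmp/no-such-dir/ctl-%r@%h:%p root@backup-ds2.gluster true
  # muxserver_listen bind(): No such file or directory

  mkdir /tmp/no-such-dir   # pre-create the directory and the same command works
  ssh -oControlMaster=auto -S /tmp/no-such-dir/ctl-%r@%h:%p root@backup-ds2.gluster true

So the missing directory is a symptom, not the cause: the worker tore the session down first (exit status 143 is 128+15, i.e. SIGTERM), taking its temp directory with it. The interesting question is why the worker is being killed, which loops back to the ENOTCONN crash above.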