Thanks Csaba,

So far as I am aware nothing has tampered with the xattrs, and all the bricks etc. are time-synchronised. Anyway, I did as you suggested; now for one volume (I have three being geo-rep'd) I consistently get this:

OSError: [Errno 12] Cannot allocate memory
[2011-06-28 07:38:51.194791] I [monitor(monitor):42:monitor] Monitor: ------------------------------------------------------------
[2011-06-28 07:38:51.203562] I [monitor(monitor):43:monitor] Monitor: starting gsyncd worker
[2011-06-28 06:04:48.524348] I [gsyncd:286:main_i] <top>: syncing: gluster://localhost:app-volume -> file:///geo-tank/app-volume
[2011-06-28 06:04:54.480377] I [master:181:crawl] GMaster: new master is eb9f50ba-f17c-4109-ae87-4162925d1db2
[2011-06-28 06:04:54.480622] I [master:187:crawl] GMaster: primary master with volume id eb9f50ba-f17c-4109-ae87-4162925d1db2
...
[2011-06-28 07:38:41.134073] E [syncdutils:131:exception] <top>: FAIL:
Traceback (most recent call last):
  File "/opt/glusterfs/3.2.1/local/libexec//glusterfs/python/syncdaemon/gsyncd.py", line 102, in main
    main_i()
  File "/opt/glusterfs/3.2.1/local/libexec//glusterfs/python/syncdaemon/gsyncd.py", line 296, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/resource.py", line 401, in service_loop
    GMaster(self, args[0]).crawl_loop()
  File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/master.py", line 113, in crawl_loop
    self.crawl()
  File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/master.py", line 291, in crawl
    True)[-1], blame=e) == False:
  File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/master.py", line 257, in indulgently
    return fnc(e)
  File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/master.py", line 289, in <lambda>
    if indulgently(e, lambda e: (self.add_job(path, 'cwait', self.wait, e, xte, adct),
  File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/master.py", line 291, in crawl
    True)[-1], blame=e) == False:
  File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/master.py", line 257, in indulgently
    return fnc(e)
  File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/master.py", line 289, in <lambda>
    if indulgently(e, lambda e: (self.add_job(path, 'cwait', self.wait, e, xte, adct),
  File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/master.py", line 291, in crawl
    True)[-1], blame=e) == False:
  File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/master.py", line 257, in indulgently
    return fnc(e)
  File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/master.py", line 289, in <lambda>
    if indulgently(e, lambda e: (self.add_job(path, 'cwait', self.wait, e, xte, adct),
  File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/master.py", line 248, in crawl
    xte = self.xtime(e)
  File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/master.py", line 57, in xtime
    xt = rsc.server.xtime(path, self.uuid)
  File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/resource.py", line 145, in xtime
    return struct.unpack('!II', Xattr.lgetxattr(path, '.'.join([cls.GX_NSPACE, uuid, 'xtime']), 8))
  File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 34, in lgetxattr
    return cls._query_xattr( path, siz, 'lgetxattr', attr)
  File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 26, in _query_xattr
    cls.raise_oserr()
  File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 16, in raise_oserr
    raise OSError(errn, os.strerror(errn))
OSError: [Errno 12] Cannot allocate memory
[2011-06-28 07:38:51.194791] I [monitor(monitor):42:monitor] Monitor: ------------------------------------------------------------
[2011-06-28 07:38:51.203562] I [monitor(monitor):43:monitor] Monitor: starting gsyncd worker
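Before I try anything else, can I just confirm the reset procedure from your mail below? For this volume my plan is roughly the following, using the session names from the log above (app-volume as master, file:///geo-tank/app-volume as slave); the stop/start syntax is from memory, so please correct me if I have it or the order wrong:

# gluster volume geo-replication app-volume file:///geo-tank/app-volume stop
# gluster volume set app-volume geo-replication.indexing off
# gluster volume set app-volume geo-replication.indexing on
# gluster volume geo-replication app-volume file:///geo-tank/app-volume start

i.e. stop the session, reset the index, then restart and let the one-off full checksum pass run.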
"/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 16, in raise_oserr raise OSError(errn, os.strerror(errn)) OSError: [Errno 12] Cannot allocate memory [2011-06-28 07:38:51.194791] I [monitor(monitor):42:monitor] Monitor: ------------------------------------------------------------ [2011-06-28 07:38:51.203562] I [monitor(monitor):43:monitor] Monitor: starting gsyncd worker Regards, Adrian On 27 Jun 2011, at 23:23, Csaba Henk wrote: > This means that the geo-replication indexing ("xtime" extended attributes) has gone inconsistent. If these xattrs wasn't tampered with by an outside actor (ie. anything that is not the gsyncd process spawned upon the "geo-replication start", and its children), then this happens if the clock of the master box (more precisely, any brick which belongs to the master volume) is set backwards. In that case the whole indexing is gone corrupt and to fix it, you should reset the index with > > # gluster volume set <master volume> geo-replication.indexing off > # gluster volume set <master volume> geo-replication.indexing on > > (for this you should first stop geo-rep sessions with <master volume> as master; they can be restarted after the index reset). The side effect of this operation is that a full rsync-style synchronization will be performed once, ie. files will be checked if match by means of a two-side checksum. > > Regards, > Csaba