Thanks Csaba,

So far as I am aware nothing has tampered with the xattrs, and all the bricks etc. are time-synchronised. Anyway, I did as you suggested; now for one volume (I have three being geo-rep'd) I consistently get this:

OSError: [Errno 12] Cannot allocate memory
[2011-06-28 07:38:51.194791] I [monitor(monitor):42:monitor] Monitor: ------------------------------------------------------------
[2011-06-28 07:38:51.203562] I [monitor(monitor):43:monitor] Monitor: starting gsyncd worker
[2011-06-28 06:04:48.524348] I [gsyncd:286:main_i] <top>: syncing: gluster://localhost:app-volume -> file:///geo-tank/app-volume
[2011-06-28 06:04:54.480377] I [master:181:crawl] GMaster: new master is eb9f50ba-f17c-4109-ae87-4162925d1db2
[2011-06-28 06:04:54.480622] I [master:187:crawl] GMaster: primary master with volume id eb9f50ba-f17c-4109-ae87-4162925d1db2
...
[2011-06-28 07:38:41.134073] E [syncdutils:131:exception] <top>: FAIL:
Traceback (most recent call last):
  File "/opt/glusterfs/3.2.1/local/libexec//glusterfs/python/syncdaemon/gsyncd.py", line 102, in main
    main_i()
  File "/opt/glusterfs/3.2.1/local/libexec//glusterfs/python/syncdaemon/gsyncd.py", line 296, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/resource.py", line 401, in service_loop
    GMaster(self, args[0]).crawl_loop()
  File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/master.py", line 113, in crawl_loop
    self.crawl()
  File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/master.py", line 291, in crawl
    True)[-1], blame=e) == False:
  File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/master.py", line 257, in indulgently
    return fnc(e)
  File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/master.py", line 289, in <lambda>
    if indulgently(e, lambda e: (self.add_job(path, 'cwait', self.wait, e, xte, adct),
  File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/master.py", line 291, in crawl
    True)[-1], blame=e) == False:
  File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/master.py", line 257, in indulgently
    return fnc(e)
  File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/master.py", line 289, in <lambda>
    if indulgently(e, lambda e: (self.add_job(path, 'cwait', self.wait, e, xte, adct),
  File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/master.py", line 291, in crawl
    True)[-1], blame=e) == False:
  File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/master.py", line 257, in indulgently
    return fnc(e)
  File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/master.py", line 289, in <lambda>
    if indulgently(e, lambda e: (self.add_job(path, 'cwait', self.wait, e, xte, adct),
  File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/master.py", line 248, in crawl
    xte = self.xtime(e)
  File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/master.py", line 57, in xtime
    xt = rsc.server.xtime(path, self.uuid)
  File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/resource.py", line 145, in xtime
    return struct.unpack('!II', Xattr.lgetxattr(path, '.'.join([cls.GX_NSPACE, uuid, 'xtime']), 8))
  File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 34, in lgetxattr
    return cls._query_xattr( path, siz, 'lgetxattr', attr)
  File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 26, in _query_xattr
    cls.raise_oserr()
  File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 16, in raise_oserr
    raise OSError(errn, os.strerror(errn))
OSError: [Errno 12] Cannot allocate memory
[2011-06-28 07:38:51.194791] I [monitor(monitor):42:monitor] Monitor: ------------------------------------------------------------
[2011-06-28 07:38:51.203562] I [monitor(monitor):43:monitor] Monitor: starting gsyncd worker
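Before I try anything else, can I just confirm the reset procedure from your mail below? For this volume my plan is roughly the following, using the session names from the log above (app-volume as master, file:///geo-tank/app-volume as slave); the stop/start syntax is from memory, so please correct me if I have it or the order wrong:

# gluster volume geo-replication app-volume file:///geo-tank/app-volume stop
# gluster volume set app-volume geo-replication.indexing off
# gluster volume set app-volume geo-replication.indexing on
# gluster volume geo-replication app-volume file:///geo-tank/app-volume start

i.e. stop the session, reset the index, then restart and let the one-off full checksum pass run.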
"/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 16, in raise_oserr raise OSError(errn, os.strerror(errn)) OSError: [Errno 12] Cannot allocate memory [2011-06-28 07:38:51.194791] I [monitor(monitor):42:monitor] Monitor: ------------------------------------------------------------ [2011-06-28 07:38:51.203562] I [monitor(monitor):43:monitor] Monitor: starting gsyncd worker Regards, Adrian On 27 Jun 2011, at 23:23, Csaba Henk wrote: > This means that the geo-replication indexing ("xtime" extended attributes) has gone inconsistent. If these xattrs wasn't tampered with by an outside actor (ie. anything that is not the gsyncd process spawned upon the "geo-replication start", and its children), then this happens if the clock of the master box (more precisely, any brick which belongs to the master volume) is set backwards. In that case the whole indexing is gone corrupt and to fix it, you should reset the index with > > # gluster volume set <master volume> geo-replication.indexing off > # gluster volume set <master volume> geo-replication.indexing on > > (for this you should first stop geo-rep sessions with <master volume> as master; they can be restarted after the index reset). The side effect of this operation is that a full rsync-style synchronization will be performed once, ie. files will be checked if match by means of a two-side checksum. > > Regards, > Csaba