It seems that the connection gets dropped (or cannot even be established): the
EOFError is pickle.load() in repce.py hitting end-of-stream, i.e. the peer
gsyncd process went away mid-session (a short illustration is appended below
the quoted log). Is the ssh auth set up properly for the second volume?

Csaba

On Thu, Jun 30, 2011 at 4:22 PM, Adrian Carpenter <tac12 at wbic.cam.ac.uk> wrote:
> Hi Csaba,
>
> I'm now seeing consistent errors with a second volume:
>
> [2011-06-30 06:08:48.299174] I [monitor(monitor):19:set_state] Monitor: new state: OK
> [2011-06-30 09:27:46.875745] E [syncdutils:131:exception] <top>: FAIL:
> Traceback (most recent call last):
>   File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 152, in twrap
>     tf(*aa)
>   File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/repce.py", line 118, in listen
>     rid, exc, res = recv(self.inf)
>   File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/repce.py", line 42, in recv
>     return pickle.load(inf)
> EOFError
> [2011-06-30 09:27:58.413588] I [monitor(monitor):42:monitor] Monitor: ------------------------------------------------------------
> [2011-06-30 09:27:58.413830] I [monitor(monitor):43:monitor] Monitor: starting gsyncd worker
> [2011-06-30 09:27:58.479687] I [gsyncd:286:main_i] <top>: syncing: gluster://localhost:user-volume -> file:///geo-tank/user-volume
> [2011-06-30 09:28:03.963303] I [master:181:crawl] GMaster: new master is a747062e-1caa-4cb3-9f86-34d03486a842
> [2011-06-30 09:28:03.963587] I [master:187:crawl] GMaster: primary master with volume id a747062e-1caa-4cb3-9f86-34d03486a842 ...
> [2011-06-30 09:34:35.592005] E [syncdutils:131:exception] <top>: FAIL:
> Traceback (most recent call last):
>   File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 152, in twrap
>     tf(*aa)
>   File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/repce.py", line 118, in listen
>     rid, exc, res = recv(self.inf)
>   File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/repce.py", line 42, in recv
>     return pickle.load(inf)
> EOFError
> [2011-06-30 09:34:45.595258] I [monitor(monitor):42:monitor] Monitor: ------------------------------------------------------------
> [2011-06-30 09:34:45.595668] I [monitor(monitor):43:monitor] Monitor: starting gsyncd worker
> [2011-06-30 09:34:45.661334] I [gsyncd:286:main_i] <top>: syncing: gluster://localhost:user-volume -> file:///geo-tank/user-volume
> [2011-06-30 09:34:51.145607] I [master:181:crawl] GMaster: new master is a747062e-1caa-4cb3-9f86-34d03486a842
> [2011-06-30 09:34:51.145898] I [master:187:crawl] GMaster: primary master with volume id a747062e-1caa-4cb3-9f86-34d03486a842 ...
> [2011-06-30 12:35:54.394453] E [syncdutils:131:exception] <top>: FAIL:
> Traceback (most recent call last):
>   File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 152, in twrap
>     tf(*aa)
>   File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/repce.py", line 118, in listen
>     rid, exc, res = recv(self.inf)
>   File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/repce.py", line 42, in recv
>     return pickle.load(inf)
> UnpicklingError: invalid load key, '???'.
> [2011-06-30 12:36:05.839510] I [monitor(monitor):42:monitor] Monitor: ------------------------------------------------------------
> [2011-06-30 12:36:05.839916] I [monitor(monitor):43:monitor] Monitor: starting gsyncd worker
> [2011-06-30 12:36:05.905232] I [gsyncd:286:main_i] <top>: syncing: gluster://localhost:user-volume -> file:///geo-tank/user-volume
> [2011-06-30 12:36:11.413764] I [master:181:crawl] GMaster: new master is a747062e-1caa-4cb3-9f86-34d03486a842
> [2011-06-30 12:36:11.414047] I [master:187:crawl] GMaster: primary master with volume id a747062e-1caa-4cb3-9f86-34d03486a842 ...
>
>
> Adrian
>
> On 28 Jun 2011, at 11:16, Csaba Henk wrote:
>
>> Hi Adrian,
>>
>> On Tue, Jun 28, 2011 at 12:04 PM, Adrian Carpenter <tac12 at wbic.cam.ac.uk> wrote:
>>> Thanks Csaba,
>>>
>>> So far as I am aware nothing tampered with the xattrs, and all the bricks
>>> etc. are time synchronised. Anyway, I did as you suggested; now for one
>>> volume (I have three being geo-rep'd) I consistently get this:
>>>
>>> OSError: [Errno 12] Cannot allocate memory
>>
>> Do you get this consistently, or randomly-but-recurring, or spotted
>> once/a few times and then gone?
>>
>>>   File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 26, in _query_xattr
>>>     cls.raise_oserr()
>>>   File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 16, in raise_oserr
>>>     raise OSError(errn, os.strerror(errn))
>>> OSError: [Errno 12] Cannot allocate memory
>>
>> If seen more than once, how much does the stack trace vary? Exactly the
>> same, or not exactly but crashing in the same function (just on a
>> different code path), or not exactly but at least in the libcxattr
>> module, or quite different?
>>
>> What Python version do you use? If you use Python 2.4.* with external
>> ctypes, then what source have you taken ctypes from, and what version?
>>
>> Thanks,
>> Csaba
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>
>
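
P.S. To illustrate why a dropped connection shows up this way in your log:
repce.py's recv() is simply pickle.load() on the stream connecting the gsyncd
processes, so end-of-file on that stream surfaces as EOFError, and non-pickle
bytes arriving on it surface as UnpicklingError ("invalid load key"). A
minimal standalone sketch, plain Python and nothing gsyncd-specific:

    import io
    import pickle

    # Peer closed the stream before a complete message arrived: EOFError,
    # the same failure mode as repce.py's recv() on a dropped connection.
    try:
        pickle.load(io.BytesIO(b""))
    except EOFError as e:
        print("EOFError:", e)

    # Bytes that are not pickle data (stray output on the transport, for
    # example) come out as UnpicklingError with an "invalid load key".
    try:
        pickle.load(io.BytesIO(b"\xffgarbage"))
    except pickle.UnpicklingError as e:
        print("UnpicklingError:", e)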
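
For context on the earlier ENOMEM trace: libcxattr queries xattrs via ctypes
and converts a failing libc call into OSError(errno), which is why the ctypes
version matters. The following is only a rough sketch of that general pattern,
not the actual libcxattr.py code; it assumes a ctypes with get_errno() (Python
2.6+ semantics), and the function name and buffer size are illustrative:

    import ctypes
    import ctypes.util
    import os

    # Load libc so that errno set by the call is captured by ctypes.
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)

    def query_xattr(path, name, bufsize=4096):
        buf = ctypes.create_string_buffer(bufsize)
        # ssize_t lgetxattr(const char *path, const char *name,
        #                   void *value, size_t size)
        ret = libc.lgetxattr(path.encode(), name.encode(), buf, bufsize)
        if ret == -1:
            errn = ctypes.get_errno()
            # If errno is not captured correctly (as can happen with an odd
            # external ctypes build), the errno reported here, e.g. ENOMEM,
            # may not reflect the real failure.
            raise OSError(errn, os.strerror(errn))
        return buf.raw[:ret]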