Re: GlusterFS AFR not failing over

gordan@xxxxxxxxxx · Mon, 9 Jun 2008 14:41:09 +0100 (BST)

No - this is a different problem. If the transport timeout were the 
problem, the access should return in under 60 seconds, should it not? In 
the case I'm seeing, something goes wrong and the only way to recover is 
to restart glusterfsd on the server(s) _AND_ glusterfs on the clients.

It's kind of hard to reproduce, as I only see it happening about once 
every week or so.
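
Since the failure is rare, one way to tell a slow timeout apart from a
genuine hang is to time a single access from a separate process. The
following is only a rough sketch, not anything GlusterFS provides: the
helper name probe() and the 60 second limit are assumptions taken from
the figure above. A child stuck in uninterruptible sleep may not be
killable, in which case the call can still block, so treat this as a
diagnostic aid only.

```python
import subprocess
import sys
import time


def probe(path, limit=60.0):
    """Stat `path` in a child process and return (ok, seconds_elapsed).

    ok is False if the stat failed or did not finish within `limit`
    seconds. The stat runs in a child so that a stalled mount blocks
    the child rather than the caller (hypothetical helper).
    """
    start = time.monotonic()
    try:
        result = subprocess.run(
            [sys.executable, "-c",
             "import os, sys; os.stat(sys.argv[1])", path],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=limit,
        )
        ok = result.returncode == 0
    except subprocess.TimeoutExpired:
        # The access did not return within the limit.
        ok = False
    return ok, time.monotonic() - start
```

Run from cron against a path on the mount (for example a user's
~/.bashrc), an elapsed time close to the limit with ok False would
suggest a hang rather than a timeout that eventually fires.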

Gordan

On Sat, 7 Jun 2008, Krishna Srinivas wrote:

Gordan,

Is this a case of transport-timeout being set too high?

Krishna

On Sat, Jun 7, 2008 at 1:04 AM, Gordan Bobic <gordan@xxxxxxxxxx> wrote:
Hi,

I have /home mounted from GlusterFS with AFR, and if one of the servers
(the secondary) goes away, I cannot log in. sshd tries to read ~/.ssh and
bash tries to read ~/.bashrc, and this seems to fail - or at least takes a
very long time to time out and fall back to the remaining server (which
verifiably works).

I get this sort of thing in the logs:

E [tcp-client.c:190:tcp_connect] home2: non-blocking connect() returned: 110 (Connection timed out)
E [client-protocol.c:4423:client_lookup_cbk] home2: no proper reply from server, returning ENOTCONN
C [client-protocol.c:212:call_bail] home2: bailing transport

where home2 is the name of the GlusterFS export on the secondary.
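
To see how often each volume hits these errors over a week, the log
lines can be tallied with a small script. This is a minimal sketch that
assumes the line layout shown in the excerpt above (level letter, then
[file:line:function], then volume name, then message); the names LINE_RE,
parse_log and tally are made up for illustration, and the format may
differ between GlusterFS versions.

```python
import re
from collections import Counter

# Assumed layout, inferred from the excerpt:
#   E [tcp-client.c:190:tcp_connect] home2: message text
LINE_RE = re.compile(
    r"^([A-Z]) \[([^:\]]+):(\d+):([^\]]+)\] ([^:]+): (.*)$"
)


def parse_log(lines):
    """Return a list of dicts, one per log entry.

    A line that does not match the layout is treated as a wrapped
    continuation and appended to the previous entry's message.
    """
    entries = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        match = LINE_RE.match(line)
        if match:
            level, source, lineno, func, volume, message = match.groups()
            entries.append({
                "level": level,
                "source": source,
                "line": int(lineno),
                "function": func,
                "volume": volume,
                "message": message,
            })
        elif entries:
            entries[-1]["message"] += " " + line
    return entries


def tally(entries):
    """Count entries per (volume, level) pair."""
    return Counter((e["volume"], e["level"]) for e in entries)
```

Feeding it the client log and printing tally(parse_log(f)) gives a quick
per-volume count, which may help correlate the weekly hang with bursts of
call_bail messages.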

Is this a known issue or have I managed to trip another error case?

Gordan

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxx
http://lists.nongnu.org/mailman/listinfo/gluster-devel