Re: GlusterFS AFR not failing over

Yes! That was a flaw in the timeout and transport-layer design of the 1.3.x
series. There were a lot of problems that we couldn't solve with simple
fixes. Hence we came up with the 1.4.x series, with non-blocking I/O as an
important fix; there the timeouts behave much closer to their configured
values, and we get more control over them.

-amar

On Thu, Jun 12, 2008 at 7:49 PM, Raghavendra G <raghavendra.hg@xxxxxxxxx>
wrote:

> Hi Gordan,
> Actually, you may have to wait up to (transport_timeout * 2) seconds before
> the client bails out and cleans up the pending frames. The timer thread runs
> the bail-out check once every transport_timeout seconds, and that check does
> the cleanup only if no frame has been sent or received in the last
> transport_timeout seconds. In the worst case the last frame arrives just
> after a check, so the next check still sees recent traffic and the bail-out
> only happens on the check after that.
>
> regards,
> On Mon, Jun 9, 2008 at 5:41 PM, <gordan@xxxxxxxxxx> wrote:
>
> > No - this is a different problem. If the transport timeout was the problem,
> > the access should return after < 60 seconds, should it not? In the case I'm
> > seeing, something goes wrong and the only way to recover is to restart
> > glusterfsd on the server(s) _AND_ glusterfs on the clients.
> >
> > It's kind of hard to reproduce, as I only see it happening about once every
> > week or so.
> >
> > Gordan
> >
> >
> > On Sat, 7 Jun 2008, Krishna Srinivas wrote:
> >
> >> Gordan,
> >>
> >> Is this a case of transport-timeout being set too high?
> >>
> >> Krishna
> >>
> >> On Sat, Jun 7, 2008 at 1:04 AM, Gordan Bobic <gordan@xxxxxxxxxx> wrote:
> >>
> >>> Hi,
> >>>
> >>> I have /home mounted from GlusterFS with AFR, and if one of the servers
> >>> (secondary) goes away, I cannot log in. sshd tries to read ~/.ssh and bash
> >>> tries to read ~/.bashrc and this seems to fail - or at least take a very
> >>> long time to time out and try the remaining server (which verifiably
> >>> works).
> >>>
> >>> I get this sort of thing in the logs:
> >>>
> >>> E [tcp-client.c:190:tcp_connect] home2: non-blocking connect() returned: 110 (Connection timed out)
> >>> E [client-protocol.c:4423:client_lookup_cbk] home2: no proper reply from server, returning ENOTCONN
> >>> C [client-protocol.c:212:call_bail] home2: bailing transport
> >>> C [client-protocol.c:212:call_bail] home2: bailing transport
> >>>
> >>> where home2 is the name of the GlusterFS export on the secondary.
> >>>
> >>> Is this a known issue or have I managed to trip another error case?
> >>>
> >>> Gordan
> >>>
> >>>
> >>> _______________________________________________
> >>> Gluster-devel mailing list
> >>> Gluster-devel@xxxxxxxxxx
> >>> http://lists.nongnu.org/mailman/listinfo/gluster-devel
> >>>
> >>>
> >>
> >
> >
>
>
>
> --
> Raghavendra G
>
> A centipede was happy quite, until a toad in fun,
> Said, "Pray, which leg comes after which?",
> This raised his doubts to such a pitch,
> He fell flat into the ditch,
> Not knowing how to run.
> -Anonymous
>



-- 
Amar Tumballi
Gluster/GlusterFS Hacker
[bulde on #gluster/irc.gnu.org]
http://www.zresearch.com - Commoditizing Super Storage!

