HA translator total failure in 2.0.0rc1

Hello all,

As noted in this email to the gluster-users list :
http://zresearch.com/pipermail/gluster-users/20090114/001389.html

I've got a simple and reproducible scenario to crash a Gluster client using the HA translator to access two AFR'd servers. The scenario is identical to that described by Krishna Srinivas on the gluster-devel list on 08-01-2009 :
http://lists.gnu.org/archive/html/gluster-devel/2009-01/msg00059.html

       Client
         |
         HA
        /  \
       /    \
    AFR1    AFR2
     |        |
 Server1    Server2

Basically, if I stop glusterfsd on Server1, HA on Client switches to AFR2 as expected ; however, when I re-enable glusterfsd on Server1, then stop glusterfsd on Server2, one of two things occurs :
1. Client stops communicating entirely with the cluster (transport endpoint not connected), or
2. Client recovers and continues communicating with AFR1.
It appears to be random as to which one actually occurs.

If the client recovers and continues to communicate, and I re-enable glusterfsd on Server2, Client stops communicating immediately with the cluster - every time, guaranteed.
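
To make the expected behaviour concrete, here is a rough sketch of the failover logic I would expect from HA, written as a toy Python model. This is not GlusterFS code : the class HAModel and its method set_state are hypothetical names invented for illustration, and the "stay on the active subvolume while it is up, otherwise pick the first one that is up" policy is my assumption about what graceful switching should look like.

```python
class HAModel:
    """Toy model of expected HA failover; hypothetical, not GlusterFS code."""

    def __init__(self, subvolumes):
        self.order = list(subvolumes)              # preference order
        self.up = {name: True for name in self.order}
        self.active = self.order[0]                # start on the first subvolume

    def set_state(self, name, is_up):
        """Record a subvolume going up or down, then re-evaluate the active one."""
        self.up[name] = is_up
        # Only switch when there is no active subvolume or the active one is down.
        if self.active is None or not self.up[self.active]:
            self.active = next((n for n in self.order if self.up[n]), None)
        return self.active


# Replay the scenario: stop Server1, restart Server1, stop Server2, restart Server2.
ha = HAModel(["AFR1", "AFR2"])
steps = [("AFR1", False), ("AFR1", True), ("AFR2", False), ("AFR2", True)]
trace = [ha.set_state(name, state) for name, state in steps]
print(trace)
```

In this model the client always has a working subvolume after every step, and bringing a subvolume back up never disturbs the active one. The observed behaviour differs at the third step (sometimes) and at the fourth step (always).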

There are therefore two key questions :
1. In the first component, why doesn't the client switch gracefully between available subvolumes ?
2. In the second component, why does re-enabling a previously-unavailable subvolume crash the client ?

All relevant details are in the mail to the gluster-users list, linked above.

Any ideas what's going on here ?


--
Daniel Maher <dma+gluster AT witbe DOT net>



