AFR w/ RRDNS failover - does it work or not ? (WAS: simple AFR setup, one server crashes, entire cluster becomes unusable ?)

Keith Freedman wrote:

> the issue isn't reliability, it's availability.
> 
> if a client only talks to one server and that server goes down then the 
> client has nothing to 'fail over' to.  however, if the client talks to 
> both servers then if one goes down it'll keep talking to the other one.

Either the clients will honour the RRDNS and pick another server, or 
they won't - unfortunately, we now have a case where two opposing 
possibilities are being presented.  To wit :


From the « Gotcha » page :
http://www.gluster.org/docs/index.php/AFR_(Automatic_File_Replication)_-_Things_to_keep_in_mind_and_gotchas

Applies to server side
[...]
"The clients connect only to 1 server. You would need to implement some 
kind of load balancing or something either with round robin DNS [...]"
"If you have client1 connected to server1 and client2 connected to 
server2, and then server2 goes down, so does client2. The cluster also 
becomes unavailable."


Ok, that seems like a straightforward enough statement.  However, if we 
take a look back through the mailing list archives, we find a statement 
from Mr. Anand Avati which suggests exactly the opposite :
http://lists.nongnu.org/archive/html/gluster-devel/2008-04/msg00007.html

[...]
"Or, put another way, if ClientA (by chance) resolves 
roundrobin.gluster.local to 192.168.252.1, but .1 is currently down - 
what happens ?

it will attempt on .2, and if that fails (or disconnects after a while), 
it will attempt on .3, and once all the entries are used 'once', it will 
do a fresh dns query.  it does not honor dns refresh timeouts (yet)."
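
For clarity, the behaviour described in that message can be sketched 
roughly as follows.  This is only an illustration in Python of the 
logic as quoted, not GlusterFS code : the function names, the 
max_queries limit, and the port number in the usage note are 
assumptions made for the example.

```python
import socket
from typing import Callable, List, Optional


def resolve_all(hostname: str, port: int) -> List[str]:
    """Return every IPv4 address the name currently resolves to."""
    infos = socket.getaddrinfo(hostname, port, socket.AF_INET, socket.SOCK_STREAM)
    addresses: List[str] = []
    for info in infos:
        addr = info[4][0]
        if addr not in addresses:  # drop duplicates, keep resolver order
            addresses.append(addr)
    return addresses


def try_connect(addr: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to addr:port can be opened."""
    try:
        with socket.create_connection((addr, port), timeout=timeout):
            return True
    except OSError:
        return False


def connect_with_rrdns(
    hostname: str,
    port: int,
    resolve: Callable[[str, int], List[str]] = resolve_all,
    connect: Callable[[str, int], bool] = try_connect,
    max_queries: int = 2,  # assumption: give up after this many DNS queries
) -> Optional[str]:
    """Try each resolved address once; re-query DNS when all have been used."""
    for _ in range(max_queries):
        for addr in resolve(hostname, port):
            if connect(addr, port):
                return addr  # first server that answers wins
        # every entry has been used once: fall through to a fresh DNS query
    return None
```

Calling connect_with_rrdns("roundrobin.gluster.local", 6996) against 
three A records where .1 is down would be expected to return the 
address of .2, which is the failover the quoted message describes.  
Whether the client actually behaves this way is exactly what is in 
question.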


The remaining basic question then is this : does AFR w/ RRDNS failover 
work or not ?  If it does, then the « Gotcha » page should be updated, 
/and/ further investigation is required to determine why it failed to 
operate as advertised in my environment.  If it does /not/, then the 
« Gotcha » page should be updated, and the wiki page I wrote (based 
largely on the suggestions of the developers) should likely be scrapped. :P

As always, thank you all for your continued discourse !

-- 
Daniel Maher <dma+gluster AT witbe DOT net>


