One node goes offline, the other node can't see the replicated volume anymore

I have a thought brewing in my head - how does Gluster "know" the other node is down?  Does it really use ICMP pings, or is there some kind of heartbeat dialog on TCP port 24007?  Here is where I'm going with this.  My application uses old-fashioned ICMP pings.  When I purposely isolate fw1 and fw2, I used to just do ifdown $HBEAT_IFACE on my least assertive partner for a few seconds at startup time.  Because of the Gluster troubles, I modified that to use the iptables rule I documented before - downing the whole interface seemed a bit radical since Gluster depends on it, so the iptables rule was a finer-grained alternative.  But I could get even finer grained and just as easily block only ICMP - or finer still, just ICMP echo requests - and that should satisfy my application while leaving Gluster alone.
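Concretely, the progression I have in mind looks something like this ($PEER_IP standing in for the other node's heartbeat address; the exact rule I documented before may differ in its details):

    # Coarsest: down the whole heartbeat interface (takes Gluster with it)
    ifdown $HBEAT_IFACE

    # Finer: drop all inbound traffic from the peer on that interface
    iptables -A INPUT -i $HBEAT_IFACE -s $PEER_IP -j DROP

    # Finer still: drop only ICMP from the peer, leaving TCP 24007 alone
    iptables -A INPUT -i $HBEAT_IFACE -s $PEER_IP -p icmp -j DROP

    # Finest: drop only ICMP echo requests, so my application's ping fails
    # but everything else on the interface is untouched
    iptables -A INPUT -i $HBEAT_IFACE -s $PEER_IP -p icmp --icmp-type echo-request -j DROP

(And the matching -D to delete the rule again a few seconds later.)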

Then the testing would shift to the case where the other node being down really is an exception condition.  Does this make sense?

- Greg


