> Are your two machines perhaps connected via crossover cable?

Well, it's a little more complicated, but essentially true. What follows is a bunch of probably-not-relevant detail, but I owe it to you if I want my problem worked on, so here goes. The NICs are those new smart ones that auto-sense the cable (auto MDI/MDI-X), so they "know" what kind of cable they see and can reverse the send and receive pairs when they need to. So to be completely accurate, I'm using straight-through cables to connect everything, not crossovers.

The actual hardware is a Jetway motherboard with 2 onboard NICs plus a daughtercard with 3 more. I have the actual model numbers on some paperwork around here someplace. It's all inside a nifty little Mini-ITX case, so I have a box around 6 inches square and maybe 1 1/2 inches thick with 5 NIC ports. I am in love with this hardware, at least for firewalls.

F19 has yet another new way to name Ethernet interfaces, so these names will look a little strange. Here are the details of which interfaces go where; they're the same on both systems:

- enp2s0 goes to my simulated Internet: for now, just an older Linux system holding the default gateway address. The physical path runs through an old broken-down Ethernet switch and into that simulated default gateway.
- enp3s0 is the LAN side. Right now in my testbed, these are empty on both nodes.
- enp5s4 does heartbeat and Gluster, point to point fw1 <--> fw2. The IP address on fw1 is 192.168.253.1 and on fw2 it's 192.168.253.2.
- enp5s6 is empty and will be unused in this application.
- enp5s7 is for a future DMZ, but really for development. It connects to my Ethernet switch right now and lives in my LAN, but will be empty in production. This comes in handy because I can get to both systems in different windows on my workstation. I do all my firewalls this way, with an extra NIC for "kind of out of band" debugging.

The hardware and cabling all work just fine. No issues. I **purposely** isolate fw1 from fw2 on interface enp5s4 to reproduce the problem.

I first discovered the problem when booting each node. I have some logic in my bootup that figures out a least assertive and a most assertive partner. The least assertive partner takes its heartbeat/Gluster interface offline for a few seconds, so the most assertive partner will miss a couple of pings on the heartbeat interface and take control. This worked well for several years when both systems were completely separate and I manually kept the config files in sync on each node. It also worked well with older versions of Gluster a couple of years ago. But now, trying to use the latest and greatest Gluster, my most assertive partner would never take control. Digging into it, I found it could not find its rc.firewall script. Of course, by the time I was done digging through my own application logs, the error condition that set up the problem was long gone and everyone could see everyone again. So all I had was my failover.log with a message about not being able to find rc.firewall. I've had startup issues before, but this felt different.
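To make that bootup logic concrete, here is a sketch of it, stripped way down from the real thing. The hostname test, the 5-second window, and the "rc.firewall start" invocation at the end are illustrative, not the real code, but the shape is right:

    #!/bin/sh
    # Stripped-down sketch of the bootup assertiveness logic.
    # enp5s4 is the heartbeat/Gluster link between fw1 and fw2.
    HB_IF=enp5s4
    case "$(hostname -s)" in
        fw1) PARTNER=192.168.253.2 ; ROLE=least ;;  # fw1: least assertive
        fw2) PARTNER=192.168.253.1 ; ROLE=most ;;   # fw2: most assertive
    esac

    if [ "$ROLE" = "least" ]; then
        # Least assertive: drop the heartbeat link for a few seconds so
        # the partner misses a couple of pings and takes control.
        ip link set "$HB_IF" down
        sleep 5
        ip link set "$HB_IF" up
    else
        # Most assertive: no answer from the partner means take control,
        # which runs a script living on the Gluster volume. This is the
        # step that failed with "can't find rc.firewall".
        if ! ping -c 2 -W 2 "$PARTNER" >/dev/null 2>&1; then
            /firewall-scripts/rc.firewall start
        fi
    fi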
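One more piece of background before the experiment. /firewall-scripts is a two-node Gluster replica shared between fw1 and fw2 over that enp5s4 link. From memory, I built and mounted it roughly like this; the brick paths here are illustrative, not the real ones:

    # Done on fw1, which ends up holding the first brick.
    gluster peer probe 192.168.253.2
    gluster volume create firewall-scripts replica 2 \
        192.168.253.1:/bricks/firewall-scripts \
        192.168.253.2:/bricks/firewall-scripts
    gluster volume start firewall-scripts

    # Mounted on each node; on fw1, for example:
    mount -t glusterfs 192.168.253.1:/firewall-scripts /firewall-scripts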
So I came up with an experiment. Node fw1 will always be the least assertive firewall partner, and node fw1 is where I did all the initial Gluster setup. I think this combination turns out to be relevant in a few sentences. The experiment: on node fw1, put in a firewall rule to reject everything from its partner as the very first rule. This isolates fw1 and fw2 from each other, but I can still see both of them from my workstation:

    iptables -I INPUT 1 -i enp5s4 -s 192.168.253.2 -j REJECT

And then on node fw2, try:

    ls /firewall-scripts

Sure enough, that failed on fw2. Node fw2 was unable to access /firewall-scripts. After reproducing the problem, run this on fw1 to get rid of the reject rule so fw1 and fw2 can find each other again:

    iptables -D INPUT -i enp5s4 -s 192.168.253.2 -j REJECT

And within a second or so, or as soon as I could whip up an "ls /firewall-scripts" command on fw2, it could see that directory again.

I posted the logs from all that earlier, but all the logs really tell us is that fw1 and fw2 are isolated from each other. Well, duh! They're isolated because I isolated them! I've also noticed the behavior seems slightly different when I take fw2 offline and try my ls command from fw1. Reading through what I can get my hands on, the first brick apparently acts as a kind of "master", and fw1 is my first brick. So fw1 is the "important" Gluster partner but, in my application, the least assertive one. When both nodes boot at the same time, fw1 will always isolate itself from fw2 for a few seconds, which will mess up fw2, and I'll end up with a firewall system in which nobody asserts itself.

So the Gluster behavior broke my startup, although I have some ideas to work around that. More important, this system will be 400 miles from me and it has to be reliable. What happens when one node goes offline in production and the other node cannot take control because it can't find the directory with all its scripts? Right when I need my carefully scripted automated failover the most, it may break because it can't find the scripts it needs. That kind of stuff is bad for business.

Anyway, after several mounts and umounts and different combinations of mount options, I should probably comment out all my startup stuff again, reboot both boxes, and try some even more structured and methodical tests. Or hope for a shortcut to all that testing, if anyone has seen this behavior before and has a way around it.

OK, what behavior would I like to see? Both nodes should try to satisfy reads locally. Why reach across the network when there's a copy right here? If the nodes become isolated from each other, they should still satisfy reads locally, so daemons and other apps running on those nodes can continue to run. Writes? Satisfy the write locally for now and keep track of what needs to copy over the network; when the far-end node comes back online, send the changes. And maybe provide an option for how to handle conflicts when everyone updates the same file. I know that in my use case, I need that /firewall-scripts Gluster directory to stay available on the surviving node when one node goes offline. I can't have failover scripts fail to run because they can't find themselves.

- Greg