One node goes offline, the other node can't see the replicated volume anymore

GregScott at infrasupport.com (Greg Scott) · Thu, 18 Jul 2013 21:20:28 +0000

I was just getting ready to put the ping-timeout change in, but a bunch more questions are filling my head. The biggie is, how do I look up the current setting? The output of gluster volume info doesn't show me that number. Is there something else that can show all the detailed settings?
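A minimal shell sketch for the baseline question, under assumptions not confirmed in this thread: that gluster volume info lists only explicitly changed options, as "key: value" lines under an "Options Reconfigured:" heading, so an absent network.ping-timeout line means the 42-second default from the original message applies. The function name ping_timeout_from_info is hypothetical; it reads the info text on stdin so it can be tried without a cluster.

```shell
# Hypothetical helper: print the effective network.ping-timeout (seconds)
# from `gluster volume info` output supplied on stdin.
# Assumption: an option changed with `gluster volume set` shows up as a
# "network.ping-timeout: N" line; if no such line exists, the 42-second
# default is assumed to be in effect.
ping_timeout_from_info() {
    value=$(awk -F': *' '
        $1 == "network.ping-timeout" { v = $2; sub(/[ \t\r]+$/, "", v) }
        END { print v }')
    if [ -n "$value" ]; then
        echo "$value"
    else
        echo 42
    fi
}

# Usage on a live node (<volname> is a placeholder):
#   gluster volume info <volname> | ping_timeout_from_info
```

Later GlusterFS releases added a gluster volume get <volname> <option> command that reports the effective value directly, which avoids parsing the info output.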

I was thinking of setting it as low as 5 seconds, and even 5 seconds might be too long. In my specific use case, my failover script polls the active partner every 10 seconds. By default, if he doesn't respond in 2 intervals (20 seconds), I initiate my failover stuff. When I start a failover, I really really really need that /firewall-scripts directory to be usable. In this specific use case, I'm not sure it makes sense to wait 20 seconds. But before I mess with it, I want to see where it's set right now so I have a baseline.
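The timing reasoning above can be put into numbers. This is a rough model, not how Gluster or the failover script actually schedule things: it assumes the surviving node's mount stalls for about network.ping-timeout seconds after the link drops, and that failover starts after poll interval × missed polls (10 × 2 = 20 seconds here), ignoring where in the poll cycle the failure lands. The function name failover_margin is hypothetical.

```shell
# Hypothetical helper: seconds between the volume becoming usable again
# (approximated as ping_timeout seconds after the link drops) and the start
# of failover (poll_interval * missed_polls seconds after the link drops).
# A negative result means failover would start while the mount is still hung.
failover_margin() {
    poll_interval=$1   # seconds between polls of the active partner
    missed_polls=$2    # missed polls before failover starts
    ping_timeout=$3    # network.ping-timeout, in seconds
    echo $(( poll_interval * missed_polls - ping_timeout ))
}
```

With the 42-second default, failover_margin 10 2 42 prints -22, meaning failover would begin roughly 22 seconds before the volume is usable again; with a 5-second timeout, failover_margin 10 2 5 prints 15.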

Thanks

- Greg

-----Original Message-----
From: Ben Turner [mailto:bturner at redhat.com] 
Sent: Thursday, July 18, 2013 9:33 AM
To: Greg Scott
Cc: Joe Julian; gluster-users at gluster.org
Subject: Re: One node goes offline, the other node can't see the replicated volume anymore

You can set the timeout with:

$ gluster volume set <volname> network.ping-timeout <N>

I don't usually set it to anything under 20 seconds.

-b

----- Original Message -----
> From: "Greg Scott" <GregScott at infrasupport.com>
> To: "Joe Julian" <joe at julianfamily.org>
> Cc: "gluster-users at gluster.org" <gluster-users at gluster.org>
> Sent: Thursday, July 18, 2013 1:00:50 AM
> Subject: Re: One node goes offline, the other node can't see the replicated volume anymore
> 
> Still not out of the woods. I can get everything mounted on both nodes
> with my systemd service hack. But now I'm back to the original
> problem. Well, sort of. Here is the scenario.
> 
> My Gluster volume named /firewall-scripts is mounted on both fw1 and fw2.
> Trying to simulate a cable issue, on fw1, I do:
> 
> ifdown enp5s4
> 
> And now all access to my /firewall-scripts volume on fw1 goes away.
> Fw2 can see it after more than the mystical 42 seconds. When I do
> 
> ifup enp5s4
> 
> I still can't see my /firewall-scripts volume on fw1 and it is no
> longer mounted. Not quite one minute later, my volume is mounted again
> and life goes on.
> 
> If that 42-second timeout is settable, how do I set it to a better
> number for my application? The Gluster/Heartbeat network in this case
> will just be a cable connecting the two nodes.
> 
> Thanks
> 
> - Greg
> 
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users