These logs show different results. The results you reported and pasted earlier included "[2013-07-09 00:59:04.706390] I [afr-common.c:3856:afr_local_init] 0-firewall-scripts-replicate-0: no subvolumes up", which would produce the "Transport endpoint not connected" error you reported at first. These results look normal and should have produced the behavior I described.

42 is The Answer to Life, The Universe, and Everything.

Re-establishing FDs and locks is an expensive operation. The ping-timeout is long because it should not happen: if there is temporary network congestion, you would (normally) rather have your volume stay up and pause briefly than have to re-establish everything. Typically, unless you expect your servers to crash often, leaving ping-timeout at the default is best. YMMV, and it's configurable in case you know what you're doing and why.

On 07/13/2013 04:58 PM, Greg Scott wrote:
>
> Log files sent privately to Joe. If others from the community want to
> look at them, I'm OK with posting them here. I don't think they have
> anything confidential. Now that I know about that 42 second timeout,
> the behavior makes more sense. Why 42? What's special about 42?
> Is there a way I can adjust that down for my application to, say,
> 1 or 2 seconds?
>
> -Greg
>
> *From:* Joe Julian [mailto:joe at julianfamily.org]
> *Sent:* Saturday, July 13, 2013 4:28 PM
> *To:* Greg Scott; 'gluster-users at gluster.org'
> *Subject:* Re: One node goes offline, the other node
> can't see the replicated volume anymore
>
> Huh.. this was in my sent folder... let's try again.
>
> There's something missing from this picture. The logs show that the
> client is connecting to both servers, but it only shows the
> disconnection from one and claims that it's not connected to any
> bricks after that.
>
> Here's the data I'd like you to generate:
>
> unmount the clients
> gluster volume set firewall-scripts diagnostics.client-log-level DEBUG
> gluster volume set firewall-scripts diagnostics.brick-log-level DEBUG
> systemctl stop glusterd.service
> truncate the client, glusterd, and server logs
> systemctl start glusterd
> mount /firewall-scripts
> Do your iptables disconnect
> telnet $this_host_ip 24007   # report whether or not it establishes
> a connection
> ls /firewall-scripts
> wait 42 seconds
> ls /firewall-scripts
> Remove the iptables rule
> ls /firewall-scripts
> tar up the logs and email them to me.
>
> You can reset the log-level afterward:
>
> gluster volume reset firewall-scripts diagnostics.client-log-level
> gluster volume reset firewall-scripts diagnostics.brick-log-level
>
> Lastly, do you have a loopback interface (lo) on 127.0.0.1, and is
> localhost defined in /etc/hosts?
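
A footnote on the "can I adjust that down" question above: the 42-second window is the network.ping-timeout volume option. A minimal sketch of changing it, assuming the firewall-scripts volume name from this thread and a 10-second value picked purely for illustration (as noted above, lowering it trades tolerance of brief network hiccups for faster failover):

    # Show current options; ping-timeout only appears under
    # "Options Reconfigured" once it differs from the 42s default.
    gluster volume info firewall-scripts

    # Lower the timeout to 10 seconds (illustrative value only).
    gluster volume set firewall-scripts network.ping-timeout 10

    # Put it back to the default later if the trade-off isn't worth it.
    gluster volume reset firewall-scripts network.ping-timeout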
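
For anyone following along, here is the quoted diagnostic sequence gathered into one shell sketch. Several pieces are assumptions, not part of the original instructions: logs are assumed to live under /var/log/glusterfs, PEER_IP and this_host_ip are placeholder addresses you would fill in, and the DROP rule is just one way to do the "iptables disconnect" mentioned above.

    #!/bin/sh
    # Placeholders -- substitute your own addresses.
    PEER_IP=192.168.1.2
    this_host_ip=192.168.1.1

    umount /firewall-scripts                     # unmount the client
    gluster volume set firewall-scripts diagnostics.client-log-level DEBUG
    gluster volume set firewall-scripts diagnostics.brick-log-level DEBUG
    systemctl stop glusterd.service

    # Assumed log locations for the client, glusterd, and brick logs.
    truncate -s 0 /var/log/glusterfs/*.log /var/log/glusterfs/bricks/*.log

    systemctl start glusterd.service
    mount /firewall-scripts

    iptables -I INPUT -s "$PEER_IP" -j DROP      # simulate the disconnect
    telnet "$this_host_ip" 24007                 # note whether it connects
    ls /firewall-scripts
    sleep 42                                     # wait out the ping-timeout
    ls /firewall-scripts
    iptables -D INPUT -s "$PEER_IP" -j DROP      # remove the rule again
    ls /firewall-scripts

    tar czf /tmp/gluster-debug-logs.tar.gz /var/log/glusterfs

    # Reset the log levels when done.
    gluster volume reset firewall-scripts diagnostics.client-log-level
    gluster volume reset firewall-scripts diagnostics.brick-log-level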