Issue detecting dead peer

I am running some tests using two KVM hosts, each with a CentOS 6.5 instance running Gluster 3.4.2.  Each Gluster instance acts as both server and client, mounting the same Gluster volume it serves.  During my tests there is no file access occurring on the Gluster volume.

 

I am seeing an issue when I forcibly disconnect node1 from the network.  Node2 can take several minutes to detect that node1 is disconnected.  During this time, running “gluster peer status” on node2 shows node1 as connected.  The first run of “gluster volume status” takes two minutes to time out and then returns with no output.  Subsequent runs of “gluster volume status” return quickly with “Another transaction is in progress. Please try again after sometime.”  Eventually “gluster peer status” shows node1 as disconnected.  At that point “gluster volume status” starts to return quickly again.

 

This behavior is only seen when I run “service network stop” on node1 to simulate a node failure.  If I instead run “service glusterd stop” on node1 to shut Gluster down cleanly, node2 sees node1 as disconnected immediately, and the volume status commands return immediately.
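
To put a number on the delay, something like the following could be run on node2 just before the network is stopped on node1. It is only a rough sketch, not the exact procedure used in the tests above: it assumes Python 3.7 or newer is available, that “gluster peer status” prints a “Hostname:” line followed by a “State:” line ending in “(Connected)” or “(Disconnected)” for each peer, and the function names, polling interval, and timeout are made up for illustration.

```python
import subprocess
import time


def peer_states(status_text):
    """Map each peer hostname to True (connected) or False (not connected).

    Assumes the usual `gluster peer status` layout: a "Hostname:" line
    followed later by a "State:" line for the same peer.
    """
    states = {}
    host = None
    for line in status_text.splitlines():
        line = line.strip()
        if line.startswith("Hostname:"):
            host = line.split(":", 1)[1].strip()
        elif line.startswith("State:") and host is not None:
            states[host] = "(Connected)" in line
            host = None
    return states


def wait_for_disconnect(peer, interval=1.0, timeout=600.0):
    """Poll `gluster peer status` until `peer` is no longer shown as connected.

    Returns the elapsed seconds, or None if the timeout (an arbitrary
    illustrative value) expires first.
    """
    start = time.monotonic()
    while time.monotonic() - start < timeout:
        result = subprocess.run(
            ["gluster", "peer", "status"],
            capture_output=True,
            text=True,
        )
        if peer_states(result.stdout).get(peer) is False:
            return time.monotonic() - start
        time.sleep(interval)
    return None
```

Calling wait_for_disconnect("node1") on node2 and printing the result gives the approximate number of seconds before the peer is reported as disconnected, which makes it easier to compare the “service network stop” and “service glusterd stop” cases.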

 

What is the mechanism by which a node detects that a peer has failed?  A delay of this length would be worrisome in a production environment.

 

Thanks,

-Joe

 

 

System Administration

ARINC Direct

410-266-4028

 

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users
