Re: GFS2 hangs after one node going down

On 21/03/2013 18:48, Maurizio Giungato wrote:
On 21/03/2013 18:14, Digimer wrote:
On 03/21/2013 01:11 PM, Maurizio Giungato wrote:
Hi guys,

my goal is to create a reliable virtualization environment using CentOS
6.4 and KVM. I have three nodes and a clustered GFS2 filesystem.

The environment is up and working, but I'm worried about its reliability.
If I turn the network interface down on one node to simulate a crash (for
example on the node "node6.blade"):

1) GFS2 hangs (processes go into D state) until node6.blade gets fenced
2) not only node6.blade gets fenced, but also node5.blade!

Help me to save my last neurons!

Thanks
Maurizio

DLM, the distributed lock manager provided by the cluster, is designed to block when a known node enters an unknown state. It does not unblock until that node is confirmed to be fenced. This is by design. GFS2, rgmanager and clustered LVM all use DLM, so they will all block as well.

As for why two nodes get fenced, you will need to share more about your configuration.

My configuration is very simple; I attached the cluster.conf and hosts files.
This is the line I added to /etc/fstab:
/dev/mapper/KVM_IMAGES-VL_KVM_IMAGES /var/lib/libvirt/images gfs2 defaults,noatime,nodiratime 0 0

I also set fallback_to_local_locking = 0 in lvm.conf (but nothing changed).

PS: I had two virtualization environments working like a charm on OCFS2, but since CentOS 6.x I have not been able to install it. Is there some way to achieve the same results with GFS2? With GFS2 I sometimes get a crash after only a "service network restart" (I have many interfaces, so this operation takes more than 10 seconds); with OCFS2 I never had this problem.

Thanks
I attached my logs from /var/log/cluster/*



Mar 21 19:00:10 fenced fencing node lama6.blade
Mar 21 19:00:14 fenced fence lama6.blade dev 0.0 agent fence_bladecenter result: error from agent
Mar 21 19:00:14 fenced fence lama6.blade failed
Mar 21 19:00:17 fenced fencing node lama6.blade
Mar 21 19:00:39 fenced fence lama6.blade success
Mar 21 19:00:45 fenced fencing node lama5.blade
Mar 21 19:00:57 fenced fence lama5.blade success


Mar 21 18:59:00 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Mar 21 18:59:00 corosync [QUORUM] Members[3]: 1 2 3
Mar 21 18:59:00 corosync [QUORUM] Members[3]: 1 2 3
Mar 21 18:59:00 corosync [CPG   ] chosen downlist: sender r(0) ip(20.11.11.104) ; members(old:2 left:0)
Mar 21 18:59:00 corosync [MAIN  ] Completed service synchronization, ready to provide service.
Mar 21 18:59:41 corosync [TOTEM ] A processor failed, forming new configuration.
Mar 21 19:00:10 corosync [QUORUM] Members[2]: 1 2
Mar 21 19:00:10 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Mar 21 19:00:10 corosync [CPG   ] chosen downlist: sender r(0) ip(20.11.11.104) ; members(old:3 left:1)
Mar 21 19:00:10 corosync [MAIN  ] Completed service synchronization, ready to provide service.
Mar 21 19:00:33 corosync [TOTEM ] A processor failed, forming new configuration.
Mar 21 19:00:45 corosync [QUORUM] Members[1]: 1
Mar 21 19:00:45 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Mar 21 19:00:45 corosync [CPG   ] chosen downlist: sender r(0) ip(20.11.11.104) ; members(old:2 left:1)
Mar 21 19:00:45 corosync [MAIN  ] Completed service synchronization, ready to provide service.


Mar 21 19:00:10 rgmanager State change: lama6.blade DOWN
Mar 21 19:00:45 rgmanager State change: lama5.blade DOWN


Mar 21 19:00:27 qdiskd Writing eviction notice for node 3
Mar 21 19:00:28 qdiskd Writing eviction notice for node 2
Mar 21 19:00:28 qdiskd Node 3 evicted
Mar 21 19:00:29 qdiskd Node 2 evicted
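
Since the excerpts above come from separate files, the order of events across daemons is easier to follow once they are merged by timestamp. Below is a minimal Python sketch of that, assuming the "Mon DD HH:MM:SS daemon message" layout shown above and taking the year 2013 from the message dates. The names LOG_LINES, parse and timeline are only illustrative, and just a subset of the lines is included.

```python
from datetime import datetime

# A subset of the lines quoted above, grouped by daemon as in the original files.
LOG_LINES = [
    "Mar 21 19:00:10 fenced fencing node lama6.blade",
    "Mar 21 19:00:39 fenced fence lama6.blade success",
    "Mar 21 19:00:45 fenced fencing node lama5.blade",
    "Mar 21 19:00:57 fenced fence lama5.blade success",
    "Mar 21 18:59:41 corosync [TOTEM ] A processor failed, forming new configuration.",
    "Mar 21 19:00:10 corosync [QUORUM] Members[2]: 1 2",
    "Mar 21 19:00:33 corosync [TOTEM ] A processor failed, forming new configuration.",
    "Mar 21 19:00:45 corosync [QUORUM] Members[1]: 1",
    "Mar 21 19:00:27 qdiskd Writing eviction notice for node 3",
    "Mar 21 19:00:28 qdiskd Writing eviction notice for node 2",
]


def parse(line, year=2013):
    """Split one line into (timestamp, daemon, message).

    The logs carry no year, so it is passed in (assumed 2013 here).
    Month names are parsed with %b, which assumes an English/C locale.
    """
    month, day, clock, daemon, message = line.split(None, 4)
    stamp = datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")
    return stamp, daemon, message


def timeline(lines):
    """Return all events sorted by timestamp; ties keep their input order."""
    return sorted((parse(line) for line in lines), key=lambda event: event[0])


if __name__ == "__main__":
    for stamp, daemon, message in timeline(LOG_LINES):
        print(stamp.strftime("%H:%M:%S"), daemon.ljust(9), message)
```

Read this way, the qdiskd eviction notice for node 2 at 19:00:28 comes before corosync reports the second processor failure at 19:00:33 and before fenced starts fencing lama5.blade at 19:00:45. That is only the ordering of the log lines, not a diagnosis.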

_______________________________________________
CentOS-virt mailing list
CentOS-virt@xxxxxxxxxx
http://lists.centos.org/mailman/listinfo/centos-virt
