On 03/22/2013 11:21 AM, Maurizio Giungato wrote:
> On 22/03/2013 00:34, Digimer wrote:
>> On 03/21/2013 02:09 PM, Maurizio Giungato wrote:
>>> On 21/03/2013 18:48, Maurizio Giungato wrote:
>>>> On 21/03/2013 18:14, Digimer wrote:
>>>>> On 03/21/2013 01:11 PM, Maurizio Giungato wrote:
>>>>>> Hi guys,
>>>>>>
>>>>>> my goal is to create a reliable virtualization environment using
>>>>>> CentOS 6.4 and KVM. I have three nodes and a clustered GFS2.
>>>>>>
>>>>>> The environment is up and working, but I'm worried about its
>>>>>> reliability. If I turn the network interface down on one node to
>>>>>> simulate a crash (for example on the node "node6.blade"):
>>>>>>
>>>>>> 1) GFS2 hangs (processes go into D state) until node6.blade gets
>>>>>>    fenced
>>>>>> 2) not only node6.blade gets fenced, but also node5.blade!
>>>>>>
>>>>>> Help me to save my last neurons!
>>>>>>
>>>>>> Thanks
>>>>>> Maurizio
>>>>>
>>>>> DLM, the distributed lock manager provided by the cluster, is
>>>>> designed to block when a known node goes into an unknown state. It
>>>>> does not unblock until that node is confirmed to be fenced. This is
>>>>> by design. GFS2, rgmanager and clustered LVM all use DLM, so they
>>>>> will all block as well.
>>>>>
>>>>> As for why two nodes get fenced, you will need to share more about
>>>>> your configuration.
>>>>>
>>>> My configuration is very simple; I attached the cluster.conf and
>>>> hosts files.
>>>> This is the row I added in /etc/fstab:
>>>> /dev/mapper/KVM_IMAGES-VL_KVM_IMAGES /var/lib/libvirt/images gfs2 defaults,noatime,nodiratime 0 0
>>>>
>>>> I also set fallback_to_local_locking = 0 in lvm.conf (but nothing
>>>> changed).
>>>>
>>>> PS: I had two virtualization environments working like a charm on
>>>> OCFS2, but since CentOS 6.x I'm not able to install it. Is there some
>>>> way to achieve the same results with GFS2? (With GFS2 I sometimes get
>>>> a crash after just a "service network restart" [I have many
>>>> interfaces, so this operation takes more than 10 seconds]; with OCFS2
>>>> I never had this problem.)
>>>>
>>>> Thanks
>>> I attached my logs from /var/log/cluster/*
>>
>> The configuration itself seems ok, though I think you can safely take
>> qdisk out to simplify things. That's neither here nor there though.
>>
>> This concerns me:
>>
>> Mar 21 19:00:14 fenced fence lama6.blade dev 0.0 agent fence_bladecenter result: error from agent
>> Mar 21 19:00:14 fenced fence lama6.blade failed
>>
>> How are you triggering the failure(s)? The failed fence would
>> certainly help explain the delays. As I mentioned earlier, DLM is
>> designed to block when a node is in an unknown state (failed but not
>> yet successfully fenced).
>>
>> As an aside: I do my HA VMs using clustered LVM LVs as the backing
>> storage behind the VMs. GFS2 is an excellent file system, but it is
>> expensive. Putting your VMs directly on the LV takes them out of the
>> equation.
>
> I used 'service network stop' to simulate the failure; the node gets
> fenced through fence_bladecenter (BladeCenter HW).
>
> Anyway, I took qdisk out and put GFS2 aside, and now I have my VMs on
> LVM LVs. I have been trying for many hours to reproduce the issue:
>
> - only the node where I execute 'service network stop' gets fenced
> - with fallback_to_local_locking = 0 in lvm.conf, the LVM LVs remain
>   writable even while fencing takes place
>
> All seems to work like a charm now.
>
> I'd like to understand what was happening. I'll test for some days
> before trusting it.
>
> Thank you so much.
> Maurizio
>

Testing testing testing. It's good that you plan to test before
trusting. I wish everyone had that philosophy!

The clustered locking for LVM comes into play for activating,
deactivating, creating, deleting, resizing and so on. It does not affect
what happens inside an LV. That's why an LV remains writable while a
fence is pending. However, I feel this is safe because rgmanager won't
recover a VM on another node until the lost node is fenced.

Cheers

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?
_______________________________________________
CentOS-virt mailing list
CentOS-virt@xxxxxxxxxx
http://lists.centos.org/mailman/listinfo/centos-virt