On 07/20/2011 05:29 AM, Marc Caubet wrote:
> Hi,
>
> thanks a lot for your reply.
>
>> You need to setup a cluster with fencing, which will let you then use
>> clustered LVM, clvmd, which in turn uses the distributed lock manager,
>> dlm. This will allow for the same LVs to be seen and used across
>> cluster nodes.
>
> Ok. I will try this.
>
>> Then you will simply add the VMs as resources to rgmanager, which uses
>> (and sits on top of) corosync, which is itself the core of the cluster.
>
> So each virtual machine will be a resource, is it right?

If you wish, yes. Having the resource management means that recovery of
VMs lost to a host node failure will be automated. It is not, in itself,
a requirement.

>> I'm guessing that you are using RHEL 6, so this may not map perfectly,
>> but I described how to build a similar Xen-based HA VM cluster on EL5.
>> The main differences are: Corosync instead of OpenAIS (changes nothing,
>> configuration-wise), ignore DRBD as you have a proper SAN, and replace
>> Xen with KVM. The small GFS2 partition is still recommended for central
>> storage of the VM definitions (needed for migration and recovery).
>> However, if you don't have a GFS2 license, you can manually keep the
>> configs in sync in matching local directories on the nodes.
>>
>> See if this helps at all:
>>
>> http://wiki.alteeve.com/index.php/Red_Hat_Cluster_Service_2_Tutorial
>
> Actually we are using SL6 but we probably will migrate to RHEL6 if this
> environment will be used as production infrastructure in the future. So
> we will consider GFS2.
>
> Thanks a lot for your answer.
>
> Marc

SL6 is based on RHEL6, so the cluster stack will be the same.

A couple of notes: Ralph is right, of course, and those RHEL docs are
well worth reading. They are certainly more authoritative than my wiki.
There are a few things to consider, though, if you proceed without a
cluster:

* Live migration of VMs (as opposed to cold recovery) requires the new
and old hosts to write to the same LV simultaneously, IIRC. Assuming I
am right, you need to make sure your LV is ACTIVE on both nodes at the
same time. I do not know if that is (safely) possible without clustered
LVM (and its use of DLM).

* Without a cluster, VM recovery and the like will not be automatic, I
believe.

* Without the cluster's fencing, if an LV is (accidentally) flagged as
ACTIVE on two nodes, there is nothing preventing corruption of that LV.
For example, let's say that a node hangs... After a time, you (or a
script) recover the VM on another node. Later, the original node
unblocks and goes back to writing to the LV. Suddenly, you've got the
same VM running twice on the same block device. Fencing puts the hung
node into a known safe state by forcing it to shut down. Only then,
after confirmation that the node is actually gone, will another node
recover the resources.

Building a minimal cluster with fencing is not that hard. It does
require some reading and some patience, but it's effectively this (a
sample config and the matching commands follow below):

* Set up shared SSH keys between the nodes.
* Edit /etc/cluster/cluster.conf
** define the nodes and how to fence them (device, port)
** define the fence device(s) (IP, user/pass, etc.)
* Start the cluster
* Start clvmd
* Create the clustered VG and its LVs (vgcreate -cy ...; the clustered
flag is set on the VG, and the LVs in it inherit it)

Done!
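To make the cluster.conf step concrete, here is a rough sketch of what a
minimal two-node config might look like. All the host names, IPs and
passwords are made-up placeholders, and fence_ipmilan is just one
example agent; use whatever matches your hardware. The <vm> resource at
the bottom is what lets rgmanager recover and migrate the guest:

  <?xml version="1.0"?>
  <cluster name="vmcluster" config_version="1">
    <!-- two_node lets a 2-node cluster keep quorum with one node up -->
    <cman two_node="1" expected_votes="1"/>
    <clusternodes>
      <clusternode name="node1.example.com" nodeid="1">
        <fence>
          <method name="ipmi">
            <device name="fence_node1"/>
          </method>
        </fence>
      </clusternode>
      <clusternode name="node2.example.com" nodeid="2">
        <fence>
          <method name="ipmi">
            <device name="fence_node2"/>
          </method>
        </fence>
      </clusternode>
    </clusternodes>
    <!-- placeholder IPs/credentials; point these at your IPMI/iLO/DRAC -->
    <fencedevices>
      <fencedevice name="fence_node1" agent="fence_ipmilan"
                   ipaddr="10.0.0.1" login="admin" passwd="secret"/>
      <fencedevice name="fence_node2" agent="fence_ipmilan"
                   ipaddr="10.0.0.2" login="admin" passwd="secret"/>
    </fencedevices>
    <rm>
      <!-- rgmanager looks for guest1.xml under 'path'; migrate="live"
           enables live migration between the nodes -->
      <vm name="guest1" path="/shared/definitions/" autostart="1"
          migrate="live" recovery="restart"/>
    </rm>
  </cluster>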
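And the matching command sequence. Again, the LUN path, VG/LV names and
sizes are placeholders; substitute your own:

  # On both nodes: bring up the stack.
  service cman start        # corosync, dlm and fenced
  service clvmd start       # clustered LVM daemon
  service rgmanager start   # resource manager (runs the VMs)

  # On one node only; clvmd makes the result visible cluster-wide.
  pvcreate /dev/mapper/san_lun
  vgcreate -cy vm_vg /dev/mapper/san_lun   # -cy marks the VG clustered
  lvcreate -L 20G -n guest1_disk vm_vg     # backing LV for the guest
  lvcreate -L 2G  -n definitions vm_vg     # small LV for shared configs

  # GFS2 for the shared VM definitions; the lock table must be
  # "<cluster name>:<fs name>", matching the name in cluster.conf above.
  mkfs.gfs2 -p lock_dlm -t vmcluster:defs -j 2 /dev/vm_vg/definitions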
-- 
Digimer
E-Mail:              digimer@xxxxxxxxxxx
Freenode handle:     digimer
Papers and Projects: http://alteeve.com
Node Assassin:       http://nodeassassin.org
"At what point did we forget that the Space Shuttle was, essentially,
a program that strapped human beings to an explosion and tried to stab
through the sky with fire and math?"

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster