C-Sharifi Cluster Engine: The Second Success Story of the "Kernel-Level Paradigm" for Distributed Computing Support

Two schools of thought have dominated system software support for distributed computation: one advocates developing a whole new distributed operating system (like Mach), the other advocates library-based or patch-based middleware on top of existing operating systems (like MPI, Kerrighed and Mosix). Contrary to both, Dr. Mohsen Sharifi hypothesized a third school of thought in his 1986 thesis: that all distributed systems software requirements and supports can be, and must be, built at the kernel level of existing operating systems. These requirements include Ease of Programming, Simplicity, Efficiency and Accessibility, which may collectively be coined as Usability.

Although this belief was hard to realize, a sample byproduct called DIPC was built purely on this thesis and openly announced to the Linux community worldwide in 1993. DIPC was admired for providing the necessary support for distributed communication at the kernel level of Linux for the first time in the world, and for the Ease of Programming that followed from its kernel-level realization. At the same time, however, it was criticized as inefficient. This criticism did not force the school to trade Ease of Programming for Efficiency; instead, the group worked hard to achieve Efficiency alongside Ease of Programming and Simplicity, without abandoning the principle that all needs be provided at the kernel level. The result of this effort is now manifested in the C-Sharifi Cluster Engine.

C-Sharifi is a cost-effective distributed system software engine supporting high performance computing on clusters of off-the-shelf computers.
It is wholly implemented in the kernel and, as a consequence of following this school, it offers Ease of Programming, Ease of Clustering and Simplicity, and it can be configured to fit the efficiency requirements of high-performance applications as closely as possible. It supports both distributed shared memory and message passing styles, it is built into Linux, and its cost/performance ratio in some scientific applications (such as meteorology and cryptanalysis) has been shown to be far better than that of non-kernel-based solutions and engines (like MPI, Kerrighed and Mosix).

Best regards,
~Ehsan Mousavi
C-Sharifi Development Team

-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of linux-cluster-request@xxxxxxxxxx
Sent: Friday, November 30, 2007 8:30 PM
To: linux-cluster@xxxxxxxxxx
Subject: Linux-cluster Digest, Vol 43, Issue 46

Send Linux-cluster mailing list submissions to linux-cluster@xxxxxxxxxx

To subscribe or unsubscribe via the World Wide Web, visit https://www.redhat.com/mailman/listinfo/linux-cluster or, via email, send a message with subject or body 'help' to linux-cluster-request@xxxxxxxxxx

You can reach the person managing the list at linux-cluster-owner@xxxxxxxxxx

When replying, please edit your Subject line so it is more specific than "Re: Contents of Linux-cluster digest..."

Today's Topics:

   1. Live migration of VMs instead of relocation (jr)
   2. C-Sharifi (Ehsan Mousavi)
   3. RE: Adding new file system caused problems (Fair, Brian)
   4. RHEL4 Update 4 Cluster Suite Download for Testing (Balaji)
   5. Re: Live migration of VMs instead of relocation (Lon Hohberger)
   6. Re: on bundling http and https (Lon Hohberger)
   7.
Re: Live migration of VMs instead of relocation (jr)

----------------------------------------------------------------------

Message: 1
Date: Fri, 30 Nov 2007 11:23:09 +0100
From: jr <johannes.russek@xxxxxxxxxxxxxxxxx>
Subject: Live migration of VMs instead of relocation
To: linux clustering <linux-cluster@xxxxxxxxxx>
Message-ID: <1196418189.16961.9.camel@xxxxxxxxxxxxxxxxxxxxx>
Content-Type: text/plain

Hello everybody,

I was wondering if I could somehow get rgmanager to use live migration of VMs when the preferred member of a failover domain for a certain VM service comes up again after a failure. The way it is right now, if rgmanager detects the failure of a node, the virtual machine gets taken over by a different node with a lower priority. As soon as the primary node comes back into the cluster, rgmanager relocates the VM to that node, which means shutting it down and starting it on that node again. Since I managed to get live migration working in the cluster, I'd like to have rgmanager make use of it. Is there a known configuration for this?

best regards,
johannes russek

------------------------------

Message: 2
Date: Fri, 30 Nov 2007 15:00:20 +0330
From: "Ehsan Mousavi" <mousavi.ehsan@xxxxxxxxx>
Subject: C-Sharifi
To: Linux-cluster@xxxxxxxxxx
Message-ID: <d9b6c3340711300330t2244882dj15a56c07f295281e@xxxxxxxxxxxxxx>
Content-Type: text/plain; charset="iso-8859-1"

C-Sharifi Cluster Engine: The Second Success Story of the "Kernel-Level Paradigm" for Distributed Computing Support

Best regards,
Leili Mirtaheri
~Ehsan Mousavi
C-Sharifi Development Team

------------------------------

Message: 3
Date: Fri, 30 Nov 2007 09:34:45 -0500
From: "Fair, Brian" <xbfair@xxxxxxxxxxxxxxxxxxxx>
Subject: RE: Adding new file system caused problems
To: "linux clustering" <linux-cluster@xxxxxxxxxx>
Message-ID: <97F238EA86B5704DBAD740518CF829100394AE0C@xxxxxxxxxxxxxxxxxxxxxxxxxxx>
Content-Type: text/plain; charset="us-ascii"

I think this is something we see. The workaround has basically been to disable clustering (LVM-wise) when doing this kind of change, and to handle it manually, i.e.:

   1. vgchange -c n <vg> to disable the cluster flag
   2. lvmconf --disable-cluster on all nodes
   3. rescan/discover the LUN, whatever, on all nodes
   4. lvcreate on one node
   5. lvchange --refresh on every node
   6. lvchange -a y on one node
   7. gfs_grow on one host (you can run it on the other to confirm; it should say it can't grow any more)

When done, I've been putting things back how they were with vgchange -c y and lvmconf --enable-cluster, though I think if you just left it unclustered it'd be fine. What you won't want to do is leave the VG clustered but without --enable-cluster; if you do, the clustered volume groups won't be activated when you reboot.

Hope this helps. If anyone knows of a definitive fix for this I'd like to hear about it. We haven't pushed for one since it isn't too big of a hassle and we aren't constantly adding new volumes, but it is a pain.
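Brian's manual workaround can be collected into a small dry-run helper. This is only a sketch: the volume group, logical volume and size below are hypothetical placeholders, and the helper just prints the plan (with the node each step belongs on) rather than executing anything.

```shell
# Dry-run sketch of the manual workaround above; prints each step and
# the node(s) it should run on. VG/LV names and size are placeholders.
preview_steps() {
  vg="$1"; lv="$2"; size="$3"
  echo "one node  : vgchange -c n $vg            # drop the cluster flag"
  echo "all nodes : lvmconf --disable-cluster"
  echo "all nodes : rescan/discover the new LUN"
  echo "one node  : lvcreate -L $size -n $lv $vg"
  echo "all nodes : lvchange --refresh $vg/$lv"
  echo "one node  : lvchange -a y $vg/$lv"
  echo "one node  : gfs_grow <mountpoint>"
  echo "afterwards: vgchange -c y $vg ; lvmconf --enable-cluster"
}

preview_steps vg_data lv_new 100G
```

Removing the echo wrappers (and running each command on the node indicated) reproduces the sequence Brian describes; the final line restores clustered locking afterwards.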
Brian Fair, UNIX Administrator, CitiStreet
904.791.2662

From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Randy Brown
Sent: Tuesday, November 27, 2007 12:23 PM
To: linux clustering
Subject: Adding new file system caused problems

I am running a two-node cluster using CentOS 5 that is basically being used as a NAS head for our iSCSI-based storage. Here are the related RPMs and the versions I am using:

kmod-gfs-0.1.16-5.2.6.18_8.1.14.el5
kmod-gfs-0.1.16-6.2.6.18_8.1.15.el5
system-config-lvm-1.0.22-1.0.el5
cman-2.0.64-1.0.1.el5
rgmanager-2.0.24-1.el5.centos
gfs-utils-0.1.11-3.el5
lvm2-2.02.16-3.el5
lvm2-cluster-2.02.16-3.el5

This morning I created a 100GB volume on our storage unit and proceeded to make it available to the cluster so it could be served via NFS to a client on our network. I used pvcreate and vgcreate as I always do and created a new volume group. When I went to create the logical volume I saw this message:

Error locking on node nfs1-cluster.nws.noaa.gov: Volume group for uuid not found: 9crOQoM3V0fcuZ1E2163k9vdRLK7njfvnIIMTLPGreuvGmdB1aqx6KR4t7mmDRDs

I figured I had done something wrong and tried to remove the logical volume, and couldn't. lvdisplay showed that the logical volume had been created, and vgdisplay looked good with the exception of the volume not being activated. So I ran vgchange -aly <VolumeGroupName>, which didn't return any error but also did not activate the volume. I then rebooted the node, which made everything OK. I could now see the VG and the LV, both were active, and I could create the GFS file system on the LV. The file system mounted and I thought I was in the clear. However, node #2 wasn't picking this new file system up at all. I stopped the cluster services on this node, which all stopped cleanly, and then tried to restart them. cman started fine but clvmd didn't; it hung on the vgscan. Even after a reboot of node #2, clvmd would not start and would hang on the vgscan.
It wasn't until I shut down both nodes completely and started the cluster that both nodes could see the new file system. I'm sure it's my own ignorance that's making this more difficult than it needs to be. Am I missing a step? Is more information required to help? Any assistance in figuring out what happened here would be greatly appreciated. I know I'm going to need to do similar tasks in the future and obviously can't afford to bring everything down in order for the cluster to see a new file system.

Thank you,
Randy

P.S. Here is my cluster.conf:

[root@nfs2-cluster ~]# cat /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster alias="ohd_cluster" config_version="114" name="ohd_cluster">
  <fence_daemon post_fail_delay="0" post_join_delay="60"/>
  <clusternodes>
    <clusternode name="nfs1-cluster.nws.noaa.gov" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="nfspower" port="8" switch="1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="nfs2-cluster.nws.noaa.gov" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="nfspower" port="7" switch="1"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman expected_votes="1" two_node="1"/>
  <rm>
    <failoverdomains>
      <failoverdomain name="nfs-failover" ordered="0" restricted="1">
        <failoverdomainnode name="nfs1-cluster.nws.noaa.gov" priority="1"/>
        <failoverdomainnode name="nfs2-cluster.nws.noaa.gov" priority="1"/>
      </failoverdomain>
    </failoverdomains>
    <resources>
      <ip address="140.90.91.244" monitor_link="1"/>
      <clusterfs device="/dev/VolGroupFS/LogVol-shared" force_unmount="0" fsid="30647" fstype="gfs" mountpoint="/fs/shared" name="fs-shared" options="acl"/>
      <nfsexport name="fs-shared-exp"/>
      <nfsclient name="fs-shared-client" options="no_root_squash,rw" path="" target="140.90.91.0/24"/>
      <clusterfs device="/dev/VolGroupTemp/LogVol-rfcdata" force_unmount="0" fsid="54233" fstype="gfs" mountpoint="/rfcdata" name="rfcdata" options="acl"/>
      <nfsexport name="rfcdata-exp"/>
      <nfsclient name="rfcdata-client" options="no_root_squash,rw" path="" target="140.90.91.0/24"/>
    </resources>
    <service autostart="1" domain="nfs-failover" name="nfs">
      <clusterfs ref="fs-shared">
        <nfsexport ref="fs-shared-exp">
          <nfsclient ref="fs-shared-client"/>
        </nfsexport>
      </clusterfs>
      <ip ref="140.90.91.244"/>
      <clusterfs ref="rfcdata">
        <nfsexport ref="rfcdata-exp">
          <nfsclient ref="rfcdata-client"/>
        </nfsexport>
        <ip ref="140.90.91.244"/>
      </clusterfs>
    </service>
  </rm>
  <fencedevices>
    <fencedevice agent="fence_apc" ipaddr="192.168.42.30" login="rbrown" name="nfspower" passwd="XXXXXXX"/>
  </fencedevices>
</cluster>

------------------------------

Message: 4
Date: Fri, 30 Nov 2007 20:29:18 +0530
From: Balaji <balajisundar@xxxxxxxxxxxxx>
Subject: RHEL4 Update 4 Cluster Suite Download for Testing
To: linux-cluster@xxxxxxxxxx
Message-ID: <47502546.3070205@xxxxxxxxxxxxx>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Dear All,

I downloaded the Red Hat Enterprise Linux 4 Update 4 AS 30-day evaluation copy, installed it, and am testing it, and I need the Cluster Suite for it. The Cluster Suite for this release is not available on the Red Hat site. Can anyone send me the download link for the Cluster Suite supported on Red Hat Enterprise Linux 4 Update 4 AS?

Regards
-S.Balaji

------------------------------

Message: 5
Date: Fri, 30 Nov 2007 05:18:26 -0500
From: Lon Hohberger <lhh@xxxxxxxxxx>
Subject: Re: Live migration of VMs instead of relocation
To: linux clustering <linux-cluster@xxxxxxxxxx>
Message-ID: <1196417906.2454.18.camel@xxxxxxxxxxxxxxxxxxxxx>
Content-Type: text/plain

On Fri, 2007-11-30 at 11:23 +0100, jr wrote:
> Hello everybody,
> i was wondering if i could somehow get rgmanager to use live migration
> of vms when the prefered member of a failover domain for a certain vm
> service comes up again
> after a failure. the way it is right now is that
> if rgmanager detects a failure of a node, the virtual machine gets taken
> over by a different node with a lower priority. as soon as i the primary
> node comes back into the cluster, rgmanager relocated the vm to that
> node, which means shutting it down and starting it on that node again.
> as i managed to get live migration working in the cluster, i'd like to
> have rgmanager make use of that.
> is there a known configuration for this?
> best regards,

5.1 (+updates) does (or should do?) "migrate-or-nothing" when relocating VMs back to the preferred node. That is, if it can't do a migrate, it leaves the VM where it is. The caveat, of course, is that the VM must be at the top level of the resource tree, with no parent node and no children (i.e. it shouldn't be a child of a <service>), like so:

<rm>
  <resources/>
  <service ...>
    <child1 .../>
  </service>
  <vm/>
</rm>

Parent/child dependencies aren't allowed because of the stop/start nature of other resources: to stop a node, its children must be stopped, but to start a node, its parents must be started.

Note that currently, as of 5.1, it's pause-migration, not live-migration. To change this, you need to edit vm.sh and change the "xm migrate ..." command line to "xm migrate -l ...". The upside of pause-migration is that it's a simpler and faster overall operation to transfer the VM from one machine to another. The downside, of course, is that your downtime is several seconds during the migrate rather than the typical <1 second for live migration.

We plan to switch to live migration as the default (with the ability to select pause migration if desired) in the next update. Actually, the change is in CVS if you don't want to hax around with the resource agent:

http://sources.redhat.com/cgi-bin/cvsweb.cgi/~checkout~/cluster/rgmanager/src/resources/vm.sh?rev=1.1.2.9&content-type=text/plain&cvsroot=cluster&only_with_tag=RHEL5

... hasn't had a lot of testing though.
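The vm.sh edit described above can be scripted as a small helper. This is a sketch only: the default agent path /usr/share/cluster/vm.sh is an assumption based on a typical RHEL5 layout, so pass your actual path if it differs.

```shell
# Switch rgmanager's VM agent from pause- to live-migration by adding
# the -l flag to its "xm migrate" invocation, as described above.
# The default path is an assumption (typical RHEL5 location).
enable_live_migration() {
  agent="${1:-/usr/share/cluster/vm.sh}"
  # Already patched? Then do nothing.
  grep -q 'xm migrate -l ' "$agent" && return 0
  # Keep a backup, then insert the live-migration flag.
  sed -i.bak 's/xm migrate /xm migrate -l /' "$agent"
}
```

Running it twice is safe (the grep guard makes it a no-op), and the unmodified agent is preserved with a .bak suffix.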
:)

-- Lon

------------------------------

Message: 6
Date: Fri, 30 Nov 2007 05:19:31 -0500
From: Lon Hohberger <lhh@xxxxxxxxxx>
Subject: Re: on bundling http and https
To: linux clustering <linux-cluster@xxxxxxxxxx>
Message-ID: <1196417971.2454.20.camel@xxxxxxxxxxxxxxxxxxxxx>
Content-Type: text/plain

On Thu, 2007-11-29 at 15:26 -0500, Yanik Doucet wrote:
> Hello
>
> I'm trying piranha to see if we could throw out our current closed
> source solution.
>
> My test setup consists of a client, 2 LVS directors and 2 webservers.
>
> I first made a virtual http server and it's working great. Nothing
> too fancy, but I can pull the switch on a director or a webserver with
> little impact on availability.
>
> Now I'm trying to bundle http and https to make sure the client
> connects to the same server for both protocols. This is where it fails.
> I have the exact same problem as this guy:
>
> http://osdir.com/ml/linux.redhat.piranha/2006-03/msg00014.html
>
> I set up the firewall marks with piranha, then did the same thing with
> iptables, but when I restart pulse, ipvsadm fails to start the virtual
> service HTTPS as explained in the above link.

If that email is right, it looks like a bug in piranha.

-- Lon

------------------------------

Message: 7
Date: Fri, 30 Nov 2007 16:23:26 +0100
From: jr <johannes.russek@xxxxxxxxxxxxxxxxx>
Subject: Re: Live migration of VMs instead of relocation
To: linux clustering <linux-cluster@xxxxxxxxxx>
Message-ID: <1196436206.2437.4.camel@xxxxxxxxxxxxxxxxxxxxx>
Content-Type: text/plain

Hi Lon,

thank you for your detailed answer. That's very good news; I'm going to update to 5.1 as soon as this is possible here. I already did the "hax", i.e. added -l in the resource agent :)

Thanks!

regards,
johannes

> We plan to switch to live migrate as default instead of pause-migrate
> (with the ability to select pause migration if desired) in the next
> update.
> Actually the change is in CVS if you don't want to hax around
> with the resource agent:
>
> http://sources.redhat.com/cgi-bin/cvsweb.cgi/~checkout~/cluster/rgmanager/src/resources/vm.sh?rev=1.1.2.9&content-type=text/plain&cvsroot=cluster&only_with_tag=RHEL5
>
> ... hasn't had a lot of testing though. :)
>
> -- Lon

------------------------------

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster

End of Linux-cluster Digest, Vol 43, Issue 46
*********************************************