C-Sharifi Cluster Engine: The Second Success Story of the "Kernel-Level Paradigm" for Distributed Computing Support

Two schools of thought have dominated system software support for distributed computation: one advocates developing a whole new distributed operating system (like Mach), the other advocates library-based or patch-based middleware on top of existing operating systems (like MPI, Kerrighed and Mosix). Contrary to both, Dr. Mohsen Sharifi hypothesized a third school of thought in his 1986 thesis: that all distributed systems software requirements and supports can be, and must be, built at the kernel level of existing operating systems. These requirements include Ease of Programming, Simplicity, Efficiency and Accessibility, which may collectively be coined as Usability.

Although this belief was hard to realize, a sample byproduct called DIPC was built purely on this thesis and openly announced to the Linux community worldwide in 1993. DIPC was admired for providing the necessary support for distributed communication at the kernel level of Linux for the first time in the world, and for the Ease of Programming that followed from its kernel-level realization. At the same time, however, it was criticized as inefficient. This criticism did not force the school to trade Ease of Programming for Efficiency; instead, the group worked hard to achieve Efficiency alongside Ease of Programming and Simplicity, without abandoning the principle that all needs be provided at the kernel level. The result of this effort is now manifested in the C-Sharifi Cluster Engine.

C-Sharifi is a cost-effective distributed system software engine supporting high performance computing on clusters of off-the-shelf computers.
It is wholly implemented in the kernel and, as a consequence of following this school, it offers Ease of Programming, Ease of Clustering and Simplicity, and it can be configured to fit the efficiency requirements of high-performance applications as closely as possible. It supports both distributed shared memory and message passing styles, it is built into Linux, and its cost/performance ratio in some scientific applications (such as meteorology and cryptanalysis) has been shown to be far better than that of non-kernel-based solutions and engines (like MPI, Kerrighed and Mosix).

Best regards,
~Ehsan Mousavi
C-Sharifi Development Team

-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of linux-cluster-request@xxxxxxxxxx
Sent: Friday, November 30, 2007 8:30 PM
To: linux-cluster@xxxxxxxxxx
Subject: Linux-cluster Digest, Vol 43, Issue 46

Send Linux-cluster mailing list submissions to linux-cluster@xxxxxxxxxx

To subscribe or unsubscribe via the World Wide Web, visit https://www.redhat.com/mailman/listinfo/linux-cluster or, via email, send a message with subject or body 'help' to linux-cluster-request@xxxxxxxxxx

You can reach the person managing the list at linux-cluster-owner@xxxxxxxxxx

When replying, please edit your Subject line so it is more specific than "Re: Contents of Linux-cluster digest..."

Today's Topics:

   1. Live migration of VMs instead of relocation (jr)
   2. C-Sharifi (Ehsan Mousavi)
   3. RE: Adding new file system caused problems (Fair, Brian)
   4. RHEL4 Update 4 Cluster Suite Download for Testing (Balaji)
   5. Re: Live migration of VMs instead of relocation (Lon Hohberger)
   6. Re: on bundling http and https (Lon Hohberger)
   7.
Re: Live migration of VMs instead of relocation (jr)

----------------------------------------------------------------------

Message: 1
Date: Fri, 30 Nov 2007 11:23:09 +0100
From: jr <johannes.russek@xxxxxxxxxxxxxxxxx>
Subject: Live migration of VMs instead of relocation
To: linux clustering <linux-cluster@xxxxxxxxxx>
Message-ID: <1196418189.16961.9.camel@xxxxxxxxxxxxxxxxxxxxx>
Content-Type: text/plain

Hello everybody,

I was wondering if I could somehow get rgmanager to use live migration of VMs when the preferred member of a failover domain for a certain VM service comes up again after a failure. The way it is right now, if rgmanager detects the failure of a node, the virtual machine gets taken over by a different node with a lower priority. As soon as the primary node comes back into the cluster, rgmanager relocates the VM to that node, which means shutting it down and starting it on that node again. Since I managed to get live migration working in the cluster, I'd like to have rgmanager make use of it. Is there a known configuration for this?

best regards,
johannes russek

------------------------------

Message: 2
Date: Fri, 30 Nov 2007 15:00:20 +0330
From: "Ehsan Mousavi" <mousavi.ehsan@xxxxxxxxx>
Subject: C-Sharifi
To: Linux-cluster@xxxxxxxxxx
Message-ID: <d9b6c3340711300330t2244882dj15a56c07f295281e@xxxxxxxxxxxxxx>
Content-Type: text/plain; charset="iso-8859-1"

C-Sharifi Cluster Engine: The Second Success Story of the "Kernel-Level Paradigm" for Distributed Computing Support

Best regards,
Leili Mirtaheri
~Ehsan Mousavi
C-Sharifi Development Team

------------------------------

Message: 3
Date: Fri, 30 Nov 2007 09:34:45 -0500
From: "Fair, Brian" <xbfair@xxxxxxxxxxxxxxxxxxxx>
Subject: RE: Adding new file system caused problems
To: "linux clustering" <linux-cluster@xxxxxxxxxx>
Message-ID: <97F238EA86B5704DBAD740518CF829100394AE0C@xxxxxxxxxxxxxxxxxxxxxxxxxxx>
Content-Type: text/plain; charset="us-ascii"

I think this is something we see. The workaround has basically been to disable clustering (LVM-wise) when doing this kind of change, and to handle it manually, i.e.:

   1. vgchange -c n <vg> to disable the cluster flag
   2. lvmconf --disable-cluster on all nodes
   3. rescan/discover the LUN, whatever, on all nodes
   4. lvcreate on one node
   5. lvchange --refresh on every node
   6. lvchange -a y on one node
   7. gfs_grow on one host (you can run it on the other to confirm; it should say it can't grow any more)

When done, I've been putting things back how they were with vgchange -c y and lvmconf --enable-cluster, though I think if you just left it unclustered it'd be fine. What you won't want to do is leave the VG clustered but without --enable-cluster; if you do, the clustered volume groups won't be activated when you reboot.

Hope this helps. If anyone knows of a definitive fix for this I'd like to hear about it. We haven't pushed for one since it isn't too big of a hassle and we aren't constantly adding new volumes, but it is a pain.
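Brian's manual workaround can be collected into a small dry-run helper. This is only a sketch: the volume group, logical volume and size below are hypothetical placeholders, and the helper just prints the plan (with the node each step belongs on) rather than executing anything.

```shell
# Dry-run sketch of the manual workaround above; prints each step and
# the node(s) it should run on. VG/LV names and size are placeholders.
preview_steps() {
  vg="$1"; lv="$2"; size="$3"
  echo "one node  : vgchange -c n $vg            # drop the cluster flag"
  echo "all nodes : lvmconf --disable-cluster"
  echo "all nodes : rescan/discover the new LUN"
  echo "one node  : lvcreate -L $size -n $lv $vg"
  echo "all nodes : lvchange --refresh $vg/$lv"
  echo "one node  : lvchange -a y $vg/$lv"
  echo "one node  : gfs_grow <mountpoint>"
  echo "afterwards: vgchange -c y $vg ; lvmconf --enable-cluster"
}

preview_steps vg_data lv_new 100G
```

Removing the echo wrappers (and running each command on the node indicated) reproduces the sequence Brian describes; the final line restores clustered locking afterwards.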
Brian Fair, UNIX Administrator, CitiStreet
904.791.2662

From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Randy Brown
Sent: Tuesday, November 27, 2007 12:23 PM
To: linux clustering
Subject: Adding new file system caused problems

I am running a two-node cluster using CentOS 5 that is basically being used as a NAS head for our iSCSI-based storage. Here are the related RPMs and the versions I am using:

kmod-gfs-0.1.16-5.2.6.18_8.1.14.el5
kmod-gfs-0.1.16-6.2.6.18_8.1.15.el5
system-config-lvm-1.0.22-1.0.el5
cman-2.0.64-1.0.1.el5
rgmanager-2.0.24-1.el5.centos
gfs-utils-0.1.11-3.el5
lvm2-2.02.16-3.el5
lvm2-cluster-2.02.16-3.el5

This morning I created a 100GB volume on our storage unit and proceeded to make it available to the cluster so it could be served via NFS to a client on our network. I used pvcreate and vgcreate as I always do and created a new volume group. When I went to create the logical volume I saw this message:

Error locking on node nfs1-cluster.nws.noaa.gov: Volume group for uuid not found: 9crOQoM3V0fcuZ1E2163k9vdRLK7njfvnIIMTLPGreuvGmdB1aqx6KR4t7mmDRDs

I figured I had done something wrong and tried to remove the logical volume, and couldn't. lvdisplay showed that the logical volume had been created, and vgdisplay looked good with the exception of the volume not being activated. So I ran vgchange -aly <VolumeGroupName>, which didn't return any error but also did not activate the volume. I then rebooted the node, which made everything OK. I could now see the VG and the LV, both were active, and I could create the GFS file system on the LV. The file system mounted and I thought I was in the clear. However, node #2 wasn't picking this new file system up at all. I stopped the cluster services on this node, which all stopped cleanly, and then tried to restart them. cman started fine but clvmd didn't; it hung on the vgscan. Even after a reboot of node #2, clvmd would not start and would hang on the vgscan.
It wasn't until I shut down both nodes completely and started the cluster that both nodes could see the new file system. I'm sure it's my own ignorance that's making this more difficult than it needs to be. Am I missing a step? Is more information required to help? Any assistance in figuring out what happened here would be greatly appreciated. I know I'm going to need to do similar tasks in the future and obviously can't afford to bring everything down in order for the cluster to see a new file system.

Thank you,
Randy

P.S. Here is my cluster.conf:

[root@nfs2-cluster ~]# cat /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster alias="ohd_cluster" config_version="114" name="ohd_cluster">
  <fence_daemon post_fail_delay="0" post_join_delay="60"/>
  <clusternodes>
    <clusternode name="nfs1-cluster.nws.noaa.gov" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="nfspower" port="8" switch="1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="nfs2-cluster.nws.noaa.gov" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="nfspower" port="7" switch="1"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman expected_votes="1" two_node="1"/>
  <rm>
    <failoverdomains>
      <failoverdomain name="nfs-failover" ordered="0" restricted="1">
        <failoverdomainnode name="nfs1-cluster.nws.noaa.gov" priority="1"/>
        <failoverdomainnode name="nfs2-cluster.nws.noaa.gov" priority="1"/>
      </failoverdomain>
    </failoverdomains>
    <resources>
      <ip address="140.90.91.244" monitor_link="1"/>
      <clusterfs device="/dev/VolGroupFS/LogVol-shared" force_unmount="0" fsid="30647" fstype="gfs" mountpoint="/fs/shared" name="fs-shared" options="acl"/>
      <nfsexport name="fs-shared-exp"/>
      <nfsclient name="fs-shared-client" options="no_root_squash,rw" path="" target="140.90.91.0/24"/>
      <clusterfs device="/dev/VolGroupTemp/LogVol-rfcdata" force_unmount="0" fsid="54233" fstype="gfs" mountpoint="/rfcdata" name="rfcdata" options="acl"/>
      <nfsexport name="rfcdata-exp"/>
      <nfsclient name="rfcdata-client" options="no_root_squash,rw" path="" target="140.90.91.0/24"/>
    </resources>
    <service autostart="1" domain="nfs-failover" name="nfs">
      <clusterfs ref="fs-shared">
        <nfsexport ref="fs-shared-exp">
          <nfsclient ref="fs-shared-client"/>
        </nfsexport>
      </clusterfs>
      <ip ref="140.90.91.244"/>
      <clusterfs ref="rfcdata">
        <nfsexport ref="rfcdata-exp">
          <nfsclient ref="rfcdata-client"/>
        </nfsexport>
        <ip ref="140.90.91.244"/>
      </clusterfs>
    </service>
  </rm>
  <fencedevices>
    <fencedevice agent="fence_apc" ipaddr="192.168.42.30" login="rbrown" name="nfspower" passwd="XXXXXXX"/>
  </fencedevices>
</cluster>

------------------------------

Message: 4
Date: Fri, 30 Nov 2007 20:29:18 +0530
From: Balaji <balajisundar@xxxxxxxxxxxxx>
Subject: RHEL4 Update 4 Cluster Suite Download for Testing
To: linux-cluster@xxxxxxxxxx
Message-ID: <47502546.3070205@xxxxxxxxxxxxx>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Dear All,

I downloaded the Red Hat Enterprise Linux 4 Update 4 AS 30-day evaluation copy, installed it, and am testing it, and I need the Cluster Suite for it. The Cluster Suite for this release is not available on the Red Hat site. Can anyone send me the download link for the Cluster Suite supported on Red Hat Enterprise Linux 4 Update 4 AS?

Regards
-S.Balaji

------------------------------

Message: 5
Date: Fri, 30 Nov 2007 05:18:26 -0500
From: Lon Hohberger <lhh@xxxxxxxxxx>
Subject: Re: Live migration of VMs instead of relocation
To: linux clustering <linux-cluster@xxxxxxxxxx>
Message-ID: <1196417906.2454.18.camel@xxxxxxxxxxxxxxxxxxxxx>
Content-Type: text/plain

On Fri, 2007-11-30 at 11:23 +0100, jr wrote:
> Hello everybody,
> i was wondering if i could somehow get rgmanager to use live migration
> of vms when the prefered member of a failover domain for a certain vm
> service comes up again
> after a failure. the way it is right now is that
> if rgmanager detects a failure of a node, the virtual machine gets taken
> over by a different node with a lower priority. as soon as i the primary
> node comes back into the cluster, rgmanager relocated the vm to that
> node, which means shutting it down and starting it on that node again.
> as i managed to get live migration working in the cluster, i'd like to
> have rgmanager make use of that.
> is there a known configuration for this?
> best regards,

5.1 (+updates) does (or should do?) "migrate-or-nothing" when relocating VMs back to the preferred node. That is, if it can't do a migrate, it leaves the VM where it is. The caveat, of course, is that the VM must be at the top level of the resource tree, with no parent node and no children (i.e. it shouldn't be a child of a <service>), like so:

<rm>
  <resources/>
  <service ...>
    <child1 .../>
  </service>
  <vm/>
</rm>

Parent/child dependencies aren't allowed because of the stop/start nature of other resources: to stop a node, its children must be stopped, but to start a node, its parents must be started.

Note that currently, as of 5.1, it's pause-migration, not live-migration. To change this, you need to edit vm.sh and change the "xm migrate ..." command line to "xm migrate -l ...". The upside of pause-migration is that it's a simpler and faster overall operation to transfer the VM from one machine to another. The downside, of course, is that your downtime is several seconds during the migrate rather than the typical <1 second for live migration.

We plan to switch to live migration as the default (with the ability to select pause migration if desired) in the next update. Actually, the change is in CVS if you don't want to hax around with the resource agent:

http://sources.redhat.com/cgi-bin/cvsweb.cgi/~checkout~/cluster/rgmanager/src/resources/vm.sh?rev=1.1.2.9&content-type=text/plain&cvsroot=cluster&only_with_tag=RHEL5

... hasn't had a lot of testing though.
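The vm.sh edit described above can be scripted as a small helper. This is a sketch only: the default agent path /usr/share/cluster/vm.sh is an assumption based on a typical RHEL5 layout, so pass your actual path if it differs.

```shell
# Switch rgmanager's VM agent from pause- to live-migration by adding
# the -l flag to its "xm migrate" invocation, as described above.
# The default path is an assumption (typical RHEL5 location).
enable_live_migration() {
  agent="${1:-/usr/share/cluster/vm.sh}"
  # Already patched? Then do nothing.
  grep -q 'xm migrate -l ' "$agent" && return 0
  # Keep a backup, then insert the live-migration flag.
  sed -i.bak 's/xm migrate /xm migrate -l /' "$agent"
}
```

Running it twice is safe (the grep guard makes it a no-op), and the unmodified agent is preserved with a .bak suffix.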
:)

-- Lon

------------------------------

Message: 6
Date: Fri, 30 Nov 2007 05:19:31 -0500
From: Lon Hohberger <lhh@xxxxxxxxxx>
Subject: Re: on bundling http and https
To: linux clustering <linux-cluster@xxxxxxxxxx>
Message-ID: <1196417971.2454.20.camel@xxxxxxxxxxxxxxxxxxxxx>
Content-Type: text/plain

On Thu, 2007-11-29 at 15:26 -0500, Yanik Doucet wrote:
> Hello
>
> I'm trying piranha to see if we could throw out our current closed
> source solution.
>
> My test setup consists of a client, 2 LVS directors and 2 webservers.
>
> I first made a virtual http server and it's working great. Nothing
> too fancy, but I can pull the switch on a director or a webserver with
> little impact on availability.
>
> Now I'm trying to bundle http and https to make sure the client
> connects to the same server for both protocols. This is where it fails.
> I have the exact same problem as this guy:
>
> http://osdir.com/ml/linux.redhat.piranha/2006-03/msg00014.html
>
> I set up the firewall marks with piranha, then did the same thing with
> iptables, but when I restart pulse, ipvsadm fails to start the virtual
> service HTTPS as explained in the above link.

If that email is right, it looks like a bug in piranha.

-- Lon

------------------------------

Message: 7
Date: Fri, 30 Nov 2007 16:23:26 +0100
From: jr <johannes.russek@xxxxxxxxxxxxxxxxx>
Subject: Re: Live migration of VMs instead of relocation
To: linux clustering <linux-cluster@xxxxxxxxxx>
Message-ID: <1196436206.2437.4.camel@xxxxxxxxxxxxxxxxxxxxx>
Content-Type: text/plain

Hi Lon,

thank you for your detailed answer. That's very good news; I'm going to update to 5.1 as soon as this is possible here. I already did the "hax", i.e. added -l in the resource agent :)

Thanks!

regards,
johannes

> We plan to switch to live migrate as default instead of pause-migrate
> (with the ability to select pause migration if desired) in the next
> update.
> Actually the change is in CVS if you don't want to hax around
> with the resource agent:
>
> http://sources.redhat.com/cgi-bin/cvsweb.cgi/~checkout~/cluster/rgmanager/src/resources/vm.sh?rev=1.1.2.9&content-type=text/plain&cvsroot=cluster&only_with_tag=RHEL5
>
> ... hasn't had a lot of testing though. :)
>
> -- Lon

------------------------------

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster

End of Linux-cluster Digest, Vol 43, Issue 46
*********************************************