Re: cluster issue


----- Original Message ----- 
From: "santosh lohar" <sslohar@xxxxxxxxx>
To: <linux-cluster@xxxxxxxxxx>
Sent: Tuesday, September 28, 2010 2:44 PM
Subject:  cluster issue


Hi all,

I am facing a problem with SGE and FlexLM licensing; details are below:

*Hardware:* IBM 3650, 2 quad-core CPUs, 16 GB RAM; 2 compute nodes plus
one master node, connected via an IB switch.
*Software:* ROCKS 5.1 / OS: RHEL 4 (Mars Hill) / Fluent / MSC Mentat.

Problems:
1. When I submit jobs through SGE, "qhost -F MDAdv" shows the updated
status of licenses issued and available, but when I submit jobs outside
SGE, it is not able to recognize the latest status of the license tokens.
2. When jobs are submitted beyond 4 CPUs, cluster computation slows down.

Kindly suggest what to do in this case; thanks in advance.
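
For issue 1, a common approach is to feed live FlexLM counts into SGE
through a load sensor, so that licenses checked out by jobs started
outside SGE are still visible to the scheduler. A minimal sketch follows;
the lmutil path, the port@host license source, and the awk parsing of the
lmstat output line are assumptions to adapt to your site:

#!/bin/sh
# SGE load-sensor sketch: report free FlexLM licenses for feature MDAdv.
# Assumed locations; adjust LMUTIL and LICSRV for your installation.
LMUTIL=/opt/flexlm/bin/lmutil
LICSRV=27000@master
while read cmd; do
    [ "$cmd" = "quit" ] && exit 0
    # lmstat prints a line like: "Users of MDAdv:  (Total of N licenses
    # issued;  Total of M licenses in use)"; grab N ($6) and M ($11).
    set -- `$LMUTIL lmstat -c $LICSRV -f MDAdv | awk '/Users of MDAdv/ {print $6, $11}'`
    avail=`expr $1 - $2`
    # Standard load-sensor protocol: begin / host:complex:value / end.
    echo begin
    echo "global:MDAdv:$avail"
    echo end
done

With this registered as a load sensor, and MDAdv defined as a consumable
complex, qhost -F MDAdv should track licenses no matter how jobs were
started.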

Regards
Santosh



On Mon, Sep 27, 2010 at 11:07 PM, <linux-cluster-request@xxxxxxxxxx> wrote:

> Send Linux-cluster mailing list submissions to
>        linux-cluster@xxxxxxxxxx
>
> To subscribe or unsubscribe via the World Wide Web, visit
>        https://www.redhat.com/mailman/listinfo/linux-cluster
> or, via email, send a message with subject or body 'help' to
>        linux-cluster-request@xxxxxxxxxx
>
> You can reach the person managing the list at
>        linux-cluster-owner@xxxxxxxxxx
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Linux-cluster digest..."
>
>
> Today's Topics:
>
>   1. Unable to patch conga (fosiul alam)
>   2. Re: ricci is very unstable on one node (Paul M. Dyer)
>   3. Re: problem with quorum at cluster boot (brem belguebli)
>   4. Re: ricci is very unstable on one node (fosiul alam)
>   5. Re: ricci is very unstable on one node (fosiul alam)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 27 Sep 2010 17:02:20 +0100
> From: fosiul alam <expertalert@xxxxxxxxx>
> To: linux clustering <linux-cluster@xxxxxxxxxx>
> Subject:  Unable to patch conga
> Message-ID:
>        <AANLkTimdQNO3x3g5EKc2ETMPePf3iA-Cptiih6rLb4Au@xxxxxxxxxxxxxx>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Hi,
> I see the exact same problem in my luci interface, so I am trying to
> patch conga.
>
> I downloaded:
>
>
> http://mirrors.kernel.org/centos/5/os/SRPMS/conga-0.12.2-12.el5.centos.1.src.rpm
> rpm -i conga-0.12.2-12.el5.centos.1.src.rpm
> cd /usr/src/redhat/SOURCES
>
> tar -xvzf conga-0.12.2.tar.gz
> patch -p0 < /path/to/where_the_patch/ricci.patch
>
> [root@beaver SOURCES]# cd conga-0.12.2
>
> Now I am facing a problem with the install:
>
> ./autogen.sh --include_zope_and_plone=yes
> Zope-2.9.8-final.tgz passed sha512sum test
> Plone-2.5.5.tar.gz passed sha512sum test
> cat: clustermon.spec.in.in: No such file or directory
>
> Run `./configure` to configure conga build,
> or `make srpms` to build conga and clustermon srpms
> or `make rpms` to build all rpms
>
> [root@beaver conga-0.12.2]#  ./configure --include_zope_and_plone=yes
> D-BUS version 1.1.2 detected  -> major 1, minor 1
> missing zope directory, extract zope source-code into it and try again
>
>
> Now, how do I tell ./configure where zope and plone are?
> Do I need zope and plone at all?
>
> Please give me some advice.
>
> Fosiul
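>
> One likely fix, as a sketch: configure seems to want the Zope and Plone
> sources unpacked inside the conga source tree. The directory names below
> are a guess from the "missing zope directory" error, so adjust as needed:
>
> cd /usr/src/redhat/SOURCES/conga-0.12.2
> # Guessed layout: unpack the already-verified tarballs where configure looks.
> tar -xzf ../Zope-2.9.8-final.tgz && mv Zope-2.9.8-final zope
> tar -xzf ../Plone-2.5.5.tar.gz && mv Plone-2.5.5 plone
> ./configure --include_zope_and_plone=yes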
>
> ------------------------------
>
> Message: 2
> Date: Mon, 27 Sep 2010 11:55:28 -0500 (CDT)
> From: "Paul M. Dyer" <pmdyer@xxxxxxxxxxxxxxx>
> To: linux clustering <linux-cluster@xxxxxxxxxx>
> Subject: Re:  ricci is very unstable on one node
> Message-ID: <1480320.10.1285606528829.JavaMail.root@athena>
> Content-Type: text/plain; charset=utf-8
>
> http://rhn.redhat.com/errata/RHBA-2010-0716.html
>
> It appears that this problem has been fixed in this errata.
>
> I installed the luci and ricci updates and did some light testing. So far,
> the timeout 11111 error has not shown up.
>
> Paul
>
> ----- Original Message -----
> From: "fosiul alam" <expertalert@xxxxxxxxx>
> To: "linux clustering" <linux-cluster@xxxxxxxxxx>
> Sent: Monday, September 27, 2010 10:48:27 AM
> Subject: Re:  ricci is very unstable on one node
>
> Hi,
> I am trying to patch ricci; let's see how it goes.
>
> But clusvcadm is failing as well:
>
> [root@http1 ~]# clusvcadm -e httpd1 -m http1.xxxx.local
> Member http1.xxxx.local trying to enable service:httpd1...Invalid
> operation for resource
>
> Here, http1 is the node where I was trying to run the service from luci.
>
> What could be the problem?
> Is there any way to find out if there is a problem with the config?
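>
> As a sketch, two checks that can be run outside luci; rg_test ships with
> rgmanager, and the service name here is taken from this thread:
>
> # Is cluster.conf at least well-formed XML?
> xmllint --noout /etc/cluster/cluster.conf
> # Dry-run the resource tree for the service without touching the cluster.
> rg_test test /etc/cluster/cluster.conf start service httpd1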
>
> On 27 September 2010 16:26, Ben Turner < bturner@xxxxxxxxxx > wrote:
>
>
> RHEL 5.6 hasn't been released yet, so your package probably contains the
> problem. I'm not sure how in sync CentOS is with RHEL, or whether they
> patch earlier, so I can't give you a time frame for when the fix will land
> in CentOS, or whether they have already patched it. The problem in that BZ
> is more of an annoyance; you usually just have to retry a time or two and
> it works. If you can't get luci working properly with your service at all,
> you should try enabling the service through the command line with
> clusvcadm -e. If it is not working from the command line either, then
> there is a problem with the service config.
>
>
>
>
> -Ben
>
>
>
>
> ----- "fosiul alam" < expertalert@xxxxxxxxx > wrote:
>
> > Hi Ben,
> > Thanks.
> >
> > I named this cluster mysql-server, but I have not installed the MySQL
> > database on it yet.
> >
> > Both luci and ricci, on the luci server and node1, are running this
> > version:
> >
> > luci-0.12.2-12.el5.centos.1
> > ricci-0.12.2-12.el5.centos.1
> >
> >
> > Do you think this version has the problem as well?
> >
> > Thanks for your help.
> >
> >
> >
> >
> > On 24 September 2010 15:33, Ben Turner < bturner@xxxxxxxxxx > wrote:
> >
> >
> > There is an issue with ricci timeouts that was fixed recently:
> >
> > https://bugzilla.redhat.com/show_bug.cgi?id=564490
> >
> > I'm not sure, but you may be hitting that bug. Symptoms include: luci
> > isn't able to get the status from the node, timeouts when querying
> > ricci, etc. The fix should be released with 5.6.
> >
> > On the mysql service there are some options that you need to set. Here
> > are all the options available to that agent:
> >
> > mysql: defines a MySQL database server. Attributes:
> >
> >   config_file            Define configuration file
> >   listen_address         Define an IP address for MySQL server. If the
> >                          address is not given, the first IP address from
> >                          the service is taken.
> >   mysqld_options         Other command-line options for mysqld
> >   name                   Name
> >   ref                    Reference to an existing mysql resource in the
> >                          resources section
> >   service_name           Inherit the service name
> >   shutdown_wait          Wait X seconds for correct end of service shutdown
> >   startup_wait           Wait X seconds for correct end of service startup
> >   __enforce_timeouts     Consider a timeout for operations as fatal
> >   __failure_expire_time  Amount of time before a failure is forgotten
> >   __independent_subtree  Treat this and all children as an independent subtree
> >   __max_failures         Maximum number of failures before returning a
> >                          failure to a status check
> >
> > If I recall correctly you may need to tweak:
> >
> > shutdown_wait Wait X seconds for correct end of service shutdown
> > startup_wait Wait X seconds for correct end of service startup
> >
> > There can be problems relocating the DB if it takes too long to
> > start/shutdown. If you are having problems relocating with luci it may
> > be a good idea to test with:
> >
> > # clusvcadm -r <service name> -m <cluster node>
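> >
> > As a sketch, the two wait attributes sit on the mysql resource in
> > cluster.conf; the names and address below are illustrative placeholders:
> >
> > <service name="mysql-server" autostart="1">
> >     <ip address="10.0.0.50" monitor_link="1"/>
> >     <!-- give mysqld up to 30s to start and to shut down cleanly -->
> >     <mysql name="mysqldb" config_file="/etc/my.cnf"
> >            listen_address="10.0.0.50"
> >            startup_wait="30" shutdown_wait="30"/>
> > </service>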
> >
> > -Ben
> >
> >
> >
> >
> >
> >
> > ----- "fosiul alam" < expertalert@xxxxxxxxx > wrote:
> >
> > > Hi,
> > > I have a 4-node cluster.
> > > It was running fine, but today one node is giving trouble.
> > >
> > > From the luci GUI, when I try to relocate a service onto this node,
> > > or to relocate a service from this node to another node, it shows:
> > >
> > > Unable to retrieve batch 1908047789 status from
> > > beaver.domain.local:11111: clusvcadm start failed to start httpd1:
> > > Starting cluster service "httpd1" on node "http1.domain.local" -- You
> > > will be redirected in 5 seconds.
> > >
> > > and also:
> > >
> > > The ricci agent for this node is unresponsive. Node-specific
> > > information is not available at this time.
> > >
> > > But ricci is running on the problematic node:
> > > ricci 7324 0.0 0.1 58876 2932 ? S<s 14:40 0:00 ricci -u 101
> > >
> > > There is no firewall running:
> > >
> > > iptables -L
> > > Chain INPUT (policy ACCEPT)
> > > target prot opt source destination
> > >
> > > Chain FORWARD (policy ACCEPT)
> > > target prot opt source destination
> > >
> > > Chain OUTPUT (policy ACCEPT)
> > > target prot opt source destination
> > >
> > > Chain RH-Firewall-1-INPUT (0 references)
> > > target prot opt source destination
> > >
> > > Port 11111 is listening:
> > >
> > > netstat -an | grep 11111
> > > tcp 0 0 0.0.0.0:11111 0.0.0.0:* LISTEN
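> > >
> > > (Listening locally does not prove that the luci host can reach it; as
> > > a quick check from the luci server, any TCP connect test works, for
> > > example:
> > >
> > > telnet http1.xxxx.local 11111
> > >
> > > which should connect if nothing in between drops the traffic.)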
> > >
> > >
> > > But ricci is still very unstable, and I can't relocate any service
> > > onto this node or away from it.
> > >
> > > From the problematic node, if I type this:
> > >
> > > clustat
> > > Cluster Status for ng1 @ Thu Sep 23 20:24:02 2010
> > > Member Status: Quorate
> > >
> > > Member Name             ID  Status
> > > -----------             --  ------
> > > beaver.xxx.local         1  Online, rgmanager   (luci is running from this server)
> > > publicdns1.xxxx.local    2  Online, rgmanager
> > > http1.xxxx.local         3  Online, Local, rgmanager
> > > mail01.xxxxx.local       4  Online, rgmanager
> > >
> > > Service Name            Owner (Last)             State
> > > ------------            ------------             -----
> > > service:httpd1          mail01.xxxx.local        started
> > > service:mysql-server    http1.xxxx.local         started   (this is the problematic node)
> > > service:public-dns      publicdns1.xxxxxx.local  started
> > >
> > > I can't move the mysql-server service off this node, and I can't
> > > relocate any service onto it. I am very confused.
> > >
> > > What shall I do to fix this issue?
> > >
> > > Thanks for your advice.
> > >
> > >
> > >
> > >
>
>
>
> ------------------------------
>
> Message: 3
> Date: Mon, 27 Sep 2010 19:05:06 +0200
> From: brem belguebli <brem.belguebli@xxxxxxxxx>
> To: linux clustering <linux-cluster@xxxxxxxxxx>
> Subject: Re:  problem with quorum at cluster boot
> Message-ID:
>        <AANLkTi=FOA-cj5hg11zBmZdzWyQiMpPCM9FZiKgFQHH9@xxxxxxxxxxxxxx>
> Content-Type: text/plain; charset="iso-8859-1"
>
> The configuration you are trying to build, 2 cluster nodes (1 vote each)
> plus a quorum disk with 1 vote (making a total of expected_votes = 3), must
> remain up if you lose 1 of the members (as long as the remaining node still
> accesses the quorum disk), because there are still 2 active votes (1
> remaining node + 1 quorum disk), and 2 > expected_votes/2.
>
> The quorum (majority) must be strictly greater than expected_votes/2 (51%
> or more) for service to continue.
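>
> For reference, a minimal qdisk stanza along these lines gives the quorum
> disk its vote; the label, timings, and heuristic target below are
> illustrative placeholders, not a tested configuration:
>
> <cman expected_votes="3"/>
> <quorumd interval="1" tko="10" votes="1" label="qdisk">
>     <!-- the qdisk vote counts only while the heuristic passes -->
>     <heuristic program="ping -c1 -w1 10.0.0.1" score="1" interval="2" tko="3"/>
> </quorumd>
>
> With this, a lone booting node reaches 2 of 3 votes (itself plus the
> quorum disk) once qdiskd declares the disk alive, which covers the
> single-node boot case described below.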
>
>
> 2010/9/27 Bennie R Thomas <Bennie_R_Thomas@xxxxxxxxxxxx>
>
> > Try setting your expected votes to 2 or 1.
> >
> > Your cluster is hanging with one node because it wants 3 votes.
> >
> >
> >
> >   From: Brem Belguebli <brem.belguebli@xxxxxxxxx>
> >   To: linux clustering <linux-cluster@xxxxxxxxxx>
> >   Date: 09/25/2010 10:30 AM
> >   Subject: Re:  problem with quorum at cluster boot
> >   Sent by: linux-cluster-bounces@xxxxxxxxxx
> > ------------------------------
> >
> >
> >
> > On Fri, 2010-09-24 at 12:52 -0400, Jason_Henderson@xxxxxxxxx wrote:
> > >
> > > I think you still need two_node="1" in your conf file if you want a
> > > single node to become quorate.
> > >
> > two_node="1" is only valid if you do not have a quorum disk.
> >
> > > linux-cluster-bounces@xxxxxxxxxx wrote on 09/24/2010 12:38:17 PM:
> > >
> > > > hello,
> > > >
> > > > I have a 2-node cluster with a qdisk quorum partition;
> > > >
> > > > each node has 1 vote and the qdisk has 1 vote too; in cluster.conf
> > > > I have this explicit declaration:
> > > > <cman expected_votes="3" two_node="0"/>
> > > >
> > > > when I have both nodes active, cman_tool status tells me this:
> > > >
> > > > Version: 6.1.0
> > > > Nodes: 2
> > > > Expected votes: 3
> > > > Quorum device votes: 1
> > > > Total votes: 3
> > > > Node votes: 1
> > > > Quorum: 2
> > > >
> > > > then, if I power off a node, these values, as expected, change this
> > > > way:
> > > > Nodes: 1
> > > > Total votes: 2
> > > >
> > > > and the cluster is still quorate and functional.
> > > >
> > > > the problem is if I power off both nodes and then power on only one
> > > > of them: in this case the single node does not become quorate and the
> > > > cluster does not start. I have to power on both nodes to have the
> > > > cluster (and the services on it) working.
> > > >
> > > > I'd like the cluster to be able to work (and boot) even with a single
> > > > node (i.e., if one of the nodes has a hardware failure and is down, I
> > > > still want to be able to reboot the working node and have it bring up
> > > > the cluster correctly).
> > > >
> > > > any hints? (thanks for reading all this)
> > > >
> > > > --
> > > > bye,
> > > > emilio
> > > >
> >
>
> ------------------------------
>
> Message: 4
> Date: Mon, 27 Sep 2010 18:31:31 +0100
> From: fosiul alam <expertalert@xxxxxxxxx>
> To: linux clustering <linux-cluster@xxxxxxxxxx>
> Subject: Re:  ricci is very unstable on one node
> Message-ID:
>        <AANLkTikwtYxG3_gf0QxqJpGzZxowh4T7rGbwH-+MhWs8@xxxxxxxxxxxxxx>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Hi,
> Thanks for your advice.
> Currently I have:
>
> luci-0.12.2-12.el5.centos.1
> ricci-0.12.2-12.el5.centos.1
>
> Are these the same rpms as
>
> luci-0.12.2-12.el5_5.4.i386.rpm and
> ricci-0.12.2-12.el5_5.4.i386.rpm ?
>
> Thanks
>
>
> ------------------------------
>
> Message: 5
> Date: Mon, 27 Sep 2010 18:37:44 +0100
> From: fosiul alam <expertalert@xxxxxxxxx>
> To: linux clustering <linux-cluster@xxxxxxxxxx>
> Subject: Re:  ricci is very unstable on one node
> Message-ID:
>        <AANLkTi=DfrVMFkp8No9UbwD+fVoRx9FmpO+qzY2RxLPk@xxxxxxxxxxxxxx>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Hi, in addition to my previous email, have a look at this one.
>
> From http1 (where I am trying to relocate a service):
>
> [root@http1 ~]# clusvcadm -e httpd1 -m http1.xxxx.local
> Member http1.xxxx.local trying to enable service:httpd1...Success
> Warning: service:httpd1 is now running on mail01.xxxx.local
>
> So it says Success, but actually it is not: the service came up on
> mail01 instead of http1.
>
> Thanks again
>
>
>
> ------------------------------
>
> --
> Linux-cluster mailing list
> Linux-cluster@xxxxxxxxxx
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
> End of Linux-cluster Digest, Vol 77, Issue 23
> *********************************************
>



-- 
Santosh





-- 
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster



