cluster issue

Hi all,

I am facing a problem with SGE and FlexLM licensing; details are below:

Hardware: IBM 3650, 2 quad-core CPUs, 16 GB RAM; two compute nodes plus one master node, connected via an IB switch.
Software: ROCKS 5.1 / OS: RHEL4 (mars hill) / Fluent / MSC Mentat.

Problem:
1. When I submit jobs with SGE, "qhost -F MDAdv" shows the updated status of licenses issued and available,
but when I submit jobs outside SGE, it is not able to recognize the latest status of the license tokens.
2. After jobs are submitted on more than 4 CPUs, cluster computation slows down.

Kindly suggest what I should do in this case; thanks in advance.
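
Would a load sensor along these lines make SGE see tokens checked out
outside the scheduler? A rough sketch (the lmstat output parsing and the
feature name are guesses to adapt; the sensor would be wired in via the
load_sensor parameter in qconf -mconf, with MDAdv already defined as a
consumable complex in qconf -mc):

#!/bin/sh
# hypothetical SGE load sensor: report free MDAdv tokens from FlexLM
# assumes lmutil is in PATH and LM_LICENSE_FILE points at the license server
while read cmd; do
    [ "$cmd" = "quit" ] && exit 0
    # expected lmstat line, e.g.:
    #   Users of MDAdv:  (Total of 8 licenses issued;  Total of 3 licenses in use)
    free=`lmutil lmstat -c "$LM_LICENSE_FILE" -f MDAdv | \
          awk '/Users of MDAdv/ { print $6 - $11 }'`
    echo begin
    echo "global:MDAdv:${free:-0}"
    echo end
done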

Regards
Santosh



On Mon, Sep 27, 2010 at 11:07 PM, <linux-cluster-request@xxxxxxxxxx> wrote:
Send Linux-cluster mailing list submissions to
       linux-cluster@xxxxxxxxxx

To subscribe or unsubscribe via the World Wide Web, visit
       https://www.redhat.com/mailman/listinfo/linux-cluster
or, via email, send a message with subject or body 'help' to
       linux-cluster-request@xxxxxxxxxx

You can reach the person managing the list at
       linux-cluster-owner@xxxxxxxxxx

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Linux-cluster digest..."


Today's Topics:

  1. Unable to patch conga (fosiul alam)
  2. Re: ricci is very unstable in one nodes (Paul M. Dyer)
  3. Re: porblem with quorum at cluster boot (brem belguebli)
  4. Re: ricci is very unstable in one nodes (fosiul alam)
  5. Re: ricci is very unstable in one nodes (fosiul alam)


----------------------------------------------------------------------

Message: 1
Date: Mon, 27 Sep 2010 17:02:20 +0100
From: fosiul alam <expertalert@xxxxxxxxx>
To: linux clustering <linux-cluster@xxxxxxxxxx>
Subject: Unable to patch conga
Message-ID:
       <AANLkTimdQNO3x3g5EKc2ETMPePf3iA-Cptiih6rLb4Au@xxxxxxxxxxxxxx>
Content-Type: text/plain; charset="iso-8859-1"

Hi,
I see the exact same problem in my luci interface,
so I am trying to patch conga.

I downloaded:

http://mirrors.kernel.org/centos/5/os/SRPMS/conga-0.12.2-12.el5.centos.1.src.rpm
rpm -i conga-0.12.2-12.el5.centos.1.src.rpm
cd /usr/src/redhat/SOURCES

tar -xvzf conga-0.12.2.tar.gz
patch -p0 < /path/to/where_the_patch/ricci.patch

[root@beaver SOURCES]# cd conga-0.12.2

Now I am facing a problem with the install:

./autogen.sh --include_zope_and_plone=yes
Zope-2.9.8-final.tgz passed sha512sum test
Plone-2.5.5.tar.gz passed sha512sum test
cat: clustermon.spec.in.in: No such file or directory

Run `./configure` to configure conga build,
or `make srpms` to build conga and clustermon srpms
or `make rpms` to build all rpms

[root@beaver conga-0.12.2]#  ./configure --include_zope_and_plone=yes
D-BUS version 1.1.2 detected  -> major 1, minor 1
missing zope directory, extract zope source-code into it and try again


Now, how do I tell ./configure where zope and plone are?
Do I need zope and plone at all?
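
(It looks like autogen verified the Zope and Plone tarballs but never
unpacked them; would something like this work, assuming configure expects
./zope and ./plone directories inside the source tree and the tarballs
are in SOURCES?)

cd /usr/src/redhat/SOURCES/conga-0.12.2
mkdir zope plone
tar -xzf ../Zope-2.9.8-final.tgz -C zope --strip-components=1
tar -xzf ../Plone-2.5.5.tar.gz -C plone --strip-components=1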

Please give me some advice.

Fosiul
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://www.redhat.com/archives/linux-cluster/attachments/20100927/21959f19/attachment.html>

------------------------------

Message: 2
Date: Mon, 27 Sep 2010 11:55:28 -0500 (CDT)
From: "Paul M. Dyer" <pmdyer@xxxxxxxxxxxxxxx>
To: linux clustering <linux-cluster@xxxxxxxxxx>
Subject: Re: ricci is very unstable in one nodes
Message-ID: <1480320.10.1285606528829.JavaMail.root@athena>
Content-Type: text/plain; charset=utf-8

http://rhn.redhat.com/errata/RHBA-2010-0716.html

It appears that this problem has been fixed in this errata.

I installed the luci and ricci updates and did some light testing. So far, the port-11111 timeout error has not shown up.

Paul

----- Original Message -----
From: "fosiul alam" <expertalert@xxxxxxxxx>
To: "linux clustering" <linux-cluster@xxxxxxxxxx>
Sent: Monday, September 27, 2010 10:48:27 AM
Subject: Re: ricci is very unstable in one nodes

Hi
I am trying to patch ricci; let's see how it goes.

But clusvcadm is failing as well:

[root@http1 ~]# clusvcadm -e httpd1 -m http1.xxxx.local
Member http1.xxxx.local trying to enable service:httpd1...Invalid
operation for resource

Here, http1 is the node where I was trying to run the service from luci.

What could be the problem?
Is there any way to find out if there is a problem with the config?
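
(If rg_test from rgmanager is available, it can parse the configuration
and exercise a service outside the cluster manager; note that the start
below really runs the resource start operations, so only try it on a node
where the service is stopped.)

rg_test test /etc/cluster/cluster.conf
rg_test test /etc/cluster/cluster.conf start service httpd1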

On 27 September 2010 16:26, Ben Turner < bturner@xxxxxxxxxx > wrote:


RHEL 5.6 hasn't been released yet so your package probably contains the
problem. I'm not sure how in sync Centos is with RHEL or if they patch
earlier so I cannot give you a time frame when it will be in Centos or
if they have already patched it. The problem in that BZ is more of an
annoyance; you usually just have to retry a time or two and it works. If
you can't get Luci working properly with your service at all you should
try enabling the service through the command line with clusvcadm -e. If
it is not working from the command line either then there is a problem
with the service config.




-Ben




----- "fosiul alam" < expertalert@xxxxxxxxx > wrote:

> Hi Ben
> Thanks
>
> I named this cluster service mysql-server but I have not installed the
> mysql database there yet
>
> and both luci and ricci on the luci server and node1 are running this
> version
>
> luci-0.12.2-12.el5.centos.1
> ricci-0.12.2-12.el5.centos.1
>
>
> do you think this version has the problem as well?
>
> thanks for your help
>
>
>
>
> On 24 September 2010 15:33, Ben Turner < bturner@xxxxxxxxxx > wrote:
>
>
> There is an issue with ricci timeouts that was fixed recently:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=564490
>
> I'm not sure but you may be hitting that bug. Symptoms include: luci
> isn't able to get the status from the node, timeouts when querying
> ricci, etc. The fix should be released with 5.6
>
> On the mysql service there are some options that you need to set. Here
> are all the options available to that agent:
>
> mysql
> Defines a MySQL database server
>
> Attribute              Description
> config_file            Define configuration file
> listen_address         Define an IP address for the MySQL server. If the
>                        address is not given, the first IP address from
>                        the service is taken.
> mysqld_options         Other command-line options for mysqld
> name                   Name
> ref                    Reference to an existing mysql resource in the
>                        resources section.
> service_name           Inherit the service name.
> shutdown_wait          Wait X seconds for correct end of service shutdown
> startup_wait           Wait X seconds for correct end of service startup
> __enforce_timeouts     Consider a timeout for operations as fatal.
> __failure_expire_time  Amount of time before a failure is forgotten.
> __independent_subtree  Treat this and all children as an independent subtree.
> __max_failures         Maximum number of failures before returning a
>                        failure to a status check.
>
> If I recall correctly you may need to tweak:
>
> shutdown_wait Wait X seconds for correct end of service shutdown
> startup_wait Wait X seconds for correct end of service startup
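>
> For example, a hypothetical resource snippet (address and timeouts
> illustrative, not from this cluster):
>
> <mysql name="mysql-server" config_file="/etc/my.cnf"
>        listen_address="10.0.0.50"
>        startup_wait="30" shutdown_wait="30"/>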
>
> There can be problems relocating the DB if it takes too long to
> start/shutdown. If you are having problems relocating with luci it may
> be a good idea to test with:
>
> # clusvcadm -r <service name> -m <cluster node>
>
> -Ben
>
>
>
>
>
>
> ----- "fosiul alam" < expertalert@xxxxxxxxx > wrote:
>
> > Hi
> > I have a 4-node cluster.
> > It was running fine, but today one node is giving trouble.
> >
> > When I try to relocate a service onto this node, or from this node
> > to another node, the luci GUI shows:
> >
> > Unable to retrieve batch 1908047789 status from
> > beaver.domain.local:11111: clusvcadm start failed to start httpd1:
> > Starting cluster service "httpd1" on node "http1.domain.local" --
> You
> > will be redirected in 5 seconds.
> > also
> >
> > The ricci agent for this node is unresponsive. Node-specific
> > information is not available at this time.
> >
> > But ricci is running on the problematic node:
> > ricci 7324 0.0 0.1 58876 2932 ? S<s 14:40 0:00 ricci -u 101
> >
> > There is no firewall running:
> >
> > iptables -L
> > Chain INPUT (policy ACCEPT)
> > target prot opt source destination
> >
> > Chain FORWARD (policy ACCEPT)
> > target prot opt source destination
> >
> > Chain OUTPUT (policy ACCEPT)
> > target prot opt source destination
> >
> > Chain RH-Firewall-1-INPUT (0 references)
> > target prot opt source destination
> >
> > Port 11111 is listening:
> >
> > netstat -an | grep 11111
> > tcp 0 0 0.0.0.0:11111 0.0.0.0:* LISTEN
> >
> >
> > But still ricci is very unstable, and I cannot relocate any service
> > onto this node or away from it.
> >
> > From the problematic node, if I type this:
> >
> > clustat
> > Cluster Status for ng1 @ Thu Sep 23 20:24:02 2010
> > Member Status: Quorate
> >
> > Member Name              ID  Status
> > ------ ----              --  ------
> > beaver.xxx.local          1  Online, rgmanager   <-- luci runs from this server
> > publicdns1.xxxx.local     2  Online, rgmanager
> > http1.xxxx.local          3  Online, Local, rgmanager
> > mail01.xxxxx.local        4  Online, rgmanager
> >
> > Service Name              Owner (Last)             State
> > ------- ----              ----- ------             -----
> > service:httpd1            mail01.xxxx.local        started
> > service:mysql-server      http1.xxxx.local         started  <-- the problematic node
> > service:public-dns        publicdns1.xxxxxx.local  started
> >
> > I cannot move the mysql-server service off this node, nor relocate
> > any service onto it.
> > I am very confused.
> >
> > What shall I do to fix this issue?
> >
> > Thanks for your advice.
> >
> >
> >
> >
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster



------------------------------

Message: 3
Date: Mon, 27 Sep 2010 19:05:06 +0200
From: brem belguebli <brem.belguebli@xxxxxxxxx>
To: linux clustering <linux-cluster@xxxxxxxxxx>
Subject: Re: porblem with quorum at cluster boot
Message-ID:
       <AANLkTi=FOA-cj5hg11zBmZdzWyQiMpPCM9FZiKgFQHH9@xxxxxxxxxxxxxx>
Content-Type: text/plain; charset="iso-8859-1"

The configuration you are trying to build, 2 cluster nodes (1 vote each)
plus a quorum disk with 1 vote (total expected_votes = 3), must remain up
if you lose 1 of the members, as long as the remaining node still accesses
the quorum disk: there are still 2 active votes (1 remaining node + 1
quorum disk), and 2 > expected_votes/2.

The quorum (majority) must be strictly greater than expected_votes/2 (51%
or more of the votes) for service to continue; here that means 2 votes,
which one node plus the qdisk still provides.
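
In cluster.conf terms, the setup under discussion would look something
like this (the quorumd label and timings are illustrative, not from the
poster's configuration):

<cman expected_votes="3" two_node="0"/>
<quorumd interval="1" tko="10" votes="1" label="qdisk"/>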


2010/9/27 Bennie R Thomas <Bennie_R_Thomas@xxxxxxxxxxxx>

> Try setting your expected votes to 2 or 1.
>
> Your cluster is hanging with one node because it wants 3 votes.
>
>
>
>   From: Brem Belguebli <brem.belguebli@xxxxxxxxx>
>   To: linux clustering <linux-cluster@xxxxxxxxxx>
>   Date: 09/25/2010 10:30 AM
>   Subject: Re: porblem with quorum at cluster boot
>   Sent by: linux-cluster-bounces@xxxxxxxxxx
> ------------------------------
>
>
>
> On Fri, 2010-09-24 at 12:52 -0400, Jason_Henderson@xxxxxxxxx wrote:
> >
> > I think you still need two_node="1" in your conf file if you want a
> > single node to become quorate.
> >
> two_node="1" is only valid if you do not have a quorum disk.
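>
> (For reference, the usual two-node declaration without a qdisk would be
> something like: <cman two_node="1" expected_votes="1"/>.)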
>
> > linux-cluster-bounces@xxxxxxxxxx wrote on 09/24/2010 12:38:17 PM:
> >
> > > hello,
> > >
> > > I have a 2 node cluster with qdisk quorum partition;
> > >
> > > each node has 1 vote and the qdisk has 1 vote too; in cluster.conf
> > I
> > > have this explicit declaration:
> > > <cman expected_votes="3" two_node="0"/>
> > >
> > > when I have both 2 nodes active cman_tool status tell me this:
> > >
> > > Version: 6.1.0
> > > Nodes: 2
> > > Expected votes: 3
> > > Quorum device votes: 1
> > > Total votes: 3
> > > Node votes: 1
> > > Quorum: 2
> > >
> > > then, if I power off a node these value, as expected, changed this
> > way:
> > > Nodes: 1
> > > Total votes: 2
> > >
> > > and the cluster is still quorate and functional.
> > >
> > > The problem is if I power off both nodes and then power on only one
> > > of them: in this case the single node does not become quorate and
> > > the cluster does not start. I have to power on both nodes to have
> > > the cluster (and the services on it) working.
> > >
> > > I'd like the cluster to work (and boot) even with a single node
> > > (i.e., if one of the nodes has a hw failure and is down, I still
> > > want to be able to reboot the working node and have it bring the
> > > cluster up correctly).
> > >
> > > Any hints? (thanks for reading all this)
> > >
> > > --
> > > bye,
> > > emilio
> > >
> --
> Linux-cluster mailing list
> Linux-cluster@xxxxxxxxxx
> https://www.redhat.com/mailman/listinfo/linux-cluster
>

------------------------------

Message: 4
Date: Mon, 27 Sep 2010 18:31:31 +0100
From: fosiul alam <expertalert@xxxxxxxxx>
To: linux clustering <linux-cluster@xxxxxxxxxx>
Subject: Re: ricci is very unstable in one nodes
Message-ID:
       <AANLkTikwtYxG3_gf0QxqJpGzZxowh4T7rGbwH-+MhWs8@xxxxxxxxxxxxxx>
Content-Type: text/plain; charset="iso-8859-1"

Hi
Thanks for your advice.
Currently I have:

luci-0.12.2-12.el5.centos.1
ricci-0.12.2-12.el5.centos.1

Are these the same rpms as these?

luci-0.12.2-12.el5_5.4.i386.rpm
ricci-0.12.2-12.el5_5.4.i386.rpm

Thanks


On 27 September 2010 17:55, Paul M. Dyer <pmdyer@xxxxxxxxxxxxxxx> wrote:

> [snip: quoted thread, unchanged from Message 2 above]

------------------------------

Message: 5
Date: Mon, 27 Sep 2010 18:37:44 +0100
From: fosiul alam <expertalert@xxxxxxxxx>
To: linux clustering <linux-cluster@xxxxxxxxxx>
Subject: Re: ricci is very unstable in one nodes
Message-ID:
       <AANLkTi=DfrVMFkp8No9UbwD+fVoRx9FmpO+qzY2RxLPk@xxxxxxxxxxxxxx>
Content-Type: text/plain; charset="iso-8859-1"

Hi, in addition to my previous email, have a look at this one,

from http1 (where I am trying to relocate a service):

[root@http1 ~]# clusvcadm -e httpd1 -m http1.xxxx.local
Member http1.xxxx.local trying to enable service:httpd1...Success
Warning: service:httpd1 is now running on mail01.xxxx.local

So it says Success,
but actually it is not: the service came up on mail01, not http1.
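(To double-check where rgmanager actually placed it, something like
"clustat -s httpd1" on any member should show the real owner.)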

Thanks again



On 27 September 2010 18:31, fosiul alam <expertalert@xxxxxxxxx> wrote:

> [snip: quoted thread, unchanged from Messages 2 and 4 above]

------------------------------

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster

End of Linux-cluster Digest, Vol 77, Issue 23
*********************************************



--
Santosh
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
