----- Original Message -----
From: "santosh lohar" <sslohar@xxxxxxxxx>
To: <linux-cluster@xxxxxxxxxx>
Sent: Tuesday, September 28, 2010 2:44 PM
Subject: cluster issue

Hi all,

I am facing a problem with SGE and FlexLM licensing. Details are below.

Hardware: IBM x3650, two quad-core CPUs, 16 GB RAM; two compute nodes plus one master node, connected via an IB switch.

Software: Rocks 5.1 / OS RHEL 4 (Mars Hill) / Fluent / MSC Mentat.

Problems:
1. When I submit jobs through SGE, "qhost -F MDAdv" shows the up-to-date count of licenses issued and available, but when jobs are submitted outside SGE, SGE does not see the latest state of the license tokens.
2. When jobs are submitted beyond 4 CPUs, cluster computation slows down.

Kindly suggest what to do in this case. Thanks in advance.

Regards,
Santosh
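On problem 1: jobs started outside SGE bypass SGE's consumable accounting, so SGE only learns about externally checked-out tokens if something reports them. A common approach is a load sensor that polls the FlexLM server and publishes the free token count. Below is a minimal sketch, assuming lmutil is in the PATH, that a global consumable complex named MDAdv already exists, and that the license file path (/opt/flexlm/license.dat) is a placeholder for your site's path:

#!/bin/sh
# Minimal SGE load sensor sketch (untested): reports free MDAdv tokens
# from FlexLM so SGE also sees licenses checked out outside SGE.
# LICFILE is a hypothetical path; adjust the feature name for your site.
LICFILE=/opt/flexlm/license.dat
HOST=global                       # report as a global complex value
while read line; do
    [ "$line" = "quit" ] && exit 0
    # lmstat prints: "Users of MDAdv: (Total of X licenses issued; Total of Y licenses in use)"
    FREE=$(lmutil lmstat -c "$LICFILE" -f MDAdv 2>/dev/null \
           | awk '/Users of MDAdv:/ {print $6 - $11}')
    echo "begin"
    echo "$HOST:MDAdv:${FREE:-0}"
    echo "end"
done

If it is not already set up, the script would be registered via the load_sensor parameter (qconf -mconf) and MDAdv defined as a consumable complex (qconf -mc), so the scheduler throttles on the reported value rather than on SGE-internal bookkeeping alone.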
On Mon, Sep 27, 2010 at 11:07 PM, <linux-cluster-request@xxxxxxxxxx> wrote:

Today's Topics:

   1. Unable to patch conga (fosiul alam)
   2. Re: ricci is very unstable on one node (Paul M. Dyer)
   3. Re: problem with quorum at cluster boot (brem belguebli)
   4. Re: ricci is very unstable on one node (fosiul alam)
   5. Re: ricci is very unstable on one node (fosiul alam)

------------------------------

Message: 1
Date: Mon, 27 Sep 2010 17:02:20 +0100
From: fosiul alam <expertalert@xxxxxxxxx>
To: linux clustering <linux-cluster@xxxxxxxxxx>
Subject: Unable to patch conga

Hi,

Due to the same issue (I see the exact same problem in my luci interface), I am trying to patch conga.

I downloaded:

http://mirrors.kernel.org/centos/5/os/SRPMS/conga-0.12.2-12.el5.centos.1.src.rpm

rpm -i conga-0.12.2-12.el5.centos.1.src.rpm
cd /usr/src/redhat/SOURCES
tar -xvzf conga-0.12.2.tar.gz
patch -p0 < /path/to/where_the_patch/ricci.patch

[root@beaver SOURCES]# cd conga-0.12.2

Now I am stuck at the build step:

./autogen.sh --include_zope_and_plone=yes
Zope-2.9.8-final.tgz passed sha512sum test
Plone-2.5.5.tar.gz passed sha512sum test
cat: clustermon.spec.in.in: No such file or directory

Run `./configure` to configure conga build,
or `make srpms` to build conga and clustermon srpms
or `make rpms` to build all rpms

[root@beaver conga-0.12.2]# ./configure --include_zope_and_plone=yes
D-BUS version 1.1.2 detected -> major 1, minor 1
missing zope directory, extract zope source-code into it and try again

Now, how do I tell ./configure where zope and plone are? Do I need zope and plone at all?

Please give me some advice.

Fosiul
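On the zope/plone question: luci in conga 0.12 is built on Zope/Plone, so building with --include_zope_and_plone=yes does need those sources. The configure error itself says to extract the Zope source into a directory inside the tree; a plausible sequence, not verified against the conga build scripts (the target directory names here are guesses taken from the error text):

# Untested sketch: unpack the verified tarballs into the source tree
cd /usr/src/redhat/SOURCES/conga-0.12.2
tar -xzf ../Zope-2.9.8-final.tgz && mv Zope-2.9.8-final zope    # guessed name
tar -xzf ../Plone-2.5.5.tar.gz  && mv Plone-2.5.5 plone         # guessed name
./configure --include_zope_and_plone=yes

An often cleaner route is to skip the manual tarball build entirely and let rpmbuild drive it: add the patch to the spec and rebuild the binary rpms (standard rpmbuild workflow; the Patch number below is illustrative):

cp ricci.patch /usr/src/redhat/SOURCES/
# in /usr/src/redhat/SPECS/conga.spec add:
#   Patch10: ricci.patch     (in the header)
#   %patch10 -p0             (in %prep, after %setup)
rpmbuild -bb /usr/src/redhat/SPECS/conga.spec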
------------------------------

Message: 2
Date: Mon, 27 Sep 2010 11:55:28 -0500 (CDT)
From: "Paul M. Dyer" <pmdyer@xxxxxxxxxxxxxxx>
To: linux clustering <linux-cluster@xxxxxxxxxx>
Subject: Re: ricci is very unstable on one node

http://rhn.redhat.com/errata/RHBA-2010-0716.html

It appears that this problem has been fixed in this errata. I installed the luci and ricci updates and did some light testing. So far, the timeout-on-11111 error has not shown up.

Paul

----- Original Message -----
From: "fosiul alam" <expertalert@xxxxxxxxx>
To: "linux clustering" <linux-cluster@xxxxxxxxxx>
Sent: Monday, September 27, 2010 10:48:27 AM
Subject: Re: ricci is very unstable on one node

Hi,
I am trying to patch ricci; let's see how it goes. But clusvcadm is failing as well:

[root@http1 ~]# clusvcadm -e httpd1 -m http1.xxxx.local
Member http1.xxxx.local trying to enable service:httpd1...Invalid operation for resource

Here, http1 is the node from which I was trying to run the service via luci. What could be the problem? Is there any way to find out whether there is a problem with the config?

On 27 September 2010 16:26, Ben Turner <bturner@xxxxxxxxxx> wrote:

> RHEL 5.6 hasn't been released yet, so your package probably contains the
> problem. I'm not sure how in sync CentOS is with RHEL, or whether they
> patch earlier, so I cannot give you a time frame for when the fix will be
> in CentOS, or whether they have already patched it. The problem in that BZ
> is more of an annoyance; you usually just have to retry a time or two and
> it works. If you can't get luci working with your service at all, you
> should try enabling the service from the command line with clusvcadm -e.
> If it is not working from the command line either, then there is a problem
> with the service config.
>
> -Ben
>
> ----- "fosiul alam" <expertalert@xxxxxxxxx> wrote:
>
> > Hi Ben,
> > Thanks.
> >
> > I named this cluster service mysql-server, but I have not installed the
> > MySQL database there yet. Both the luci server and node1 are running
> > this version:
> >
> > luci-0.12.2-12.el5.centos.1
> > ricci-0.12.2-12.el5.centos.1
> >
> > Do you think this version has the problem as well?
> >
> > Thanks for your help.
> >
> > On 24 September 2010 15:33, Ben Turner <bturner@xxxxxxxxxx> wrote:
> >
> > There is an issue with ricci timeouts that was fixed recently:
> >
> > https://bugzilla.redhat.com/show_bug.cgi?id=564490
> >
> > I'm not sure, but you may be hitting that bug. Symptoms include: luci
> > isn't able to get the status from the node, timeouts when querying
> > ricci, etc. The fix should be released with 5.6.
> >
> > On the mysql service there are some options that you need to set. Here
> > are all the options available to that agent:
> >
> > mysql -- defines a MySQL database server
> >
> > Attribute               Description
> > config_file             Define configuration file
> > listen_address          Define an IP address for the MySQL server. If
> >                         not given, the first IP address from the
> >                         service is taken.
> > mysqld_options          Other command-line options for mysqld
> > name                    Name
> > ref                     Reference to an existing mysql resource in the
> >                         resources section
> > service_name            Inherit the service name
> > shutdown_wait           Wait X seconds for correct end of service shutdown
> > startup_wait            Wait X seconds for correct end of service startup
> > __enforce_timeouts      Consider a timeout for operations as fatal
> > __failure_expire_time   Amount of time before a failure is forgotten
> > __independent_subtree   Treat this and all children as an independent subtree
> > __max_failures          Maximum number of failures before returning a
> >                         failure to a status check
> >
> > If I recall correctly, you may need to tweak:
> >
> > shutdown_wait           Wait X seconds for correct end of service shutdown
> > startup_wait            Wait X seconds for correct end of service startup
> >
> > There can be problems relocating the DB if it takes too long to
> > start/shut down. If you are having problems relocating with luci, it may
> > be a good idea to test with:
> >
> > # clusvcadm -r <service name> -m <cluster node>
> >
> > -Ben
> >
> > ----- "fosiul alam" <expertalert@xxxxxxxxx> wrote:
> >
> > > Hi,
> > > I have a 4-node cluster. It was running fine, but today one node is
> > > giving trouble.
> > >
> > > From the luci GUI, when I try to relocate a service onto this node,
> > > or from this node to another node, luci shows:
> > >
> > > Unable to retrieve batch 1908047789 status from
> > > beaver.domain.local:11111: clusvcadm start failed to start httpd1:
> > > Starting cluster service "httpd1" on node "http1.domain.local" -- You
> > > will be redirected in 5 seconds.
> > >
> > > and also:
> > >
> > > The ricci agent for this node is unresponsive. Node-specific
> > > information is not available at this time.
> > >
> > > But ricci is running on the problematic node:
> > >
> > > ricci 7324 0.0 0.1 58876 2932 ? S<s 14:40 0:00 ricci -u 101
> > >
> > > There is no firewall running:
> > >
> > > iptables -L
> > > Chain INPUT (policy ACCEPT)
> > > target prot opt source destination
> > >
> > > Chain FORWARD (policy ACCEPT)
> > > target prot opt source destination
> > >
> > > Chain OUTPUT (policy ACCEPT)
> > > target prot opt source destination
> > >
> > > Chain RH-Firewall-1-INPUT (0 references)
> > > target prot opt source destination
> > >
> > > Port 11111 is listening:
> > >
> > > netstat -an | grep 11111
> > > tcp 0 0 0.0.0.0:11111 0.0.0.0:* LISTEN
> > >
> > > But still ricci is very unstable, and I can't relocate any service
> > > onto this node or away from it.
> > >
> > > From the problematic node:
> > >
> > > clustat
> > > Cluster Status for ng1 @ Thu Sep 23 20:24:02 2010
> > > Member Status: Quorate
> > >
> > > Member Name              ID  Status
> > > ------ ----              --  ------
> > > beaver.xxx.local          1  Online, rgmanager   <-- luci runs here
> > > publicdns1.xxxx.local     2  Online, rgmanager
> > > http1.xxxx.local          3  Online, Local, rgmanager
> > > mail01.xxxxx.local        4  Online, rgmanager
> > >
> > > Service Name            Owner (Last)             State
> > > ------------            ------------             -----
> > > service:httpd1          mail01.xxxx.local        started
> > > service:mysql-server    http1.xxxx.local         started  <-- the problematic node
> > > service:public-dns      publicdns1.xxxxxx.local  started
> > >
> > > I can't move mysql-server off this node, and I can't relocate any
> > > service onto it. I am very confused. What shall I do to fix this?
> > >
> > > Thanks for your advice.
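On the question of how to check the config: rgmanager ships an offline test harness, rg_test, that parses /etc/cluster/cluster.conf and dry-runs resource operations outside the running cluster, which can expose the kind of resource-tree problem behind an "Invalid operation for resource" error. A sketch, run on any node with the rgmanager package installed (the service name httpd1 is taken from the thread):

# Validate the resource tree, then simulate starting the service
rg_test test /etc/cluster/cluster.conf                        # parse config, print resource tree
rg_test test /etc/cluster/cluster.conf start service httpd1   # dry-run the service start

If rg_test reports the same error, the problem is in the service definition itself rather than in ricci or luci.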
------------------------------

Message: 3
Date: Mon, 27 Sep 2010 19:05:06 +0200
From: brem belguebli <brem.belguebli@xxxxxxxxx>
To: linux clustering <linux-cluster@xxxxxxxxxx>
Subject: Re: problem with quorum at cluster boot

The configuration you are trying to build -- 2 cluster nodes (1 vote each) plus a quorum disk with 1 vote, for a total of expected_votes=3 -- must remain up if you lose one of the members, as long as the remaining node still accesses the quorum disk: there are still 2 active votes (1 remaining node + 1 quorum disk), and 2 > expected_votes/2.

The quorum (majority) must be strictly greater than expected_votes/2 (51% or more) for service to continue.

2010/9/27 Bennie R Thomas <Bennie_R_Thomas@xxxxxxxxxxxx>

> Try setting your expected votes to 2 or 1. Your cluster is hanging with
> one node because it wants 3 votes.
>
> On Fri, 2010-09-24 at 12:52 -0400, Jason_Henderson@xxxxxxxxx wrote:
> >
> > I think you still need two_node="1" in your conf file if you want a
> > single node to become quorate.
>
> two_node=1 is only valid if you do not have a quorum disk.
>
> > linux-cluster-bounces@xxxxxxxxxx wrote on 09/24/2010 12:38:17 PM:
> >
> > > Hello,
> > >
> > > I have a 2-node cluster with a qdisk quorum partition. Each node has
> > > 1 vote and the qdisk has 1 vote too; in cluster.conf I have this
> > > explicit declaration:
> > >
> > > <cman expected_votes="3" two_node="0"/>
> > >
> > > When both nodes are active, cman_tool status tells me this:
> > >
> > > Version: 6.1.0
> > > Nodes: 2
> > > Expected votes: 3
> > > Quorum device votes: 1
> > > Total votes: 3
> > > Node votes: 1
> > > Quorum: 2
> > >
> > > Then, if I power off a node, these values change as expected:
> > >
> > > Nodes: 1
> > > Total votes: 2
> > >
> > > and the cluster is still quorate and functional.
> > >
> > > The problem is when I power off both nodes and then power on only one
> > > of them: in this case the single node does not become quorate and the
> > > cluster does not start. I have to power on both nodes to get the
> > > cluster (and the services on it) working.
> > >
> > > I'd like the cluster to work (and boot) even with a single node, i.e.
> > > if one node has a hardware failure and is down, I still want to be
> > > able to reboot the working node and have it bring up the cluster
> > > correctly.
> > >
> > > Any hints? (Thanks for reading all this.)
> > >
> > > --
> > > bye,
> > > emilio
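A worked check of the majority rule brem describes, using emilio's numbers:

    quorum = floor(expected_votes / 2) + 1 = floor(3 / 2) + 1 = 2

which matches the "Quorum: 2" line in the cman_tool output. One node alone contributes only 1 vote, below quorum, so a freshly booted single node is inquorate until the qdisk vote is counted; once qdiskd is up and its vote registers, the node has 1 + 1 = 2 votes and becomes quorate. One plausible explanation for the boot behaviour, consistent with these numbers, is that qdiskd has not yet registered its vote at the point where cman checks quorum.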
------------------------------

Message: 4
Date: Mon, 27 Sep 2010 18:31:31 +0100
From: fosiul alam <expertalert@xxxxxxxxx>
To: linux clustering <linux-cluster@xxxxxxxxxx>
Subject: Re: ricci is very unstable on one node

Hi,
Thanks for your advice. Currently I have:

luci-0.12.2-12.el5.centos.1
ricci-0.12.2-12.el5.centos.1

Are these the same rpms as:

luci-0.12.2-12.el5_5.4.i386.rpm
ricci-0.12.2-12.el5_5.4.i386.rpm

Thanks
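The release tags differ (el5.centos.1 versus el5_5.4), so these are not the same builds. A quick way to check whether a given fix was backported into the installed package is to look for the Bugzilla number in the rpm changelog; a sketch, using the bug number from earlier in the thread:

# Check installed versions, then scan the changelogs for the ricci
# timeout fix (bz#564490). No match suggests the build predates the fix.
rpm -q luci ricci
rpm -q --changelog ricci | grep -i 564490
rpm -q --changelog luci  | grep -i 564490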
------------------------------

Message: 5
Date: Mon, 27 Sep 2010 18:37:44 +0100
From: fosiul alam <expertalert@xxxxxxxxx>
To: linux clustering <linux-cluster@xxxxxxxxxx>
Subject: Re: ricci is very unstable on one node

Hi,
In addition to my previous email, have a look at this, from http1 (where I am trying to relocate a service):

[root@http1 ~]# clusvcadm -e httpd1 -m http1.xxxx.local
Member http1.xxxx.local trying to enable service:httpd1...Success
Warning: service:httpd1 is now running on mail01.xxxx.local

So it says Success, but actually it is not: the service came up on mail01, not on the member I asked for.

Thanks again
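That warning is worth reading carefully before treating it as a failure: clusvcadm -e enables the service, but rgmanager may legitimately place it on a different member if the requested node is excluded by a failover domain or the local start fails there. A sketch for confirming where the service actually landed and for forcing an explicit relocation (standard rgmanager tools; the service and node names are from the thread):

# Show the current owner of just this service, retry with an explicit
# relocate, and check the node's syslog for the reason a start was
# rejected (log location is the usual RHEL default, adjust as needed).
clustat -s httpd1
clusvcadm -r httpd1 -m http1.xxxx.local
grep -i rgmanager /var/log/messages | tail -n 50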
------------------------------

End of Linux-cluster Digest, Vol 77, Issue 23
*********************************************

--
Santosh

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster