Re: Linux-cluster Digest, Vol 117, Issue 6

Hi,
According to the clustat output it looks like there is some issue with multicasting. Please check multicast connectivity between the nodes using omping.
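
For example, a rough sketch of an omping check; node1-hb and node2-hb are
placeholders for the heartbeat hostnames, and the same command has to be
run on both nodes at the same time:

    # node1-hb / node2-hb are placeholders for the heartbeat hostnames
    omping node1-hb node2-hb
    # optionally test the cluster's own multicast address (as shown by
    # "cman_tool status"): omping -m <mcast-addr> node1-hb node2-hb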

On Jan 17, 2014 8:04 PM, <linux-cluster-request@xxxxxxxxxx> wrote:
Send Linux-cluster mailing list submissions to
        linux-cluster@xxxxxxxxxx

To subscribe or unsubscribe via the World Wide Web, visit
        https://www.redhat.com/mailman/listinfo/linux-cluster
or, via email, send a message with subject or body 'help' to
        linux-cluster-request@xxxxxxxxxx

You can reach the person managing the list at
        linux-cluster-owner@xxxxxxxxxx

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Linux-cluster digest..."


Today's Topics:

   1. 2 node cluster questions (Benjamin Budts)
   2. Re: 2 node cluster questions (emmanuel segura)
   3. Is gfs2 able to perform on a 24TB partition?
      (Juan Pablo Lorier)
   4. Re: Is gfs2 able to perform on a 24TB partition?
      (Steven Whitehouse)
   5. Re: 2 node cluster questions (Benjamin Budts)
   6. Re: 2 node cluster questions (emmanuel segura)


----------------------------------------------------------------------

Message: 1
Date: Thu, 16 Jan 2014 18:04:18 +0100
From: "Benjamin Budts" <ben@xxxxxxxxxx>
To: <linux-cluster@xxxxxxxxxx>
Subject: 2 node cluster questions
Message-ID: <002801cf12dc$fe8d1e40$fba75ac0$@zentrix.be>
Content-Type: text/plain; charset="us-ascii"



Hey All,

About my setup:

I created a cluster in Luci with shared storage, added 2 nodes (reachable)
and added 2 fence devices (iDRAC).

Now I can't seem to add my fence devices to my nodes. If I click on a node,
it times out in the GUI.

So I checked my node logs and found in /var/log/messages that ricci hangs on
/etc/init.d/clvmd status.

When I run the command manually, it prints the PID and then hangs.

I also seem to have a split-brain configuration:

# clustat on node 1 shows: node 1 Online, Local
                           node 2 Offline

# clustat on node 2 shows: node 1 Offline
                           node 2 Online, Local

My next step was testing whether my multicast is working correctly (I
suspect it isn't). Would you have any recommendations besides the following
Red Hat multicast test article?

https://access.redhat.com/site/articles/22304

Some info about my systems:
----------------------------------

- Red Hat 6.5 on 2 nodes, with Resilient Storage & High Availability
  add-on licenses
- Mgmt. station: Red Hat 6.5 with Luci

Thx a lot


------------------------------

Message: 2
Date: Thu, 16 Jan 2014 18:17:11 +0100
From: emmanuel segura <emi2fast@xxxxxxxxx>
To: linux clustering <linux-cluster@xxxxxxxxxx>
Subject: Re: 2 node cluster questions
Message-ID:
        <CAE7pJ3DGsXJx602Tr73UjFFrrYpFi686TJeKQFqybxMv+5D00A@xxxxxxxxxxxxxx>
Content-Type: text/plain; charset="windows-1252"

If you think your problem is multicast and you are using Red Hat 6.x, try
using unicast (UDPU) instead:
https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Cluster_Administration/s1-unicast-traffic-CA.html
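
As a rough sketch (not a full procedure), the switch is a one-attribute
change to the <cman/> element in /etc/cluster/cluster.conf:

    <cman transport="udpu"/>

followed by validating the config and restarting the cluster stack on each
node, e.g.:

    ccs_config_validate
    service cman restart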


2014/1/16 Benjamin Budts <ben@xxxxxxxxxx>

> [...]



--
this is my life and I live it as long as God wills

------------------------------

Message: 3
Date: Fri, 17 Jan 2014 09:10:42 -0200
From: Juan Pablo Lorier <jplorier@xxxxxxxxx>
To: linux-cluster@xxxxxxxxxx
Subject: Is gfs2 able to perform on a 24TB partition?
Message-ID: <52D90FB2.4060906@xxxxxxxxx>
Content-Type: text/plain; charset=ISO-8859-1

Hi,

I've been using gfs2 on top of a 24 TB LVM volume used as a file server
for several months now, and I see a lot of I/O related to glock_workqueue
when there are file transfers. The threads even get to be the top ones in
read ops in iotop.
Is that normal? How can I debug this?
Regards,



------------------------------

Message: 4
Date: Fri, 17 Jan 2014 11:16:28 +0000
From: Steven Whitehouse <swhiteho@xxxxxxxxxx>
To: Juan Pablo Lorier <jplorier@xxxxxxxxx>
Cc: linux-cluster@xxxxxxxxxx
Subject: Re: Is gfs2 able to perform on a 24TB partition?
Message-ID: <1389957388.2734.20.camel@menhir>
Content-Type: text/plain; charset="UTF-8"

Hi,

On Fri, 2014-01-17 at 09:10 -0200, Juan Pablo Lorier wrote:
> Hi,
>
> I've been using gfs2 on top of a 24 TB LVM volume used as a file server
> for several months now, and I see a lot of I/O related to glock_workqueue
> when there are file transfers. The threads even get to be the top ones in
> read ops in iotop.
> Is that normal? How can I debug this?
> Regards,
>

Well, the glock_workqueue threads are running the internal state machine
that controls the caching of data. So depending on the workload in
question, yes, it is expected that you'll see a fair amount of activity
there.

If the issue is that you are seeing the cache being used inefficiently,
then it may be possible to improve that by looking at the I/O pattern
generated by the application. There are also other things which can make
a difference, such as setting noatime on the mount.
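
For example, a minimal sketch of an fstab entry with noatime for a gfs2
mount (the device path and mount point below are placeholders):

    /dev/clustervg/gfs2lv  /srv/share  gfs2  defaults,noatime,nodiratime  0 0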

You don't mention what version of the kernel you are using, but more
recent kernels have better tools (such as tracepoints) which can assist
in debugging this kind of issue.
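
As a rough sketch, assuming debugfs and a kernel that carries the gfs2
tracepoints, they can be enabled and read like this:

    mount -t debugfs none /sys/kernel/debug   # only if not already mounted
    echo 1 > /sys/kernel/debug/tracing/events/gfs2/enable
    cat /sys/kernel/debug/tracing/trace_pipe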

Steve.




------------------------------

Message: 5
Date: Fri, 17 Jan 2014 15:01:22 +0100
From: "Benjamin Budts" <ben@xxxxxxxxxx>
To: <linux-cluster@xxxxxxxxxx>
Subject: Re: 2 node cluster questions
Message-ID: <005e01cf138c$9a981100$cfc83300$@zentrix.be>
Content-Type: text/plain; charset="us-ascii"



Found my problem.

When running cman_tool status I saw that each node only saw 1 node.
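
For reference, a rough sketch of those checks (run on each node; output
not shown here):

    cman_tool status | grep -i -E 'nodes|expected|quorum'
    cman_tool nodes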

I have 2 pairs of networks on each machine (heartbeat network, 1 Gbit,
multicast enabled / application network, 10 Gbit), each with its own host
mapping in /etc/hosts.
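
For illustration, a sketch of what that /etc/hosts mapping looks like (the
addresses and hostnames here are made up):

    # heartbeat network (used for cluster membership)
    192.168.10.1   node1-hb
    192.168.10.2   node2-hb
    # application network
    10.0.0.1       node1
    10.0.0.2       node2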



I used the non-heartbeat hostnames to add the nodes into the cluster via
Luci, which caused the issues.

Removed Luci & /var/lib/luci on the nodes, as well as running
cat /dev/null > /etc/cluster/cluster.conf to empty the cluster config.

Reinstalled Luci, created the cluster, added the 2 nodes with the correct
hostnames, problem solved.

Could anyone point me to a good guide for using clustered LVM?

Thx







From: Elvir Kuric [mailto:ekuric@xxxxxxxxxx]
Sent: Friday, 17 January 2014 10:42
To: ben@xxxxxxxxxx
Subject: Re: 2 node cluster questions



On 01/16/2014 06:04 PM, Benjamin Budts wrote:



[...]

Hard to say what the issue is based on this without fully checking the logs.
Can you please open a case via the Red Hat customer portal,
https://access.redhat.com (I assume you have a valid subscription), so we
can take a deeper look into this problem?

Thank you in advance,

Kind regards,





--
Elvir Kuric, Sr. TSE / Red Hat / GSS EMEA

------------------------------

Message: 6
Date: Fri, 17 Jan 2014 15:27:42 +0100
From: emmanuel segura <emi2fast@xxxxxxxxx>
To: linux clustering <linux-cluster@xxxxxxxxxx>
Subject: Re: 2 node cluster questions
Message-ID:
        <CAE7pJ3Bi7woB42pD_dCS0xBrAs8uMgGCZCmfc9GyBzk3bwNr-A@xxxxxxxxxxxxxx>
Content-Type: text/plain; charset="windows-1252"

https://access.redhat.com/site/documentation/it-IT/Red_Hat_Enterprise_Linux/6/html/Logical_Volume_Manager_Administration/LVM_Cluster_Overview.html
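
As a rough sketch of the usual RHEL 6 steps for putting a volume group
under clvmd (device and volume names below are placeholders, and this
assumes cman is already running on all nodes):

    lvmconf --enable-cluster         # sets locking_type = 3 in /etc/lvm/lvm.conf
    service clvmd start
    chkconfig clvmd on
    vgcreate -cy clustervg /dev/sdb  # -cy marks the volume group as clustered
    lvcreate -L 100G -n sharedlv clustervg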


2014/1/17 Benjamin Budts <ben@xxxxxxxxxx>

> [...]



--
this is my life and I live it as long as God wills

------------------------------

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster

End of Linux-cluster Digest, Vol 117, Issue 6
*********************************************
