Hi,
We have two HP ProLiant 380 G3 servers (Red Hat Advanced Server 3) attached
by fibre optic to an HP MSA1000 SAN, and we want to install and configure
the Red Hat Cluster Suite.
I set up and configured clustered NFS on the two servers, RAC1 and RACGFS,
with the following packages:
clumanager-1.2.26.1-1
redhat-config-cluster-1.0.7-1
I created two quorum partitions, /dev/sdd2 and /dev/sdd3 (100 MB each), and
another, much larger partition, /dev/sdd4 (over 600 GB), which I formatted
as ext3.
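For completeness, the formatting step itself was simply (a sketch, assuming
default ext3 options):
mkfs.ext3 /dev/sdd4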
I installed the Cluster Suite on the first node (RAC1) and the second node
(RACGFS), and started the rawdevices service on both nodes (this works
fine).
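For reference, the raw device bindings on both nodes look like this (a
sketch of /etc/sysconfig/rawdevices; I am assuming raw1 maps to /dev/sdd2
and raw2 to /dev/sdd3, matching the sharedstate entry in cluster.xml below):
# /etc/sysconfig/rawdevices
# format: <raw device>  <block device>
/dev/raw/raw1  /dev/sdd2
/dev/raw/raw2  /dev/sdd3
After editing the file, the bindings are re-read with:
service rawdevices restart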
This is the /etc/hosts file on node 1 (RAC1) and node 2 (RACGFS):
# Do not remove the following line, or various programs
# that require network functionality will fail.
#127.0.0.1 rac1 localhost.localdomain localhost
127.0.0.1 localhost.localdomain localhost
#
# Private hostnames
#
192.168.253.3 rac1.project.net rac1
192.168.253.4 rac2.project.net rac2
192.168.253.10 racgfs.project.net racgfs
192.168.253.20 raclu_nfs.project.net raclu_nfs
#
# Hostnames used for Interconnect
#
1.1.1.1 rac1i.project.net rac1i
1.1.1.2 rac2i.project.net rac2i
1.1.1.3 racgfsi.project.net racgfsi
#
192.168.253.5 infra.project.net infra
192.168.253.7 ractest.project.net ractest
#
I generated /etc/cluster.xml on the first node (RAC1) and the second node
(RACGFS):
<?xml version="1.0"?>
<cluconfig version="3.0">
<clumembd broadcast="no" interval="750000" loglevel="5" multicast="yes"
multicast_ipaddress="225.0.0.11" thread="yes" tko_count="20"/>
<cluquorumd loglevel="5" pinginterval="1" tiebreaker_ip=""/>
<clurmtabd loglevel="5" pollinterval="4"/>
<clusvcmgrd loglevel="5" use_netlink="yes"/>
<clulockd loglevel="5"/>
<cluster config_viewnumber="24" key="978dcd78e05c5961cf1aaaa03b41209b"
name="cisn"/>
<sharedstate driver="libsharedraw.so" rawprimary="/dev/raw/raw1"
rawshadow="/dev/raw/raw2" type="raw"/>
<members>
<member id="0" name="192.168.253.3" watchdog="no"/>
<member id="1" name="192.168.253.10" watchdog="no"/>
</members>
<services>
<service checkinterval="5" failoverdomain="cisncluster" id="0"
maxfalsestarts="0" maxrestarts="0" name="nfs_cisn" userscript="None">
<service_ipaddresses>
<service_ipaddress broadcast="None" id="0"
ipaddress="192.168.253.20" monitor_link="0" netmask="255.255.255.0"/>
</service_ipaddresses>
<device id="0" name="/dev/sdd4">
<mount forceunmount="yes" mountpoint="/u04"/>
<nfsexport id="0" name="/u04">
<client id="0" name="*" options="rw"/>
</nfsexport>
</device>
</service>
</services>
<failoverdomains>
<failoverdomain id="0" name="cisncluster" ordered="yes" restricted="no">
<failoverdomainnode id="0" name="192.168.253.3"/>
<failoverdomainnode id="1" name="192.168.253.10"/>
</failoverdomain>
</failoverdomains>
</cluconfig>
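To keep the two copies identical after a change, the file can simply be
copied to the second node (a sketch; hostname as in /etc/hosts above):
scp /etc/cluster.xml racgfs:/etc/cluster.xml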
I created an NFS share on /u04 (mounted from /dev/sdd4) using the Cluster
GUI manager on RAC1.
I then ran the following command on both nodes, RAC1 and RACGFS:
service clumanager start
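To have clumanager start automatically at boot, the usual init tooling can
be used as well (a sketch, not something the failover itself requires):
chkconfig clumanager on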
I checked the result on both nodes. On RAC1, clustat reports:
Cluster Status - cisn                                      09:04:34
Cluster Quorum Incarnation #1
Shared State: Shared Raw Device Driver v1.2
Member             Status
------------------ ----------
192.168.253.3      Active     <-- You are here
192.168.253.10     Active

Service        Status   Owner (Last)     Last Transition Chk Restarts
-------------- -------- ---------------- --------------- --- --------
nfs_cisn       started  192.168.253.3    09:07:59 Sep 21   5        0
On RACGFS, clustat reports:
Cluster Status - cisn                                      09:07:39
Cluster Quorum Incarnation #3
Shared State: Shared Raw Device Driver v1.2
Member             Status
------------------ ----------
192.168.253.3      Active
192.168.253.10     Active     <-- You are here

Service        Status   Owner (Last)     Last Transition Chk Restarts
-------------- -------- ---------------- --------------- --- --------
nfs_cisn       started  192.168.253.3    09:07:59 Sep 21   5        0
When I run ifconfig on RAC1, the service IP address 192.168.253.20 shows up
on the alias interface eth2:0.
I then ran the following command on the other servers:
mount -t nfs 192.168.253.20:/u04 /u04
Everything works: I can list the contents of /u04 from any server.
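A quick way to double-check the export from any client is showmount (part
of nfs-utils):
showmount -e 192.168.253.20
which should print, roughly:
Export list for 192.168.253.20:
/u04 *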
My only problem is the following.
To test that the clustered NFS fails over correctly, I rebooted RAC1
repeatedly. RACGFS continued to work as the failover server, and when I ran
ifconfig on RACGFS, the service IP address 192.168.253.20 had been brought
up on eth0:0.
The other servers could still list the contents of /u04 (the clustered NFS
mount) a few seconds after RAC1 rebooted.
But after many reboots I hit a big problem: neither cluster node holds the
service IP address 192.168.253.20 any more, as ifconfig on both nodes shows.
On RAC1:
eth0 Link encap:Ethernet HWaddr 00:0B:CD:EF:2B:C1
inet addr:1.1.1.1 Bcast:1.1.1.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:89170 errors:0 dropped:0 overruns:0 frame:0
TX packets:87405 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:17288193 (16.4 Mb) TX bytes:14452757 (13.7 Mb)
Interrupt:15
eth2 Link encap:Ethernet HWaddr 00:0B:CD:FF:44:02
inet addr:192.168.253.3 Bcast:192.168.253.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:1349991 errors:0 dropped:0 overruns:0 frame:0
TX packets:435450 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:1592635536 (1518.8 Mb) TX bytes:162026101 (154.5 Mb)
Interrupt:7
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:1001181 errors:0 dropped:0 overruns:0 frame:0
TX packets:1001181 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:76097441 (72.5 Mb) TX bytes:76097441 (72.5 Mb)
On RACGFS:
eth0 Link encap:Ethernet HWaddr 00:14:38:50:D3:E4
inet addr:192.168.253.10 Bcast:192.168.253.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:211223 errors:0 dropped:0 overruns:0 frame:0
TX packets:160026 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:14917480 (14.2 Mb) TX bytes:13886063 (13.2 Mb)
Interrupt:25
eth1 Link encap:Ethernet HWaddr 00:14:38:50:D3:E3
inet addr:1.1.1.3 Bcast:1.1.1.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:4 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 b) TX bytes:256 (256.0 b)
Interrupt:26
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:184529 errors:0 dropped:0 overruns:0 frame:0
TX packets:184529 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:10971489 (10.4 Mb) TX bytes:10971489 (10.4 Mb)
I have tried many things: I stopped the cluster services on both nodes and
restarted them, but unfortunately it doesn't work and we cannot get the
clustered NFS mount back.
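Concretely, the restart sequence was roughly (a sketch of what I ran):
service clumanager stop     # on both nodes
service clumanager start    # then start both again
clustat                     # the service never gets its IP back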
Do you have any idea how to fix this problem?
Thanks for your replies and your help.
Abbes Bettahar
514-296-0756