Problem Installing GFS using GNBD

"rajesh mishra" <raj4linux@xxxxxxxxx> · Tue, 18 Jul 2006 15:24:38 +0530

Hi All,
I'm new to Red Hat GFS. I got the GFS code from the http://sources.redhat.com/cluster site (from CVS,
with tag RHEL4). I compiled and installed GFS from the source code, following the steps in the
cluster/doc/min-gfs.txt file. I want to use GFS with a GNBD server and 3 machines.
The cluster/doc/min-gfs.txt file looks like this:

Minimum GFS How To
-----------------
The following gfs configuration requires a minimum amount of hardware and
no expensive storage system.  It's the cheapest and quickest way to "play"
with gfs.

  -------------       -------------
  | GNBD      |       | GNBD      |
  | client    |       | client    |       <-- these nodes use gfs
  | node2     |       | node3     |
  -------------       -------------
        |                   |
        ---------------------  IP network
                  |
            -------------
            | GNBD      |
            | server    |                <-- this node doesn't use gfs
            | node1     |
            -------------
- There are three machines to use with hostnames: node1, node2, node3
- node1 has an extra disk /dev/sda1 to use for gfs
  (this could be hda1 or an lvm LV or an md device)
- node1 will use gnbd to export this disk to node2 and node3
- Node1 cannot use gfs, it only acts as a gnbd server.
  (Node1 will /not/ actually be part of the cluster since it is only
   running the gnbd server.)
- Only node2 and node3 will be in the cluster and use gfs.
  (A two-node cluster is a special case for cman, noted in the config below.)
- There's not much point to using clvm in this setup so it's left out.
- Download the "cluster" source tree.
- Build and install from the cluster source tree.  (The kernel components
  are not required on node1 which will only need the gnbd_serv program.)
    cd cluster
    ./configure --kernel_src=/path/to/kernel
    make; make install
- Create /etc/cluster/cluster.conf on node2 with the following contents:
<?xml version="1.0"?>
<cluster name="gamma" config_version="1">
<cman two_node="1" expected_votes="1">
</cman>
<clusternodes>
<clusternode name="node2">
 <fence>
  <method name="single">
   <device name="gnbd" ipaddr="node2"/>
  </method>
 </fence>
</clusternode>
<clusternode name="node3">
 <fence>
  <method name="single">
   <device name="gnbd" ipaddr="node3"/>
  </method>
 </fence>
</clusternode>
</clusternodes>
<fencedevices>
 <fencedevice name="gnbd" agent="fence_gnbd" servers="node1"/>
</fencedevices>
</cluster>

- load kernel modules on nodes
node2 and node3> modprobe gnbd
node2 and node3> modprobe gfs
node2 and node3> modprobe lock_dlm
- run the following commands
node1> gnbd_serv -n
node1> gnbd_export -c -d /dev/sda1 -e global_disk
node2 and node3> gnbd_import -i node1
node2 and node3> ccsd
node2 and node3> cman_tool join
node2 and node3> fence_tool join
node2> gfs_mkfs -p lock_dlm -t gamma:gfs1 -j 2 /dev/gnbd/global_disk
node2 and node3> mount -t gfs /dev/gnbd/global_disk /mnt
- the end, you now have a gfs file system mounted on node2 and node3

Appendix A
----------
To use manual fencing instead of gnbd fencing, the cluster.conf file
would look like this:
<?xml version="1.0"?>
<cluster name="gamma" config_version="1">
<cman two_node="1" expected_votes="1">
</cman>
<clusternodes>
<clusternode name="node2">
 <fence>
  <method name="single">
   <device name="manual" ipaddr="node2"/>
  </method>
 </fence>
</clusternode>
<clusternode name="node3">
 <fence>
  <method name="single">
   <device name="manual" ipaddr="node3"/>
  </method>
 </fence>
</clusternode>
</clusternodes>
<fencedevices>
 <fencedevice name="manual" agent="fence_manual"/>
</fencedevices>
</cluster>

FAQ
---
- Why can't node1 use gfs, too?
You might be able to make it work, but we recommend that you not try.
This software was not intended or designed to allow that kind of usage.
- Isn't node1 a single point of failure? How do I avoid that?
Yes it is.  For the time being, there's no way to avoid that, apart from
not using gnbd, of course.  Eventually, there will be a way to avoid this
using cluster mirroring.
- More info from
  http://sources.redhat.com/cluster/gnbd/gnbd_usage.txt
  http://sources.redhat.com/cluster/doc/usage.txt

I executed the following commands on node-1:
[root@localhost ~]# gnbd_serv -n
gnbd_serv: startup succeeded
[root@localhost ~]# gnbd_export -c -d  /dev/sda5 -e global_disk
gnbd_export: created GNBD global_disk serving file /dev/sda5

[root@localhost ~]# gnbd_export -v
Server[1] : global_disk
--------------------------
      file : /dev/sda5
   sectors : 24820362
  readonly : no
    cached : yes
   timeout : no
       uid :

[root@localhost ~]# ps ax| grep gnbd
12571 ?        S      0:00 gnbd_serv -n
12607 ?        S      0:00 gnbd_serv -n
12609 pts/3    S+     0:00 grep gnbd
[root@localhost ~]#

But I'm getting the following messages in /var/log/messages on node-1 (GNBD server machine):
Jul 18 14:34:06 localhost gnbd_serv[12571]: startup succeeded
Jul 18 14:37:35 localhost gnbd_serv[12571]: server process 12596 exited because of signal 15
Jul 18 14:37:40 localhost gnbd_serv[12571]: server process 12597 exited because of signal 15
Jul 18 14:37:45 localhost gnbd_serv[12571]: server process 12598 exited because of signal 15
Jul 18 14:37:50 localhost gnbd_serv[12571]: server process 12599 exited because of signal 15
Jul 18 14:37:55 localhost gnbd_serv[12571]: server process 12600 exited because of signal 15
Jul 18 14:38:00 localhost gnbd_serv[12571]: server process 12601 exited because of signal 15
Jul 18 14:38:05 localhost gnbd_serv[12571]: server process 12602 exited because of signal 15
Jul 18 14:38:10 localhost gnbd_serv[12571]: server process 12603 exited because of signal 15
Jul 18 14:38:15 localhost gnbd_serv[12571]: server process 12604 exited because of signal 15
Jul 18 14:38:20 localhost gnbd_serv[12571]: server process 12605 exited because of signal 15
Jul 18 14:38:25 localhost gnbd_serv[12571]: server process 12606 exited because of signal 15
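
The pattern in the server log above is easier to see when summarised. What follows is a minimal sketch, assuming syslog-style lines exactly as pasted above. The helper name summarize_exits is hypothetical and not part of any GNBD tool, and the year 2006 is assumed because syslog timestamps carry no year.

```python
import re
from datetime import datetime

# Matches the gnbd_serv "exited because of signal" lines quoted above.
EXIT_RE = re.compile(
    r"^(\w{3} +\d+ \d\d:\d\d:\d\d) \S+ gnbd_serv\[(\d+)\]: "
    r"server process (\d+) exited because of signal (\d+)"
)

def summarize_exits(lines, year=2006):
    """Return (number of exits, set of signals, sorted distinct gaps in seconds)."""
    stamps, signals = [], set()
    for line in lines:
        m = EXIT_RE.match(line.strip())
        if not m:
            continue  # e.g. the "startup succeeded" line
        stamps.append(datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S"))
        signals.add(int(m.group(4)))
    gaps = sorted({int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])})
    return len(stamps), signals, gaps

sample = [
    "Jul 18 14:34:06 localhost gnbd_serv[12571]: startup succeeded",
    "Jul 18 14:37:35 localhost gnbd_serv[12571]: server process 12596 exited because of signal 15",
    "Jul 18 14:37:40 localhost gnbd_serv[12571]: server process 12597 exited because of signal 15",
    "Jul 18 14:37:45 localhost gnbd_serv[12571]: server process 12598 exited because of signal 15",
]
print(summarize_exits(sample))
```

Run over the full log above, this would report eleven exits, all with signal 15 (SIGTERM on Linux), spaced five seconds apart.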

I executed the following commands on node-2 and node-3:
[root@localhost ~]# modprobe gnbd
[root@localhost ~]# modprobe gfs
[root@localhost ~]# modprobe lock_dlm
[root@localhost ~]# gnbd_import -n -i 172.16.222.63
gnbd_import: created directory /dev/gnbd
gnbd_import: created gnbd device global_disk
gnbd_recvd: gnbd_recvd started
[root@localhost ~]# ccsd
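
For comparison with the how-to, here is a minimal sketch that lists which client-side commands from min-gfs.txt do not appear in the transcript above. The step list is copied from the how-to (gfs_mkfs is left out because it runs on node2 only). Matching by command prefix and ignoring arguments such as -n is a simplifying assumption, and missing_steps is a hypothetical helper.

```python
# Client-side steps from min-gfs.txt, in order (node2 and node3).
HOWTO_STEPS = [
    "modprobe gnbd",
    "modprobe gfs",
    "modprobe lock_dlm",
    "gnbd_import",
    "ccsd",
    "cman_tool join",
    "fence_tool join",
    "mount",
]

# Commands from the transcript above, with the shell prompts stripped.
EXECUTED = [
    "modprobe gnbd",
    "modprobe gfs",
    "modprobe lock_dlm",
    "gnbd_import -n -i 172.16.222.63",
    "ccsd",
]

def missing_steps(howto, executed):
    """Return the how-to steps that no executed command starts with."""
    return [step for step in howto
            if not any(cmd.startswith(step) for cmd in executed)]

print(missing_steps(HOWTO_STEPS, EXECUTED))
```

With the transcript as posted, the remaining steps are cman_tool join, fence_tool join and mount, that is, everything that follows ccsd in the how-to.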

And the following messages appear in /var/log/messages on node-2 and node-3 (GNBD client machines):
Jul 18 09:09:19 localhost kernel: gnbd: registered device at major 252
Jul 18 09:09:21 localhost hald[2759]: Timed out waiting for hotplug event 318. Rebasing to 574
Jul 18 09:10:41 localhost kernel: CMAN <CVS> (built Jul 17 2006 09:01:33) installed
Jul 18 09:10:41 localhost kernel: NET: Registered protocol family 30
Jul 18 09:10:41 localhost kernel: Lock_Harness <CVS> (built Jul 17 2006 09:01:49) installed
Jul 18 09:10:41 localhost kernel: gfs: no version for "kcl_get_node_by_nodeid" found: kernel tainted.
Jul 18 09:10:41 localhost kernel: GFS <CVS> (built Jul 17 2006 09:02:14) installed
Jul 18 09:10:57 localhost kernel: DLM <CVS> (built Jul 17 2006 09:01:45) installed
Jul 18 09:10:57 localhost kernel: Lock_DLM (built Jul 17 2006 09:01:53) installed
Jul 18 09:15:03 localhost gnbd_recvd[6334]: gnbd_recvd started
Jul 18 09:15:03 localhost kernel: resending requests
Jul 18 09:15:41 localhost gnbd_recvd[6334]: client lost connection with 172.16.222.63 : Broken pipe
Jul 18 09:15:41 localhost gnbd_recvd[6334]: reconnecting
Jul 18 09:15:41 localhost kernel: gnbd0: Receive control failed (result -32)
Jul 18 09:15:41 localhost kernel: gnbd0: shutting down socket
Jul 18 09:15:41 localhost kernel: exitting GNBD_DO_IT ioctl
Jul 18 09:15:46 localhost kernel: resending requests
Jul 18 09:15:51 localhost gnbd_recvd[6334]: client lost connection with 172.16.222.63 : Broken pipe
Jul 18 09:15:51 localhost gnbd_recvd[6334]: reconnecting
Jul 18 09:15:51 localhost kernel: gnbd0: Receive control failed (result -32)
Jul 18 09:15:51 localhost kernel: gnbd0: shutting down socket
Jul 18 09:15:51 localhost kernel: exitting GNBD_DO_IT ioctl
Jul 18 09:15:56 localhost kernel: resending requests
Jul 18 09:15:58 localhost ccsd[6336]: Starting ccsd DEVEL.1153141288:
Jul 18 09:15:58 localhost ccsd[6336]:  Built: Jul 17 2006 09:02:27
Jul 18 09:15:58 localhost ccsd[6336]:  Copyright (C) Red Hat, Inc.  2004  All rights reserved.
Jul 18 09:16:01 localhost gnbd_recvd[6334]: client lost connection with 172.16.222.63 : Broken pipe
Jul 18 09:16:01 localhost gnbd_recvd[6334]: reconnecting
Jul 18 09:16:01 localhost kernel: gnbd0: Receive control failed (result -32)
Jul 18 09:16:01 localhost kernel: gnbd0: shutting down socket
Jul 18 09:16:01 localhost kernel: exitting GNBD_DO_IT ioctl
Jul 18 09:16:06 localhost kernel: resending requests
Jul 18 09:16:11 localhost gnbd_recvd[6334]: client lost connection with 172.16.222.63 : Broken pipe
Jul 18 09:16:11 localhost gnbd_recvd[6334]: reconnecting
Jul 18 09:16:11 localhost kernel: gnbd0: Receive control failed (result -32)
Jul 18 09:16:11 localhost kernel: gnbd0: shutting down socket
Jul 18 09:16:11 localhost kernel: exitting GNBD_DO_IT ioctl
Jul 18 09:16:16 localhost kernel: resending requests
Jul 18 09:16:21 localhost gnbd_recvd[6334]: client lost connection with 172.16.222.63 : Broken pipe
Jul 18 09:16:21 localhost gnbd_recvd[6334]: reconnecting
Jul 18 09:16:21 localhost kernel: gnbd0: Receive control failed (result -32)
Jul 18 09:16:21 localhost kernel: gnbd0: shutting down socket
Jul 18 09:16:21 localhost kernel: exitting GNBD_DO_IT ioctl
Jul 18 09:16:26 localhost kernel: resending requests
Jul 18 09:16:27 localhost ccsd[6336]: Unable to connect to cluster infrastructure after 30 seconds.
Jul 18 09:16:31 localhost gnbd_recvd[6334]: client lost connection with 172.16.222.63 : Broken pipe
Jul 18 09:16:31 localhost gnbd_recvd[6334]: reconnecting
Jul 18 09:16:31 localhost kernel: gnbd0: Receive control failed (result -32)
Jul 18 09:16:31 localhost kernel: gnbd0: shutting down socket
Jul 18 09:16:31 localhost kernel: exitting GNBD_DO_IT ioctl
Jul 18 09:16:57 localhost ccsd[6336]: Unable to connect to cluster infrastructure after 60 seconds.
Jul 18 09:17:27 localhost ccsd[6336]: Unable to connect to cluster infrastructure after 90 seconds.
Jul 18 09:17:57 localhost ccsd[6336]: Unable to connect to cluster infrastructure after 120 seconds.
Jul 18 09:18:27 localhost ccsd[6336]: Unable to connect to cluster infrastructure after 150 seconds.
Jul 18 09:18:57 localhost ccsd[6336]: Unable to connect to cluster infrastructure after 180 seconds.
Jul 18 09:19:27 localhost ccsd[6336]: Unable to connect to cluster infrastructure after 210 seconds.
Jul 18 09:19:57 localhost ccsd[6336]: Unable to connect to cluster infrastructure after 240 seconds.
Jul 18 09:20:27 localhost ccsd[6336]: Unable to connect to cluster infrastructure after 270 seconds.
Jul 18 09:20:57 localhost ccsd[6336]: Unable to connect to cluster infrastructure after 300 seconds.
Jul 18 09:21:27 localhost ccsd[6336]: Unable to connect to cluster infrastructure after 330 seconds.
Jul 18 09:21:57 localhost ccsd[6336]: Unable to connect to cluster infrastructure after 360 seconds.
Jul 18 09:22:28 localhost ccsd[6336]: Unable to connect to cluster infrastructure after 390 seconds.
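
The client log interleaves messages from three sources (kernel, gnbd_recvd and ccsd). A minimal sketch for separating them is below, assuming the syslog layout shown above. The helper summarize_client_log is hypothetical and only counts messages per program and collects the ccsd wait times.

```python
import re
from collections import Counter

# "Mon DD HH:MM:SS host program[pid]: message" (the pid part is optional).
TAG_RE = re.compile(r"^\w{3} +\d+ [\d:]{8} \S+ ([\w.]+)(?:\[\d+\])?: (.*)$")
CCSD_WAIT_RE = re.compile(
    r"Unable to connect to cluster infrastructure after (\d+) seconds"
)

def summarize_client_log(lines):
    """Count messages per program and collect the ccsd wait times in seconds."""
    per_program = Counter()
    waits = []
    for line in lines:
        m = TAG_RE.match(line.strip())
        if not m:
            continue  # blank lines or wrapped continuation lines
        program, message = m.groups()
        per_program[program] += 1
        w = CCSD_WAIT_RE.search(message)
        if w:
            waits.append(int(w.group(1)))
    return per_program, waits

sample = [
    "Jul 18 09:15:41 localhost gnbd_recvd[6334]: reconnecting",
    "Jul 18 09:15:41 localhost kernel: gnbd0: Receive control failed (result -32)",
    "Jul 18 09:16:27 localhost ccsd[6336]: Unable to connect to cluster infrastructure after 30 seconds.",
    "Jul 18 09:16:57 localhost ccsd[6336]: Unable to connect to cluster infrastructure after 60 seconds.",
]
counts, waits = summarize_client_log(sample)
print(dict(counts), waits)
```

The "result -32" in the kernel lines corresponds to errno 32 (EPIPE on Linux), the same "Broken pipe" that gnbd_recvd reports.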

My /etc/cluster/cluster.conf file looks like:

<?xml version="1.0"?>
<cluster name="gamma" config_version="1">
<cman two_node="1" expected_votes="1">
</cman>
<clusternodes>
<clusternode name="172.16.222.128">
 <fence>
  <method name="single">
   <device name="gnbd" ipaddr="172.16.222.128"/>
  </method>
 </fence>
</clusternode>
<clusternode name="172.16.222.62">
 <fence>
  <method name="single">
   <device name="gnbd" ipaddr="172.16.222.62"/>
  </method>
 </fence>
</clusternode>
</clusternodes>
<fencedevices>
 <fencedevice name="gnbd" agent="fence_gnbd" servers="172.16.222.63"/>
</fencedevices>
</cluster>
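
A quick consistency check of this file can be sketched as follows. It is only an illustration using Python's standard XML parser, not a replacement for the cluster tools. The helper check_conf and the three checks it performs (fence device names are defined, ipaddr values carry no stray whitespace, two_node="1" goes with exactly two nodes) are my own assumptions about what is worth verifying.

```python
import xml.etree.ElementTree as ET

CLUSTER_CONF = """<?xml version="1.0"?>
<cluster name="gamma" config_version="1">
<cman two_node="1" expected_votes="1">
</cman>
<clusternodes>
<clusternode name="172.16.222.128">
 <fence>
  <method name="single">
   <device name="gnbd" ipaddr="172.16.222.128"/>
  </method>
 </fence>
</clusternode>
<clusternode name="172.16.222.62">
 <fence>
  <method name="single">
   <device name="gnbd" ipaddr="172.16.222.62"/>
  </method>
 </fence>
</clusternode>
</clusternodes>
<fencedevices>
 <fencedevice name="gnbd" agent="fence_gnbd" servers="172.16.222.63"/>
</fencedevices>
</cluster>
"""

def check_conf(text):
    """Return (list of node names, list of problem descriptions)."""
    root = ET.fromstring(text)
    defined = {fd.get("name") for fd in root.iter("fencedevice")}
    nodes, problems = [], []
    for node in root.iter("clusternode"):
        name = node.get("name")
        nodes.append(name)
        for dev in node.iter("device"):
            if dev.get("name") not in defined:
                problems.append(f"{name}: undefined fence device {dev.get('name')!r}")
            ipaddr = dev.get("ipaddr", "")
            if ipaddr != ipaddr.strip():
                problems.append(f"{name}: whitespace in ipaddr {ipaddr!r}")
    cman = root.find("cman")
    if cman is not None and cman.get("two_node") == "1" and len(nodes) != 2:
        problems.append("two_node=1 but the node count is not 2")
    return nodes, problems

nodes, problems = check_conf(CLUSTER_CONF)
print(nodes, problems)
```

For the file as shown, the sketch lists the two node names and finds nothing to report, so it only rules out these particular mistakes.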

If I just use GNBD to export a disk partition from node-1 (GNBD server) and import that partition on node-2 and node-3 with the gnbd_import command,
then I can create a file system on the exported device.

But in the above case it fails,
and in the end I'm not able to use GFS. If anybody has any idea, please help me.

With Regards
Rajesh.

--

Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster