Re: gnbd_export stops working after reboot

Never mind, I have fixed all the problems myself. The key requirements are:

- You must use caching GNBD (the "-c" flag) when exporting the cluster_cca device. At the time you create cluster_cca there is no cluster facility running yet, no lock_gulm, so the GNBD export only works with "-c".

- You must NOT use caching GNBD when exporting devices for GFS.
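Concretely, that means something like this on the GNBD server (the first device path is from my setup; /dev/cciss/c0d0p5 and the gfs0 export name are just placeholders):

# Export the CCA device with caching (-c): lock_gulm is not running yet,
# so an uncached export cannot work at this point.
gnbd_export -c -d /dev/cciss/c0d0p4 -e cluster.cca

# Export the devices that will hold GFS without -c, so they go through
# lock_gulm and stay cluster-safe:
gnbd_export -d /dev/cciss/c0d0p5 -e gfs0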

I think these requirements should be documented in the GFS Administrator's Guide.

Regards,

--Thai Duong.

On 11/27/05, Thai Duong <thaidn@xxxxxxxxx> wrote:
Hi list,

I intend to set up an Oracle9i RAC cluster using GFS 6.0 as the CFS. Because the SAN is not available at the moment, I decided to use GNBD instead. I have three IA64 servers running RHAS 3 Update 6, called node1, node2 and node3. node1 and node2 are GNBD clients and GFS nodes; node3 is the GNBD server. I also use all three of them as lock servers.
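For reference, with all three nodes acting as lock servers, the lock_gulm section of my cluster.ccs looks roughly like this (the cluster name is illustrative; double-check the exact syntax against the GFS 6.0 guide):

cluster {
        name = "oracle"
        lock_gulm {
                servers = ["node1", "node2", "node3"]
        }
}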

I followed the GFS 6.0 Administrator's Guide and encountered no problems until I tried to mount the GFS file system on node2. "mount -t gfs /dev/pool/pool0 /gfs -o acl" took forever to run. I killed the mount process and tried again on node1. This time it returned something like the error you get when you try to mount an unknown file system. I rmmod'ed the gfs module and modprobed it again, but still no luck. I checked against the startup procedure and found that although I had started lock_gulmd on all nodes, only node3 had a running instance. There was no sign of lock_gulmd on node1 and node2. I tried to start lock_gulmd again, and after a few attempts it came up on node2 as well, but mounting GFS still didn't work.
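For completeness, this is the per-node startup order I was following, as I understand it from the guide (the CCA device path here is a placeholder; point it at wherever your CCA actually lives):

# 1. Assemble the pools
pool_assemble -a
# 2. Start the CCS daemon against the CCA device (placeholder path)
ccsd -d /dev/pool/cluster_cca
# 3. Start the lock server daemon
lock_gulmd
# 4. Mount the file system
mount -t gfs /dev/pool/pool0 /gfs -o acl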

I didn't know what to do next, so I decided to start over. After turning off all the GFS-related daemons with chkconfig, I restarted the servers (a bad habit from my Windows days :( ). After all the servers were up again, I got a "gnbd_export error: create request failed : Connection refused" error when executing the following commands on node3 (in order to export a device as GNBD):

# modprobe gnbd_serv

[root@db-svr-test-03 root]# lsmod
Module                  Size  Used by    Not tainted
gnbd_serv              74288   0  (unused)
lock_gulm             149872   0  [gnbd_serv]
lock_harness            7288   0  [lock_gulm]
....

# gnbd_export -d /dev/cciss/c0d0p4 -e cluster.cca
gnbd_export error: create request failed : Connection refused

As you can see below, gnbd_serv was running and listening on the default port, 14243:
[root@db-svr-test-03 root]# netstat -nat
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address               Foreign Address             State
tcp        0      0 0.0.0.0:14243               0.0.0.0:*                   LISTEN
tcp        0      0 0.0.0.0:22                  0.0.0.0:*                   LISTEN
.....

I also ran "tcpdump -vv -i lo port 14243" on node3 and saw some traffic when I re-executed "gnbd_export -d /dev/cciss/c0d0p4 -e cluster.cca". It even got past the three-way handshake, but while the client side was pushing data the server suddenly sent a FIN packet.
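For reference, this is how I read that capture (my interpretation, not an authoritative diagnosis):

# Capture loopback traffic to the default gnbd_serv port while
# re-running the export in another terminal:
tcpdump -vv -i lo port 14243
# The three-way handshake completes, so TCP itself is fine. The server
# sending FIN while the client is still pushing data suggests gnbd_serv
# accepts the connection but then rejects the request internally.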

I even removed the GFS and GFS-modules RPMs and reinstalled them, but still no luck. What am I supposed to do now? Any help is appreciated.

Regards,

--Thai Duong.




--

Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
