On Wed, Nov 09, 2005 at 10:17:49AM -0700, Ryan Thomson wrote:
> Hi list,
>
> I'm having some issues setting up a GFS mount w/ NFS export on RHEL4 using
> the latest cluster suite packages from RHN. I'm using GFS CVS (RHEL4) and
> LVM2 (clvmd) from source tarball (2.2.01.09) if that makes any difference.
>
> The problem I am having is this: I setup a service with a GFS resource, an
> NFS export resource and an NFS client resource. The service starts fine
> and I can mount the NFS export over the network from clients. After one
> minute and each minute after that I'm seeing some errors in my logs and
> the service is restarted. I looked at clusterfs.sh and saw that it's
> supposed to be doing an "isMounted" check every minute... but how is that
> failing if I can access everything just fine, locally and over NFS?

The status check is failing, see
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=172066
for an explanation and a mini-patch.
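
Independent of what the bugzilla patch changes, one detail in your own output
is worth noting: cluster.conf names the device /dev/BIOCOMP/people, but the
unmount message reports /dev/mapper/BIOCOMP-people, so the LVM volume shows up
in the mount table under a different name than the one you configured. Any
mount check that compares the configured path literally against /proc/mounts
can be tripped up by exactly that kind of aliasing. Purely as an illustration
(this is not the clusterfs.sh code, and the helper names are made up), a check
that survives it compares the underlying major:minor device numbers instead of
the path strings:

    #!/bin/bash
    # Sketch only: decide whether a block device is mounted at a mount point
    # by comparing major:minor device numbers, so /dev/BIOCOMP/people and
    # /dev/mapper/BIOCOMP-people count as the same device.

    dev_id() {
        # print "major:minor" of a block device, following symlinks
        stat -L -c '%t:%T' "$1" 2>/dev/null
    }

    is_mounted() {
        local want mp mdev mmp rest
        want=$(dev_id "$1") || return 1
        mp=$2
        while read -r mdev mmp rest; do
            [ "$mmp" = "$mp" ] || continue      # wrong mount point
            [ -b "$mdev" ]     || continue      # skip proc, sysfs, ...
            [ "$(dev_id "$mdev")" = "$want" ] && return 0
        done < /proc/mounts
        return 1
    }

    is_mounted /dev/BIOCOMP/people /people && echo mounted || echo "not mounted"

Resolving both sides to device numbers makes /dev/BIOCOMP/people,
/dev/mapper/BIOCOMP-people and /dev/dm-1 all compare equal, since they refer to
the same device-mapper node.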

> Here is the error as I am seeing it in /var/log/messages:
>
> Nov 9 10:00:59 wolverine clurgmgrd[6901]: <notice> status on clusterfs "people" returned 1 (generic error)
> Nov 9 10:00:59 wolverine clurgmgrd[6901]: <notice> Stopping service NFS people
> Nov 9 10:00:59 wolverine clurgmgrd: [6901]: <info> Removing IPv4 address 136.159.***.*** from eth0
> Nov 9 10:00:59 wolverine clurgmgrd: [6901]: <info> Removing export: 136.159.***.0/24:/people
> Nov 9 10:00:59 wolverine clurgmgrd: [6901]: <info> unmounting /dev/mapper/BIOCOMP-people (/people)
> Nov 9 10:00:59 wolverine clurgmgrd[6901]: <notice> Service NFS people is recovering
> Nov 9 10:00:59 wolverine clurgmgrd[6901]: <notice> Recovering failed service NFS people
> Nov 9 10:01:00 wolverine kernel: GFS: Trying to join cluster "lock_nolock", ""
> Nov 9 10:01:00 wolverine kernel: GFS: fsid=dm-1.0: Joined cluster.  Now mounting FS...
> Nov 9 10:01:00 wolverine kernel: GFS: fsid=dm-1.0: jid=0: Trying to acquire journal lock...
> Nov 9 10:01:00 wolverine kernel: GFS: fsid=dm-1.0: jid=0: Looking at journal...
> Nov 9 10:01:00 wolverine kernel: GFS: fsid=dm-1.0: jid=0: Done
> Nov 9 10:01:00 wolverine clurmtabd[27592]: <err> #20: Failed set log level
> Nov 9 10:01:00 wolverine clurgmgrd: [6901]: <info> Adding export: 136.159.***.0/24:/people (rw,sync)
> Nov 9 10:01:00 wolverine clurgmgrd: [6901]: <info> Adding IPv4 address 136.159.***.*** to eth0
> Nov 9 10:01:01 wolverine clurgmgrd[6901]: <notice> Service NFS people started
>
> And here is my cluster.conf file:
>
> <?xml version="1.0"?>
> <cluster config_version="28" name="biocomp_cluster">
>     <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
>     <clusternodes>
>         <clusternode name="wolverine" votes="1">
>             <fence>
>                 <method name="1">
>                     <device name="apcfence" port="1" switch="0"/>
>                 </method>
>             </fence>
>         </clusternode>
>         <clusternode name="skunk" votes="1">
>             <fence>
>                 <method name="1">
>                     <device name="apcfence" port="2" switch="0"/>
>                 </method>
>             </fence>
>         </clusternode>
>         <clusternode name="cottontail" votes="1">
>             <fence>
>                 <method name="1">
>                     <device name="apcfence" port="3" switch="0"/>
>                 </method>
>             </fence>
>         </clusternode>
>     </clusternodes>
>     <cman/>
>     <fencedevices>
>         <fencedevice agent="fence_apc" ipaddr="10.1.1.54" login="fence_user" name="apcfence" passwd="*****"/>
>     </fencedevices>
>     <rm>
>         <failoverdomains>
>             <failoverdomain name="NFS Failover" ordered="1" restricted="1">
>                 <failoverdomainnode name="wolverine" priority="3"/>
>                 <failoverdomainnode name="skunk" priority="2"/>
>                 <failoverdomainnode name="cottontail" priority="1"/>
>             </failoverdomain>
>             <failoverdomain name="Cluster Failover" ordered="0" restricted="1">
>                 <failoverdomainnode name="wolverine" priority="1"/>
>                 <failoverdomainnode name="skunk" priority="1"/>
>                 <failoverdomainnode name="cottontail" priority="1"/>
>             </failoverdomain>
>         </failoverdomains>
>         <resources>
>             <clusterfs device="/dev/BIOCOMP/people" force_unmount="1" fstype="gfs" mountpoint="/people" name="people" options=""/>
>             <nfsclient name="people-client" options="rw,sync" target="136.159.***.0/24"/>
>             <nfsexport name="people-export"/>
>             <nfsclient name="projects-client" options="rw,sync" target="136.159.***.0/24"/>
>             <nfsexport name="projects-export"/>
>         </resources>
>         <service autostart="1" domain="Cluster Failover" name="cluster NAT">
>             <ip address="10.1.1.1" monitor_link="1"/>
>             <script file="/cluster/scripts/cluster_nat" name="cluster NAT script"/>
>         </service>
>         <service autostart="1" domain="Cluster Failover" name="NFS people">
>             <ip address="136.159.***.***" monitor_link="1"/>
>             <clusterfs ref="people">
>                 <nfsexport ref="people-export">
>                     <nfsclient ref="people-client"/>
>                 </nfsexport>
>             </clusterfs>
>         </service>
>         <service autostart="1" domain="Cluster Failover" name="NFS projects">
>             <ip address="136.159.***.***" monitor_link="1"/>
>             <clusterfs device="/dev/BIOCOMP/RT_testproject" force_unmount="1" fstype="gfs" mountpoint="/projects/RT_testproject" name="RT_testproject" options="">
>                 <nfsexport ref="projects-export">
>                     <nfsclient ref="projects-client"/>
>                 </nfsexport>
>             </clusterfs>
>         </service>
>     </rm>
> </cluster>
>
> Am I doing something wrong here? I tried looking through
> /usr/share/cluster/clusterfs.sh to see where it is returning 1 from but I
> can't seem to be able to debug this issue on my own.
>
> Thoughts, Ideas, Suggestions?
>
-- 
Axel.Thimm at ATrpms.net
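
P.S. Since you mention you couldn't trace where clusterfs.sh returns 1: you
can run the agent by hand with shell tracing and watch the status path
directly. rgmanager hands the resource attributes to the agent as OCF_RESKEY_*
environment variables; the variable names below are only inferred from the
<clusterfs/> attributes in your cluster.conf, so treat this as a sketch and
check the agent's own meta-data output if one of them doesn't take:

    # Rough sketch of a by-hand status call; the OCF_RESKEY_* names are
    # inferred from the cluster.conf attributes, not taken from the agent.
    OCF_RESKEY_name=people \
    OCF_RESKEY_device=/dev/BIOCOMP/people \
    OCF_RESKEY_mountpoint=/people \
    OCF_RESKEY_fstype=gfs \
    bash -x /usr/share/cluster/clusterfs.sh status
    echo "exit code: $?"

The -x trace should point at the exact test that produces the return 1 every
minute.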