I just took over a couple of clusters from a sysadmin who left the company. Unfortunately, the hand-off was less than informative. <sigh> So I've got an old Linux cluster, still well-used, with a PVFS filesystem mounted at /work. I'm new to clustering, and I sure as hell don't know much about it, but I've got a sick puppy here, and everything points to the PVFS filesystem.
lsof: WARNING: can't stat() pvfs file system /work
Output information may be incomplete.
In /var/log/messages:
Oct 3 13:51:34 elvis PAM_pwdb[24431]: (su) session opened for user deb_r by deb(uid=2626)
Oct 3 13:51:49 elvis kernel: (./ll_pvfs.c, 361): ll_pvfs_getmeta failed on downcall for 192.168.1.102:3000/pvfs-meta
Oct 3 13:51:49 elvis kernel: (./ll_pvfs.c, 361): ll_pvfs_getmeta failed on downcall for 192.168.1.102:3000/pvfs-meta/manaa/DFTBNEW
Oct 3 14:16:48 elvis kernel: (./ll_pvfs.c, 409): ll_pvfs_statfs failed on downcall for 192.168.1.102:3000/pvfs-meta
Oct 3 14:16: elvis kernel: (./inode.c, 321): pvfs_statfs failed
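Reading those messages, I'm guessing the metadata server is the PVFS mgr daemon and that it's supposed to be answering on 192.168.1.102 port 3000; that's purely my reading of the logs, since nothing was handed off to me. The checks I had in mind look something like this (mgr and iod are the daemon names I've seen in old PVFS docs, so take them with a grain of salt):
# on 192.168.1.102: is the metadata daemon running, and is anything listening on port 3000?
ps aux | grep -v grep | grep mgr
netstat -an | grep 3000
# from the master: can I even reach that port?
telnet 192.168.1.102 3000
# and are the I/O daemons up on the slave nodes?
bpsh -pad ps aux | grep iod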
So, the system in question is:
Linux elvis 2.2.19-13.beosmp #1 SMP Tue Aug 21 20:04:44 EDT 2001 i686 unknown
Red Hat Linux release 6.2 (Zoot)
I can't access /work from the master or any of the nodes:
elvis [49#] ls /work
ls: /work: Too many open files
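That "Too many open files" error makes me wonder whether something is leaking file descriptors on the master, or whether the pvfsd client daemon is just wedged; I haven't confirmed either, but this is roughly what I was planning to look at:
# system-wide file handle usage versus the limit (2.2 kernel)
cat /proc/sys/fs/file-nr
cat /proc/sys/fs/file-max
# is the pvfsd client daemon still alive on the master and on the slave nodes?
ps aux | grep -v grep | grep pvfsd
bpsh -pad ps aux | grep pvfsd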
I ran a script in /usr/bin called pvfs_client_stop.sh, which unmounts /work, kills the pvfsd daemons, and removes the pvfs module:
#!/bin/tcsh
# Phil Carns
# pcarns@xxxxxxxxxxxxxxxxxx
#
# This is an example script for how to get Scyld Beowulf cluster nodes
# to unmount a PVFS file system.
set PVFSD = "/usr/sbin/pvfsd"
set PVFSMOD = "pvfs"
set PVFS_CLIENT_MOUNT_DIR = "/work"
set MOUNT_PVFS = "/sbin/mount.pvfs"
# unmount the file system locally and on all of the slave nodes
/bin/umount $PVFS_CLIENT_MOUNT_DIR
bpsh -pad /bin/umount $PVFS_CLIENT_MOUNT_DIR
# kill all of the pvfsd client daemons
/usr/bin/killall pvfsd
# remove the pvfs module on the local and the slave nodes
/sbin/rmmod $PVFSMOD
bpsh -pad /sbin/rmmod $PVFSMOD
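In hindsight I probably should have checked that the unmount and the module removal really took everywhere before restarting anything; I assume something like the following would show any stragglers:
# anything still claiming a pvfs mount on the master or the slave nodes?
grep pvfs /proc/mounts
bpsh -pad grep pvfs /proc/mounts
# is the pvfs module actually gone?
/sbin/lsmod | grep pvfs
bpsh -pad /sbin/lsmod | grep pvfs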
Then I ran pvfs_client_start.sh /work, which seemed to work, except it never exited...
#!/bin/tcsh
# Phil Carns
# pcarns@xxxxxxxxxxxxxxxxxx
#
# This is an example script for how to get Scyld Beowulf cluster nodes
# to mount a PVFS file system.
set PVFSD = "/usr/sbin/pvfsd"
set PVFSMOD = "pvfs"
set PVFS_CLIENT_MOUNT_DIR = "/work"
set MOUNT_PVFS = "/sbin/mount.pvfs"
if ($#argv < 1) then
    echo "usage: pvfs_client_start.sh <meta dir>"
    echo "(Causes every machine in the cluster to mount the PVFS file system)"
    exit 1
endif
# build the host:directory string for mount.pvfs from the master's address
set PVFS_META_DIR = `bpctl -M -a`:$1
# insert the pvfs module on the local and slave nodes
/sbin/modprobe $PVFSMOD
bpsh -pad /sbin/modprobe $PVFSMOD
# start the pvfsd client daemon on the local and slave nodes
$PVFSD
bpsh -pad $PVFSD
# actually mount the file system locally and on all of the slave nodes
$MOUNT_PVFS $PVFS_META_DIR $PVFS_CLIENT_MOUNT_DIR
bpsh -pad $MOUNT_PVFS $PVFS_META_DIR $PVFS_CLIENT_MOUNT_DIR
This seemed to work (well, it restarted the daemons and such), but I still can't get into /work, and the mount fails with:
mount.pvfs: Device or resource busy
mount.pvfs: server 192.168.1.102 alive, but mount failed (invalid metadata directory name?)
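That "invalid metadata directory name" bit makes me wonder if I fed the start script the wrong argument: the usage line says <meta dir>, and the kernel messages above talk about /pvfs-meta rather than /work, so maybe the metadata directory and the mount point are two different things. If so, I figure the next thing to try (with /pvfs-meta being nothing more than my guess from the logs) is something like:
# on 192.168.1.102: does the metadata directory look sane?
# (I believe PVFS keeps a .pvfsdir and an .iodtab file in there, going by old docs)
ls -la /pvfs-meta
# then, from the master, try the mount by hand before re-running the script
/sbin/mount.pvfs 192.168.1.102:/pvfs-meta /work
If the by-hand mount works, I'd re-run pvfs_client_start.sh with /pvfs-meta as the argument instead of /work.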
Comments? Useful ideas? A good joke???
dave