I just took over a couple of clusters from a sysadmin who left the company. Unfortunately, the hand-off was less than informative. <sigh> So I've got an old Linux cluster, still well-used, with a PVFS filesystem mounted at /work. I'm new to clustering, and I sure as hell don't know much about it, but I've got a sick puppy here, and everything points to the PVFS filesystem.
lsof: WARNING: can't stat() pvfs file system /work
Output information may be incomplete.
In /var/log/messages:
Oct 3 13:51:34 elvis PAM_pwdb[24431]: (su) session opened for user deb_r by deb(uid=2626)
Oct 3 13:51:49 elvis kernel: (./ll_pvfs.c, 361): ll_pvfs_getmeta failed on downcall for 192.168.1.102:3000/pvfs-meta
Oct 3 13:51:49 elvis kernel: (./ll_pvfs.c, 361): ll_pvfs_getmeta failed on downcall for 192.168.1.102:3000/pvfs-meta/manaa/DFTBNEW
Oct 3 14:16:48 elvis kernel: (./ll_pvfs.c, 409): ll_pvfs_statfs failed on downcall for 192.168.1.102:3000/pvfs-meta
Oct 3 14:16: elvis kernel: (./inode.c, 321): pvfs_statfs failed
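Reading those messages, I'm guessing the metadata server is the PVFS mgr daemon and that it's supposed to be answering on 192.168.1.102 port 3000; that's purely my reading of the logs, since nothing was handed off to me. The checks I had in mind look something like this (mgr and iod are the daemon names I've seen in old PVFS docs, so take them with a grain of salt):
# on 192.168.1.102: is the metadata daemon running, and is anything listening on port 3000?
ps aux | grep -v grep | grep mgr
netstat -an | grep 3000
# from the master: can I even reach that port?
telnet 192.168.1.102 3000
# and are the I/O daemons up on the slave nodes?
bpsh -pad ps aux | grep iod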
So, the system in question is:
Linux elvis 2.2.19-13.beosmp #1 SMP Tue Aug 21 20:04:44 EDT 2001 i686 unknown
Red Hat Linux release 6.2 (Zoot)
I can't access /work from the master or any of the nodes:
elvis [49#] ls /work
ls: /work: Too many open files
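That "Too many open files" error makes me wonder whether something is leaking file descriptors on the master, or whether the pvfsd client daemon is just wedged; I haven't confirmed either, but this is roughly what I was planning to look at:
# system-wide file handle usage versus the limit (2.2 kernel)
cat /proc/sys/fs/file-nr
cat /proc/sys/fs/file-max
# is the pvfsd client daemon still alive on the master and on the slave nodes?
ps aux | grep -v grep | grep pvfsd
bpsh -pad ps aux | grep pvfsd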
I ran a script in /usr/bin called pvfs_client_stop.sh, which unmounts /work, kills the pvfsd daemons, and removes the pvfs module:
#!/bin/tcsh
# Phil Carns
# pcarns@xxxxxxxxxxxxxxxxxx
#
# This is an example script for how to get Scyld Beowulf cluster nodes
# to unmount a PVFS file system.
set PVFSD = "/usr/sbin/pvfsd"
set PVFSMOD = "pvfs"
set PVFS_CLIENT_MOUNT_DIR = "/work"
set MOUNT_PVFS = "/sbin/mount.pvfs"
# unmount the file system locally and on all of the slave nodes
/bin/umount $PVFS_CLIENT_MOUNT_DIR
bpsh -pad /bin/umount $PVFS_CLIENT_MOUNT_DIR
# kill all of the pvfsd client daemons
/usr/bin/killall pvfsd
# remove the pvfs module on the local and the slave nodes
/sbin/rmmod $PVFSMOD
bpsh -pad /sbin/rmmod $PVFSMOD
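In hindsight I probably should have checked that the unmount and the module removal really took everywhere before restarting anything; I assume something like the following would show any stragglers:
# anything still claiming a pvfs mount on the master or the slave nodes?
grep pvfs /proc/mounts
bpsh -pad grep pvfs /proc/mounts
# is the pvfs module actually gone?
/sbin/lsmod | grep pvfs
bpsh -pad /sbin/lsmod | grep pvfs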
Then I ran pvfs_client_start.sh /work, which seemed to work, except it never exited...
#!/bin/tcsh
# Phil Carns
# pcarns@xxxxxxxxxxxxxxxxxx
#
# This is an example script for how to get Scyld Beowulf cluster nodes
# to mount a PVFS file system.
set PVFSD = "/usr/sbin/pvfsd"
set PVFSMOD = "pvfs"
set PVFS_CLIENT_MOUNT_DIR = "/work"
set MOUNT_PVFS = "/sbin/mount.pvfs"
if ($#argv < 1) then
    echo "usage: pvfs_client_start.sh <meta dir>"
    echo "(Causes every machine in the cluster to mount the PVFS file system)"
    exit 1
endif
# build the host:directory string for mount.pvfs from the master's address
set PVFS_META_DIR = `bpctl -M -a`:$1
# insert the pvfs module on the local and slave nodes
/sbin/modprobe $PVFSMOD
bpsh -pad /sbin/modprobe $PVFSMOD
# start the pvfsd client daemon on the local and slave nodes
$PVFSD
bpsh -pad $PVFSD
# actually mount the file system locally and on all of the slave nodes
$MOUNT_PVFS $PVFS_META_DIR $PVFS_CLIENT_MOUNT_DIR
bpsh -pad $MOUNT_PVFS $PVFS_META_DIR $PVFS_CLIENT_MOUNT_DIR
This seemed to work (well, it restarted the daemons and such), but I still can't get into /work, and the mount fails with:
mount.pvfs: Device or resource busy
mount.pvfs: server 192.168.1.102 alive, but mount failed (invalid metadata directory name?)
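That "invalid metadata directory name" bit makes me wonder if I fed the start script the wrong argument: the usage line says <meta dir>, and the kernel messages above talk about /pvfs-meta rather than /work, so maybe the metadata directory and the mount point are two different things. If so, I figure the next thing to try (with /pvfs-meta being nothing more than my guess from the logs) is something like:
# on 192.168.1.102: does the metadata directory look sane?
# (I believe PVFS keeps a .pvfsdir and an .iodtab file in there, going by old docs)
ls -la /pvfs-meta
# then, from the master, try the mount by hand before re-running the script
/sbin/mount.pvfs 192.168.1.102:/pvfs-meta /work
If the by-hand mount works, I'd re-run pvfs_client_start.sh with /pvfs-meta as the argument instead of /work.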
Comments? Useful ideas? A good joke???
dave