Chris St. Pierre wrote:
I have a RHEL 4 NFS server that shares out three volumes, all
read-only. One goes to another Linux box, and the other two go to a
Solaris 9 machine. One of the volumes mounted on the Solaris box is
having bewildering problems.
Every night, two processes run on the server that cause these
problems. The first is:
/sbin/quotacheck -fguma
The second is AIDE, a Tripwire replacement. When either of these
processes runs, semi-random files semi-disappear from the client. The
files are always in the same directories, but different ones disappear
on different days. The symptoms are always the same: running 'ls'
will show the files, but running 'ls -lAF' (or anything that requires
running stat() on them) fails with "File not found." Opening them
also fails. To solve this problem, I have to touch the file *on the
client*; of course, it gives an error that it can't create the file in
question, but after that, everything works.
The only common thread I can think of between quotacheck and AIDE is
that both stat a very large number of files on the server. That said,
AIDE is not configured to check any of the volumes that are shared via
NFS. I also wrote a quick Perl script to recurse into a directory and
stat all the files in it, but that doesn't break the NFS shares,
either.
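The Perl script itself isn't shown. A rough Python equivalent, assuming it simply walked the tree and stat'ed every entry, might look like the following. The name stat_tree is hypothetical. Timing the run gives a crude way to compare the metadata load it generates with the load from the nightly jobs.

```python
import os
import time


def stat_tree(root):
    """Walk root recursively and lstat() every directory and file under it.

    Returns (successful_stats, [(path, error), ...], elapsed_seconds).
    """
    ok = 0
    errors = []
    start = time.monotonic()
    # onerror records directories that os.walk itself could not list.
    for dirpath, dirnames, filenames in os.walk(
            root, onerror=lambda e: errors.append((e.filename, e))):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                os.lstat(path)
                ok += 1
            except OSError as exc:
                errors.append((path, exc))
    return ok, errors, time.monotonic() - start
```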
I initially thought the problems were related to the firewall on my
server, so I turned it off. (There is no firewall on the client.)
Based on suggestions from fellow SAs, I tried adding actimeo=0 and
forcedirectio to the mount options on the client, but that didn't
solve anything. My users are getting very antsy, to say the least.
Does anyone have any ideas? (Aside from cosmic rays, I mean.) Here's
my /etc/exports on 'huxley', the server:
/webdirs/univ job.nebrwesleyan.edu(all_squash,anonuid=1080,anongid=1080,ro)
/webdirs/students students.nebrwesleyan.edu(all_squash,anonuid=1080,anongid=1080,ro)
/webdirs/faculty job.nebrwesleyan.edu(all_squash,anonuid=1080,anongid=1080,ro)
And on 'job', the client, the corresponding lines from /etc/vfstab:
huxley:/webdirs/univ - /www_misc nfs - yes soft,bg,actimeo=0,forcedirectio
huxley:/webdirs/faculty - /web/people nfs - yes soft,bg
It bears repeating that only one of the volumes (/webdirs/univ,
mounted on /www_misc) is having problems; the other volume shared
between the two servers is just fine. Other NFS mounts on the client
and shares from the server are similarly fine. In fact, most of the
NFS share in question is fine -- it's just two directories that
consistently lose files whenever quotacheck or AIDE is run.
Any ideas? I'm up against a brick wall on this one. Thanks!
Chris St. Pierre
Unix Systems Administrator
Nebraska Wesleyan University
This is just wild speculation on my part...
Could it be that the job you are running is placing such a heavy load on the
server that NFS requests from the client are timing out, and that this result is
then cached on the client, causing the "File not found" errors? I notice
you have actimeo=0; could this be the culprit? Does that mean cache forever, or
never cache? The man page isn't forthcoming on that.
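One way to test the timeout idea is to probe a known file from the client at regular intervals and log how long each stat() takes and whether it fails. The results can then be lined up against the times quotacheck and AIDE run on the server. Both vfstab entries use 'soft', so a request that runs out of retries should come back to the application as an error rather than hang. If the hypothesis holds, the probe should therefore show slow calls or errors during the nightly window. This is a minimal Python sketch; the function name and parameters are hypothetical.

```python
import os
import time


def probe_stat(path, count, interval):
    """stat() `path` `count` times, sleeping `interval` seconds between calls.

    Returns a list of (wall_clock_start, seconds_taken, error_or_None).
    """
    samples = []
    for i in range(count):
        started = time.time()        # wall clock, for matching against cron times
        t0 = time.monotonic()        # monotonic clock, for measuring latency
        try:
            os.stat(path)
            err = None
        except OSError as exc:
            err = exc
        samples.append((started, time.monotonic() - t0, err))
        if i + 1 < count:
            time.sleep(interval)
    return samples
```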
--
Nigel Wade, System Administrator, Space Plasma Physics Group,
University of Leicester, Leicester, LE1 7RH, UK
E-mail : nmw@xxxxxxxxxxxx
Phone : +44 (0)116 2523548, Fax : +44 (0)116 2523555