Jaap Dijkshoorn wrote:
It looks like it!
I have asked the user who is having this problem what exactly is
happening with those files during his job. I hope this will give us a
clue as to how those files are touched and/or deleted etc.
All files are read/write by the users through NFS. But the strange
thing is that on 4 of the 5 servers the files are still available, on
GFS as well as on the clients through NFS.
Thanks already for the effort. I hope we can tackle this bug!
Best Regards,
Jaap
Hi Jaap,
Soon after I sent the last email, I did recreate the problem here in our
lab, though it was after several days of trying. That's good: It means the
U4 is very stable, and it means I can probably work on the problem without
the need for further information from people in the field. I did just
update the bugzilla, but here's what I know so far:
This is hard to explain, so let me simplify by calling "A" the cluster node
that shows the files correctly, and "B" the cluster node that says the files
are missing. Let's further say that an example "missing" file is:
/mnt/gfs/subdir/xyz. So "ls /mnt/gfs/subdir/xyz" from "A" shows the
file correctly, while the same command from "B" produces
"No such file or directory".
The biggest clue I've found today is this:
It looks as if "B" somehow has the wrong inode cached for "subdir". In
other words, a stat command run on the directory "/mnt/gfs/subdir" shows
the wrong directory inode (possibly a deleted subdirectory?) on "B",
whereas "A" has the correct inode for "subdir" with the same stat command.
I'm not sure yet whether this incorrect cached inode is coming from GFS
or whether it's in the Linux VFS. I'm still investigating.
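For anyone who wants to make the same comparison on their own cluster, here
is a minimal sketch (not taken from the investigation itself) of a script
that prints the inode number the local node reports for each path given on
the command line. The function name describe and the script name
inode_check.py are hypothetical; the only call it relies on is Python's
os.stat, and /mnt/gfs/subdir is just the example path used above.

```python
import os
import sys


def describe(path):
    """Return (inode, mode) as reported by the local kernel, or None if stat fails."""
    try:
        st = os.stat(path)
    except OSError:
        # On the misbehaving node this is where "No such file or directory" shows up.
        return None
    return st.st_ino, st.st_mode


if __name__ == "__main__":
    # Print one line per path so the output from two nodes can be compared by eye.
    for path in sys.argv[1:]:
        result = describe(path)
        if result is None:
            print(f"{path}: stat failed")
        else:
            inode, mode = result
            print(f"{path}: inode={inode} mode={oct(mode)}")
```

Running "python inode_check.py /mnt/gfs/subdir /mnt/gfs/subdir/xyz" on both
"A" and "B" and comparing the inode values should show whether the two nodes
disagree about the directory itself, which is the symptom described above.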
Please update the bugzilla if you get more information. In the meantime,
I'll continue working on the problem and I'll keep the bugzilla up to date
as I find out more.
Regards,
Bob Peterson
Red Hat Cluster Suite
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster