On 7/27/10 5:15 AM, Steven Whitehouse wrote:
Hi,
If you translate a5b67f into decimal, then that is the inode number of
the inode which is causing a problem. It looks to me as if you have too
many processes trying to access this one inode from multiple nodes.
It's not obvious from the traces that anything is actually stuck, but if
you take two traces, a few seconds or minutes apart, then it should
become more obvious whether the cluster is making progress or whether it
really is stuck.
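
For reference, the hex-to-decimal step could be sketched in Python roughly
as below. The helper name inode_hex_to_decimal and the mount path are
placeholders made up for illustration; the decimal result is what
find -inum expects as its argument.

```python
def inode_hex_to_decimal(hex_inode: str) -> int:
    """Convert an inode number given in hexadecimal to decimal."""
    # int() with base 16 accepts the digits with or without a "0x" prefix.
    return int(hex_inode, 16)


if __name__ == "__main__":
    inum = inode_hex_to_decimal("a5b67f")
    print(inum)  # 10860159
    # Placeholder path: substitute the real mount point of the filesystem.
    print(f"find /path/to/mount -xdev -inum {inum}")
```

Note that a find over a filesystem that is already stuck may itself block,
so this is more useful while the cluster is still responsive.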
Steve.
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
Hi Steve,
As always, thanks for the reply. The cluster was, indeed, truly
stuck. I rebooted it last night to clear everything out. I never did
figure out which file was the problem. I did a find -inum, but the find
hung too. By that point the load average was up to 80 and climbing.
Any ideas on how to avoid this? Are there tunable values I need to
increase to allow more processes to access any individual inode?
Thanks!
-- scooter