Unfortunately, I do not believe that the server was doing much of anything at the time. There were a few users, mostly reading email, and not using the file system that had the problem. The partition in question is used for mail lists and as a dumping ground for backups from 6 other servers. The backups were not running at the time the first gfs_releasepage() message was logged. The mail lists are just test lists and not in use yet. The backups transfer about 10GB of data in 12 to 15 files between 3am and 5am every day. The backups are transferred by scp (the only path through a firewall). The backups that night ran without any problems, both the copy from the remote servers and the copy of that file system to tape.
If/when it happens again, I will try to have a better idea of what was going on at the time.
The server in question had been up for over 30 days when the problem started.
Thank you
Matt
On Thu, 2005-12-15 at 14:24, Andrew C. Dingman wrote:
On Mon, 2005-12-12 at 14:51 -0700, Matt Brookover wrote:
> This looks like a similar problem to the one described in bugzilla
> 160409. It does not look like there ever was a solution.

It does look similar, and there was no solution. We were never able to re-produce the problem by any method other than putting it into production. I think the theory we ended up with was that there was some sort of lock contention problem, possibly having to do with the network here. It was just a theory, though. We never managed to prove anything.

Do you know what you did to trigger it? I assume something other than 300 people running jBase applications?