We're using GFS-6.0.0-7.1. Not the latest patch, I realize, and if that would fix things, that'd be ideal, I think. We're also in a 5-node situation, with three servers, and in fact our behaviour appears to be almost identical. It -does- eventually come back for us after I kill the rsync process, so it appears to be flushing a buffer of some sort.

Regardless, it's not really acceptable behaviour when you've got a 32-node compute cluster behind one of the GFS nodes and you have researchers who need to move hundreds of gigs of data into the file system and -can't- because of this.

--
Jerry Gilyeat, RHCE
Systems Administrator
Molecular Microbiology and Immunology
Johns Hopkins Bloomberg School of Public Health

-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx on behalf of Kovacs, Corey J.
Sent: Tue 5/31/2005 2:38 PM
To: linux clustering
Subject: RE: Question :)

Jerry, is this problem with the "current" supported version of GFS? If so, what version are you running?

I am having a similar problem with a 5-node cluster with 3 nodes serving as lock managers. If I rsync large amounts of data (0.5TB) to a node that is serving as a lock manager and mounting the FS, things croak pretty quickly. If I rsync to a node that is NOT a lock manager, it takes longer but eventually locks up there as well, although at times it will come back. When we do our rsync, gfs_scand and lock_gulmd go crazy. In the instance where the fs comes back, they continue to have high CPU utilization.

I don't think this is "a fact of life" that anyone needs to live with, by the way; there has to be a reason for this. I can't believe for a minute that you and I are the only ones experiencing this.

Corey

________________________________

From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Gerald G. Gilyeat
Sent: Tuesday, May 31, 2005 2:06 PM
To: linux-cluster@xxxxxxxxxx
Subject: Question :)

First - thanks for the help the last time I poked my pointy little head in here. Things have been -much- more stable since we bumped the lock limit to 2097152 ;)

However, we're still running into the occasional "glitch" where it seems like a single process is locking up -all- disk access on us until it completes its operation. Specifically, we see this when folks are doing rsyncs of large amounts of data (one of my faculty has been trying to copy over a couple thousand 16MB files). Even piping tar through ssh (from the target machine: ssh user@host "cd /data/dir/path; tar -cpsf -" | tar -xpsf -) results in similar behaviour.

Is this tunable, or simply a fact of life that we're going to have to live with? It only occurs with big, or long, writes. Reads aren't a problem (it just takes 14 hours to dump 1.5TB to tape...)

Thanks!

--
Jerry Gilyeat, RHCE
Systems Administrator
Molecular Microbiology and Immunology
Johns Hopkins Bloomberg School of Public Health
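For reference, the transfers discussed in the thread would look roughly like the sketch below. The host, source path, and local GFS mount point are placeholders taken from the mails, and the monitoring line is only one illustrative way to watch the daemons Corey mentions, not a command anyone in the thread reported running.

    # Tar piped through ssh, run from the receiving GFS client as Jerry
    # describes; user@host and /data/dir/path are placeholders from his mail.
    ssh user@host "cd /data/dir/path; tar -cpsf -" | tar -xpsf -

    # A plain rsync of a large tree onto the GFS mount, as in Corey's case.
    # /mnt/gfs/path is an assumed local mount point, purely for illustration.
    rsync -a user@host:/data/dir/path/ /mnt/gfs/path/

    # While a transfer runs, the CPU use of gfs_scand and lock_gulmd can be
    # watched with standard ps/watch (illustrative only):
    watch -n 5 'ps -eo pcpu,comm | egrep "gfs_scand|lock_gulmd"'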