We also see crashes when writing very large files, 5GB or so, and the problem seems to occur when we hit the GFS cache limit; the machine has 4GB of memory (dual Opteron).
Is there a way to tune the GFS cache to use less memory, say a maximum of 512MB, so we can debug the problem more easily?
And the problem is either in the remote GFS cache or in GNBD, since we can write files of 8GB or larger when GFS is mounted locally, i.e., when we run the tests on the same machine
that exports the GFS device, via GNBD, to the rest of the nodes.
Marcelo
Patrick Caulfield wrote:
On Mon, Jan 17, 2005 at 05:31:33PM -0800, Daniel McNeil wrote:
My 3 node cluster ran tests for 53 hours before hitting a problem.
Attached is a patch to set the CMAN process to run at realtime priority; I'm not sure that's the right thing to do, to be honest.
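For anyone wanting to try the same thing without building the patch, a rough userspace sketch would be something like this (the process name "cmand" and the priority are only illustrative, not taken from the patch; substitute whatever the cman process is called on your system):

    # Sketch: give a running cman process realtime (SCHED_FIFO) priority.
    # The name "cmand" and the priority 50 are illustrative, not from the patch.
    PID=$(pidof cmand)
    chrt -f -p 50 "$PID"   # set SCHED_FIFO, priority 50
    chrt -p "$PID"         # verify the new policy and priority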
Neither am I sure whether your 48-53 hours is significant - it's possible that memory is an issue (only guessing, but GFS caches locks like crazy). It may be worth cutting that down a bit by tweaking
/proc/cluster/lock_dlm/drop_count and/or /proc/cluster/lock_dlm/drop_period.
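For example, something like the following - the values below are only illustrative guesses at "smaller than the default", so check the current settings first:

    # Check the current lock_dlm tunables, then reduce them so GFS drops
    # cached locks sooner. As I understand it, drop_count is the cached-lock
    # threshold at which lock_dlm asks GFS to start dropping locks, and
    # drop_period is the interval in seconds between those checks.
    cat /proc/cluster/lock_dlm/drop_count /proc/cluster/lock_dlm/drop_period
    echo 20000 > /proc/cluster/lock_dlm/drop_count
    echo 30 > /proc/cluster/lock_dlm/drop_period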
Otherwise, the only way we're going to get to the bottom of this is to enable "DEBUG_MEMB" in cman and see what it thinks is going on when the node is kicked out of the cluster.
patrick