Hi
We are running a 20-node cluster on Scientific Linux 5.3, with a GFS
shared filesystem hosted on our SAN. The cluster nodes are dual-core
machines with 4 GB of RAM and a standard QLogic FC HBA.
Most of the 20 nodes form a batch-processing cluster, and our users are
happy enough with the performance they get, but some nodes are used
interactively. When the filesystem is under stress from large batch jobs
running on other nodes, interactive use becomes very slow and painful.
Is there any tuning I (the sysadmin) can do that might help in this
situation? Would a migration to gfs2 make a difference? Are all nodes
treated identically, or can hosts mounting the filesystem have any kind of
priority/QoS? Which tools could I use to track down any bottlenecks?
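For context, this is roughly the sort of thing I had in mind, though I'm
guessing at the knobs; the mount point /mnt/gfs is just a placeholder and I
haven't confirmed that all of these tunables exist in our GFS version:

  # Trim cached glocks more aggressively (values are guesses)
  gfs_tool settune /mnt/gfs glock_purge 50   # purge ~50% of unused glocks per scan
  gfs_tool settune /mnt/gfs demote_secs 200  # demote unused locks sooner (default is 300, I think)
  gfs_tool settune /mnt/gfs statfs_fast 1    # faster df/statfs at the cost of accuracy

  # Cut lock traffic from atime updates
  mount -o remount,noatime,nodiratime /mnt/gfs

  # Where I'd start looking for bottlenecks
  gfs_tool counters /mnt/gfs   # glock/lock counters for the mount
  gfs_tool lockdump /mnt/gfs   # per-glock detail (very large output)
  iostat -x 5                  # per-device latency/queueing on the FC path

If any of that is the wrong direction, pointers to the right tunables or
tools would be much appreciated.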
In theory we could update the kernel and GFS packages to a later release
(though we saw the same issues on this cluster with an SL4.x stack), but for
now we're running:
kernel-2.6.18-128.1.1.el5.i686
kmod-gfs-0.1.31-3.el5.i686
gfs-utils-0.1.20-7.el5.i386
gfs2-utils-0.1.53-1.el5_3.1.i386
Thanks for any help/suggestions,
Kevin