Wendy Cheng wrote:
bigendian+gfs@xxxxxxxxx wrote:
I've just set up a new two-node GFS cluster on a CORAID sr1520
ATA-over-Ethernet device. My nodes are quad dual-core Opteron
systems with 32GB of RAM each. The CORAID unit exports a 1.6TB block
device that I have a GFS file system on.
I seem to be having performance issues where certain read system
calls take up to three seconds to complete. My test app is
bonnie++, and the slow-downs appear to happen in the "Rewriting"
portion of the test, though I'm not sure if this is exclusive. If I
watch top and iostat for the device in question, I see activity on
the device, then long (up to three-second) periods of no apparent
I/O. During the periods of no I/O the bonnie++ process is blocked
on disk I/O, so it seems that the system is trying to do something.
Network traces seem to show that the host machine is not waiting on
the RAID array, and the packet following the dead period always seems
to be sent from the host to the CORAID device. Unfortunately, I
don't know how to dig any deeper to figure out what the problem is.
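
(One possible way to dig deeper -- sketched below, not taken from this
thread: a small C program that imitates the rewrite pattern, i.e. read a
chunk, dirty it, seek back, write it out, and reports any read() that
takes longer than a threshold. The chunk size, the 500 ms threshold and
the command-line file argument are arbitrary choices for the sketch, not
bonnie++'s actual settings; point it at a large pre-existing file on the
GFS mount.)

/* rewrite_probe.c -- a rough stand-in for a bonnie++-style "Rewriting"
 * pass that reports any read() call that stalls.  Chunk size and
 * threshold are arbitrary. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/time.h>
#include <unistd.h>

#define CHUNK        8192      /* read/rewrite unit (arbitrary) */
#define THRESHOLD_MS 500.0     /* report reads slower than this */

static double now_ms(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec * 1000.0 + tv.tv_usec / 1000.0;
}

int main(int argc, char **argv)
{
    char buf[CHUNK];
    off_t off = 0;
    ssize_t n;
    double t0, t1;
    int fd;

    if (argc != 2) {
        fprintf(stderr, "usage: %s <existing file on the GFS mount>\n", argv[0]);
        return 1;
    }
    fd = open(argv[1], O_RDWR);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    for (;;) {
        t0 = now_ms();
        n = read(fd, buf, sizeof(buf));   /* the call that stalls in strace */
        t1 = now_ms();
        if (n <= 0)
            break;
        if (t1 - t0 > THRESHOLD_MS)
            fprintf(stderr, "slow read at offset %lld: %.0f ms\n",
                    (long long)off, t1 - t0);
        buf[0] ^= 1;                      /* dirty one byte, as a rewrite would */
        if (lseek(fd, off, SEEK_SET) < 0 || write(fd, buf, n) != n) {
            perror("rewrite");
            break;
        }
        off += n;                         /* the write leaves us at the next chunk */
    }
    close(fd);
    return 0;
}

(Compiled with plain gcc and run against a multi-gigabyte file, this
should show whether the multi-second reads reproduce outside bonnie++
and at which offsets they hit.)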
Wait ... sorry, I didn't read carefully... now I see the 3 seconds in
the strace. That doesn't look like a bonnie++ issue.... Does bonnie++
run on a single node, or do you dispatch it on both nodes (in different
directories)? This is more complicated than I originally expected
(since this is a network block device?). Need to think about how to
catch the culprit... it could be a memory issue, though. Could you try
to run bonnie++ with 4GB of memory to see whether you still get the
3-second read delays?
Hit the send key too soon ... my words are cluttered. Note that reducing
memory from 32GB to 4GB may sound funny, but there are VM issues behind
this, so it is a quick and dirty experiment.
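
(To check the VM angle, one rough option -- again a sketch, not
something from the thread -- is to run a sampler like the one below
alongside bonnie++ and see whether the stalls line up with large Dirty
and Writeback counts in /proc/meminfo, i.e. with the kernel flushing a
big page cache built up in 32GB of RAM. For the quick 4GB experiment,
limiting usable RAM with a mem= kernel boot parameter, e.g. mem=4096M,
is the usual shortcut.)

/* dirty_watch.c -- prints the Dirty and Writeback counters from
 * /proc/meminfo once per second, so the output can be lined up
 * against the moments bonnie++ stalls. */
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
    char line[256];
    FILE *f;

    for (;;) {
        f = fopen("/proc/meminfo", "r");
        if (!f) {
            perror("/proc/meminfo");
            return 1;
        }
        printf("%ld:", (long)time(NULL));
        while (fgets(line, sizeof(line), f)) {
            /* keep only the writeback-related counters */
            if (strncmp(line, "Dirty:", 6) == 0 ||
                strncmp(line, "Writeback:", 10) == 0) {
                line[strcspn(line, "\n")] = '\0';
                printf("  %s", line);
            }
        }
        printf("\n");
        fflush(stdout);
        fclose(f);
        sleep(1);
    }
    return 0;  /* not reached */
}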
-- Wendy
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster