[Linux-cluster] LOCK_DLM Performance under Fire

"Peter Shearer" <pshearer@xxxxxxxxxxxxxx> · Tue, 5 Apr 2005 17:35:01 -0700


Hi, everyone --

I've been playing around with RHEL 4 and GFS from the tar files (not CVS)
on three OptiPlex GX280 workstations using hyperthreading, SATA drives, and
GNBD for sharing over a 1Gb network (dual NICs per machine).  I'm exploring
moving a legacy file-based COBOL application/database over to Linux on a
bunch of smaller boxes instead of its current home on a quad-processor AIX
machine.  I have a test application which basically applies a bunch of file
and record locks on and within files, along with some processor-intensive
sorting algorithms, to stress test the power of the solution.  I'm running
into some serious performance discrepancies that I hope someone can help me
make sense of.  Here's what I'm running into when I test this app on
different file systems:

ext3 on local disk: the test app takes about 3 min 20 sec to complete.
ext3 on GNBD exported disk (one node only, obviously): completes in about 3 min 35 sec.
GFS on GNBD mounted with the localflocks option: completes in 5 min 30 sec.
GFS on GNBD mounted using LOCK_DLM with only one server mounting the fs: completes in 50 min 45 sec.
GFS on GNBD mounted using LOCK_DLM with two servers mounting the fs: went over 80 min and wasn't even half done.
GFS on GNBD mounted using LOCK_GULM: don't want to go there; I left it running for over 2 hrs and it was worse off than the two servers using LOCK_DLM.  :)
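
To put the completed runs side by side, here is a quick calculation. It is
only a sketch: the times are the approximate figures from the list above
converted to seconds, the two runs that never finished are left out, and the
function name `slowdowns` is just an illustrative helper.

```python
# Approximate elapsed times of the completed runs, in seconds.
timings = {
    "ext3 local disk": 3 * 60 + 20,            # 200 s
    "ext3 on GNBD": 3 * 60 + 35,               # 215 s
    "GFS localflocks": 5 * 60 + 30,            # 330 s
    "GFS LOCK_DLM, one node": 50 * 60 + 45,    # 3045 s
}


def slowdowns(times, baseline_key):
    """Return each run's elapsed time divided by the baseline run's time."""
    base = times[baseline_key]
    return {name: secs / base for name, secs in times.items()}


if __name__ == "__main__":
    for name, ratio in slowdowns(timings, "ext3 local disk").items():
        print(f"{name:<26}{timings[name]:>6} s {ratio:6.1f}x")
```

On these figures the single-node LOCK_DLM run is roughly 15 times slower
than ext3 on local disk and roughly 9 times slower than GFS with
localflocks.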

The test app mostly does a whole lot of file & record level locking -- not
a lot of file transfer from the source disk to the memory of the local
server.  iostat on the client and server both show that the transfer rate
of data on and off the hard disk is only about 300 kB/s.  top shows that
the CPU on the client is being beat up: dlm_astd, lock_dlm1, and lock_dlm2
together take on average 50% - 60% of the CPU (30%, 15%, 15%), and my test
app takes up the rest.  When it's running on ext3 or on GFS mounted with
localflocks, there isn't this problem at all -- the test app goes to 99% of
CPU; hence the faster completion times.  I have isolated the data paths so
that the GNBD data is running over one NIC and the rest of the cluster
traffic is on the second NIC in these computers.
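
For anyone wanting to reproduce the lock-heavy, low-I/O pattern without the
COBOL test app, a rough stand-in might look like the sketch below. It is not
the actual test application: it assumes the record locks are POSIX byte-range
locks (taken here with Python's fcntl.lockf), and the function name, record
count, and record size are made up for illustration.

```python
import fcntl
import os
import tempfile
import time


def lock_benchmark(directory, records=1000, record_size=128):
    """Lock and unlock `records` byte ranges of a scratch file in `directory`.

    Returns the elapsed wall-clock time in seconds. Almost no data is
    transferred, so the time is dominated by lock traffic.
    """
    # The scratch file is created on the file system under test and is
    # deleted automatically when the `with` block exits.
    with tempfile.NamedTemporaryFile(dir=directory) as f:
        f.truncate(records * record_size)
        fd = f.fileno()
        start = time.perf_counter()
        for i in range(records):
            offset = i * record_size
            # Exclusive lock on one "record", then release it again.
            fcntl.lockf(fd, fcntl.LOCK_EX, record_size, offset, os.SEEK_SET)
            fcntl.lockf(fd, fcntl.LOCK_UN, record_size, offset, os.SEEK_SET)
        return time.perf_counter() - start
```

Calling lock_benchmark() with a directory on each mount in turn (local ext3,
GFS with localflocks, GFS with LOCK_DLM) gives per-file-system timings that
can be compared in the same way as the full test runs.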

Anyone have some ideas on how to tune this?  Would exporting the GNBD file
system with caching enabled help, as I'm not using multiple GNBD servers,
just multiple GNBD clients?  Other options?  Am I just way off base here?

Thanks!
________________________________________
Peter Shearer
A+, MCSE, MCSE: Security, CCNA
IT Network Engineer
Lumbermens