Ick... it appears the app's locking mechanism is fcntl. An strace of the
app is full of messages like:

fcntl64(8, F_SETLK64, {type=F_UNLCK, whence=SEEK_SET, start=2147478526, len=1024}, 0xbffff5a0) = 0
fcntl64(8, F_SETLK64, {type=F_WRLCK, whence=SEEK_SET, start=2147477478, len=1}, 0xbffff4f0) = 0

The app itself is a really old COBOL app built on Liant's RM/Cobol -- an
abstraction layer similar to Java which allows the same object code to run
on Linux, UNIX, and Windows with very little modification through a runtime
application. So, while I have access to the source for the compiled object,
I don't have access to the runtime code, which is really the thing doing
all the locking.

This specific test app opens only one file with locks, but it beats that
file up. Essentially, it goes through the file performing a series of sorts
and searches, which, for the most part, would stress the processor more
than the I/O. The "real" application for the most part will not be nearly
as intense, but will open probably around 100 shared files simultaneously
with posix locking.

Would adjusting SHRINK_CACHE_COUNT and SHRINK_CACHE_MAX in lock_dlm.h
affect this type of application? Are there any other tunable parameters
which would help? I'm not tied to DLM at this point... is there another
mechanism which would do this equally well?

As for a test app... I'm not sure I'll be able to provide that. I'll look
into it, though.

--Peter

-----Original Message-----
From: David Teigland [mailto:teigland@xxxxxxxxxx]
Sent: Tuesday, April 05, 2005 8:48 PM
To: Peter Shearer
Cc: linux-cluster@xxxxxxxxxx
Subject: Re: [Linux-cluster] LOCK_DLM Performance under Fire

On Tue, Apr 05, 2005 at 05:35:01PM -0700, Peter Shearer wrote:
> ext3 on local disk, the test app takes about 3 min 20 sec to complete.
> ext3 on GNBD exported disk (one node only, obviously); completes in
> about 3 min 35 sec.
> GFS on GNBD mounted with the localflocks option; completes in 5 min 30
> sec.
> GFS on GNBD mounted using LOCK_DLM with only one server mounting the
> fs; completes in 50 min 45 sec.
> GFS on GNBD mounted using LOCK_DLM with two servers mounting the fs;
> went over 80 min and wasn't even half done.

It sounds like the app is using fcntl (posix) locks, not flock(2)?  If so,
that's a weak spot for lock_dlm, which translates posix-lock requests into
multiple dlm lock operations.  That said, it's possible the code may be
doing some dumb things that could be fixed to improve the speed.

If there are hundreds of files being locked, one simple thing to try is to
increase SHRINK_CACHE_COUNT and SHRINK_CACHE_MAX in lock_dlm.h (sorry,
never made them tunable through proc.)  This relates to some basic caching
lock_dlm does for files that are repeatedly locked/unlocked.

If the app could get by with just using flock() that would certainly be
much faster.  Also, if you could provide the test you use or a simplified
equivalent it would help.

-- 
Dave Teigland  <teigland@xxxxxxxxxx>