RE: [Linux-cluster] LOCK_DLM Performance under Fire

Yes, the idea was to parallelize the app across multiple machines
sharing a common SAN infrastructure (hopefully iSCSI; if not, then GNBD
in the interim).  There is no central control daemon or database
manager; each instance of the app does its own record locking and such,
so it really doesn't matter where the data resides, as long as all the
clients are able to touch the same files.  Therefore, distributed locks
are really important.
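
For reference, here's a minimal sketch of the kind of per-record
fcntl() locking each instance does (the file path, record number, and
record size are invented for illustration).  On a GFS mount without
localflocks, lock_dlm turns each of these requests into a
cluster-wide lock:

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/types.h>
    #include <unistd.h>

    int main(void)
    {
        /* hypothetical data file and record geometry */
        const char *path = "/mnt/gfs/data/customer.dat";
        const off_t recno = 42, recsize = 512;

        int fd = open(path, O_RDWR);
        if (fd < 0) { perror("open"); return 1; }

        struct flock fl = {0};
        fl.l_type   = F_WRLCK;          /* exclusive lock for update */
        fl.l_whence = SEEK_SET;
        fl.l_start  = recno * recsize;  /* lock just this one record */
        fl.l_len    = recsize;

        /* F_SETLKW blocks until the record is free; on GFS with
           lock_dlm this means waiting on a distributed lock */
        if (fcntl(fd, F_SETLKW, &fl) < 0) { perror("fcntl"); return 1; }

        /* ... read/modify/write the record here ... */

        fl.l_type = F_UNLCK;            /* release the record */
        fcntl(fd, F_SETLK, &fl);
        close(fd);
        return 0;
    }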

I had suspected that the locking subsystem was causing the slowdowns,
which is why I ran a test with localflocks -- it's not as fast as
ext3, but it works fine with only one server involved.  Of course,
that's not going to work for this application.  :)

--Peter

-----Original Message-----
From: David Teigland [mailto:teigland@xxxxxxxxxx] 
Sent: Wednesday, April 06, 2005 7:31 PM
To: Peter Shearer
Cc: linux-cluster@xxxxxxxxxx
Subject: Re: [Linux-cluster] LOCK_DLM Performance under Fire


On Wed, Apr 06, 2005 at 12:01:02PM -0700, Peter Shearer wrote:

> The app itself is a really old COBOL app built on Liant's RM/COBOL --
> an abstraction layer similar to Java which allows the same object
> code to run on Linux, UNIX, and Windows with very little modification
> through a runtime application.  So, while I have access to the source
> for the compiled object, I don't have access to the runtime app code,
> which is really the thing doing all the locking.
> 
> This specific testing app is opening one file with locks, but it's
> beating that file up.  Essentially, it's going through the file and
> performing a series of sorts and searches which, for the most part,
> beat up the proc more than the I/O.  The "real" application will,
> for the most part, not be nearly as intense, but will open probably
> around 100 shared files simultaneously with POSIX locking.  Would
> adjusting SHRINK_CACHE_COUNT and SHRINK_CACHE_MAX in lock_dlm.h
> affect this type of application?  Are there any other tunable
> parameters which would help?  I'm not tied to DLM at this point...
> is there another mechanism which would do this equally well?

Taking a step back, is this a parallelized/clusterized application?
That is, will it be running concurrently on different machines with
the data shared through GFS?  If so, then the distributed fcntl locks
are critical.  If not, it would be safe to use the localflocks mount
option, which means fcntl locks are no longer translated into
distributed locks.
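
For a single-node test, that's just an extra mount option -- a
sketch, with placeholder device and mount point:

    # one node only: fcntl locks stay local, no DLM round trips
    mount -t gfs -o localflocks /dev/sdb1 /mnt/gfs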

-- 
Dave Teigland  <teigland@xxxxxxxxxx>

