On Wed, Sep 15, 2004 at 10:55:02PM +0200, Lazar Obradovic wrote:
> It happened again today, and I got around 80 queued processes waiting to
> write into the same file. All processes were in "D" state when viewed
> from 'ps', and they all blocked the whole directory the file is in
> (ls into that dir would block too).

Is there a test or application you're running that we could try
ourselves?

> Now, the node just recovered itself, but that directory was unavailable
> for almost an hour and a half!

In addition to Ken's suggestion ("ps aux" and "gfs_tool lockdump
/mountpoint" from each node), you could provide
"cat /proc/cluster/lock_dlm_debug" from each node.

> Do deadlocktimeout and lock_timeout (in /proc/cluster/config/dlm) have
> anything to do with this, and are they configurable?

They are unrelated to gfs.

> Can someone shed some light on the /proc interface, just to know what's
> where? This could also go into usage.txt or even a separate file...

I don't think any of them would be useful.  It's simply our habit to
define any "constant" this way.

  buffer_size     - network message size used by the dlm
  dirtbl_size,
  lkbtbl_size,
  rsbtbl_size     - hash table sizes
  lock_timeout    - max time we'll wait for a reply to a remote request
                    (not used for gfs locks)
  deadlocktime    - max time a request will wait to be granted
                    (not used for gfs locks)
  recover_timer   - while waiting for certain conditions during recovery,
                    this is the interval between checks
  tcp_port        - TCP port used for dlm communication
  max_connections - max number of network connections the dlm will make

-- 
Dave Teigland  <teigland@xxxxxxxxxx>
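
[A minimal sketch of how the per-node diagnostics requested above might be
collected, assuming the GFS filesystem is mounted at /mnt/gfs (a placeholder,
substitute the real mount point) and that each tunable under
/proc/cluster/config/dlm is exposed as its own file; both are assumptions,
not details from the mail itself.]

  #!/bin/sh
  # Gather GFS/DLM debugging info on one node; run the same script on every
  # node in the cluster and compare the outputs.
  # /mnt/gfs and the output directory are hypothetical placeholders.
  OUT=/tmp/gfs-debug-$(hostname)
  mkdir -p "$OUT"

  # Process list ("D" = uninterruptible sleep, i.e. blocked in the kernel)
  ps aux > "$OUT/ps.txt"

  # GFS lock dump for the affected mount point
  gfs_tool lockdump /mnt/gfs > "$OUT/lockdump.txt"

  # lock_dlm debug buffer
  cat /proc/cluster/lock_dlm_debug > "$OUT/lock_dlm_debug.txt"

  # Current values of the dlm "constants" listed above
  for f in /proc/cluster/config/dlm/*; do
      echo "$(basename "$f") = $(cat "$f")"
  done > "$OUT/dlm_config.txt"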