On Wed, Sep 15, 2004 at 10:55:02PM +0200, Lazar Obradovic wrote:
> It happened again today, and I got around 80 queued processes waiting to
> write into the same file. All processes were in "D" state when viewed
> from 'ps', and they all blocked the whole directory the file is in
> (ls into that dir would block too).

Is there a test or application you're running that we could try
ourselves?

> Now, the node just recovered itself, but that directory was unavailable
> for almost an hour and a half!

In addition to Ken's suggestion ("ps aux" and "gfs_tool lockdump
/mountpoint" from each node), you could provide
"cat /proc/cluster/lock_dlm_debug" from each node.

> Do deadlocktimeout and lock_timeout (in /proc/cluster/config/dlm) have
> anything to do with this, and are they configurable?

They are unrelated to gfs.

> Can someone shed some light on the /proc interface, just to know what's
> where? This could also go into usage.txt or even a separate file...

I don't think any of them would be useful.  It's simply our habit to
define any "constant" this way.

  buffer_size     - network message size used by the dlm
  dirtbl_size,
  lkbtbl_size,
  rsbtbl_size     - hash table sizes
  lock_timeout    - max time we'll wait for a reply to a remote request
                    (not used for gfs locks)
  deadlocktime    - max time a request will wait to be granted
                    (not used for gfs locks)
  recover_timer   - while waiting for certain conditions during recovery,
                    this is the interval between checks
  tcp_port        - TCP port used for dlm communication
  max_connections - max number of network connections the dlm will make

-- 
Dave Teigland  <teigland@xxxxxxxxxx>
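
[A minimal sketch of how the per-node diagnostics requested above might be
collected, assuming the GFS filesystem is mounted at /mnt/gfs (a placeholder,
substitute the real mount point) and that each tunable under
/proc/cluster/config/dlm is exposed as its own file; both are assumptions,
not details from the mail itself.]

  #!/bin/sh
  # Gather GFS/DLM debugging info on one node; run the same script on every
  # node in the cluster and compare the outputs.
  # /mnt/gfs and the output directory are hypothetical placeholders.
  OUT=/tmp/gfs-debug-$(hostname)
  mkdir -p "$OUT"

  # Process list ("D" = uninterruptible sleep, i.e. blocked in the kernel)
  ps aux > "$OUT/ps.txt"

  # GFS lock dump for the affected mount point
  gfs_tool lockdump /mnt/gfs > "$OUT/lockdump.txt"

  # lock_dlm debug buffer
  cat /proc/cluster/lock_dlm_debug > "$OUT/lock_dlm_debug.txt"

  # Current values of the dlm "constants" listed above
  for f in /proc/cluster/config/dlm/*; do
      echo "$(basename "$f") = $(cat "$f")"
  done > "$OUT/dlm_config.txt"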