On Wed, 9 Apr 2008, Wendy Cheng wrote:

> > What led me to suspect clashing in the hash (or some other
> > lock-creating issue) was the simple test I made on our five node
> > cluster: on one node I ran
> >
> > find /gfs -type f -exec cat {} > /dev/null \;
> >
> > and on another one just started an editor, naming a non-existent
> > file. It took multiple seconds while the editor "opened" the file.
> > What else but creating the lock could delay the process so long?
>
> Not knowing how "find" is implemented, I would guess this is caused by
> directory locks. Creating a file needs a directory lock. Your
> exclusive write lock (file create) can't be granted until the "find"
> releases the directory lock. It doesn't look like a lock query
> performance issue to me.

As /gfs is a large directory structure with hundreds of user home
directories, somehow I don't think I could have picked the same
directory which was just being processed by "find". But this is a good
clue to what might bite us most!

Our GFS cluster is an almost mail-only cluster for users with Maildir.
When the users experience temporary hangups for several seconds (even
when writing a new mail), it might be due to the concurrent scanning
for new mail on one node by the MUA and the delivery to the Maildir on
another node by the MTA. What is really strange (and disturbing) is
that such "hangups" can take 10-20 seconds, which is just too much for
the users.

In order to look at the possible tuning options and their side effects,
I list what I have learned so far:

- Increasing glock_purge (percent, default 0) helps gfs_scand itself to
  trim back the unused glocks. Otherwise glocks can accumulate and
  gfs_scand eats more and more time scanning the larger and larger
  table of glocks.

- gfs_scand wakes up every scand_secs (default 5s) to scan the glocks,
  looking for work to do. By increasing scand_secs one can lessen the
  load produced by gfs_scand, but it will hurt because flushing data
  can be delayed.

- Decreasing demote_secs (seconds, default 300) helps to flush cached
  data more often by moving write locks into less restricted states.
  Flushing often helps to avoid burstiness *and* to prolong other
  nodes' lock access. The question is, what are the side effects of
  small demote_secs values? (There is probably not much point in
  choosing a demote_secs value smaller than scand_secs.)

Currently we are running with 'glock_purge = 20' and 'demote_secs = 30'.

> > But 'flushing when releasing glock' looks like a side effect. I
> > mean, isn't there a more direct way to control the flushing?
>
> To make a long story short, I did submit a direct cache flush patch
> first, instead of this final version of the lock trimming patch.
> Unfortunately, it was *rejected*.

I see.

Another question, just out of curiosity: why don't you use kernel
timers for every glock instead of gfs_scand? The hash bucket id of the
glock would have to be added to struct gfs_glock, but the timer
function could be almost identical to scan_glock. As far as I can see,
the only drawback is that it would be equivalent to 'glock_purge = 100'
and it would be tricky to emulate glock_purge != 100 settings.

Best regards,
Jozsef
--
E-mail : kadlec@xxxxxxxxxxxx, kadlec@xxxxxxxxxxxxxxxxx
PGP key: http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address: KFKI Research Institute for Particle and Nuclear Physics
         H-1525 Budapest 114, POB. 49, Hungary

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
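
For reference, one way to confirm the glock build-up behind the test
quoted above is to watch GFS1's per-mount statistics while the "find"
runs. This is only a sketch: it assumes the /gfs mount point from the
test, a gfs_tool that provides the 'counters' action (the exact counter
names vary between GFS releases), and a made-up target path for the
file-creation timing on the second node.

  # node A: reproduce the cache build-up
  find /gfs -type f -exec cat {} > /dev/null \;

  # node A, second terminal: watch the lock/glock counters grow
  watch -n 5 'gfs_tool counters /gfs'

  # node B: time how long creating a new, non-existent file stalls
  # (the target path below is only an example)
  time touch /gfs/some/new-file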
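
The tunables listed in the mail (glock_purge, scand_secs, demote_secs)
are runtime, per-mount settings. Below is a minimal sketch of how they
are usually inspected and applied with GFS1's gfs_tool, assuming the
/gfs mount point and the values quoted above; such settings are not
persistent across remounts, so they typically have to be re-applied
after every mount (e.g. from an init script).

  # show the current values of the tunables discussed above
  gfs_tool gettune /gfs | egrep 'glock_purge|demote_secs|scand_secs'

  # apply the settings mentioned in the mail
  gfs_tool settune /gfs glock_purge 20
  gfs_tool settune /gfs demote_secs 30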