On Tue, Oct 19, 2004 at 01:05:54PM -0500, Derek Anderson wrote:
> I've rerun the simple performance tests originally run by Daniel McNeil with
> the addition of the gulm lock manager on the 2.6.8.1 kernel and GFS 6.0 on
> the 2.4.21-20.EL kernel.
>
> Notes:
> ======
> Storage: RAID Array Tornado- Model: F4 V2.0
> HBA:     QLA2310
> Switch:  Brocade Silkworm 3200
> Nodes:   Dual Intel Xeon 2.40Ghz
>          2GB memory
>          100Mbs Ethernet
>          2.6.8.1 Kernel/2.4.21-20.EL Kernel (with gfs 6)
> GuLM:    3-node cluster, 1 external dedicated lock manager
> DLM:     3-node cluster
> LVM:     Not used
>
> tar xvf linux-2.6.8.1.tar:
> --------------------------
>                          real         user        sys
> gfs dlm 1 node tar       0m19.480s    0m0.474s    0m8.975s
>
> du -s linux-2.6.8.1 (after untar):
> ----------------------------------
>                          real         user        sys
> gfs dlm 1 node           0m5.149s     0m0.041s    0m1.905s
>
> Second du -s linux-2.6.8.1:
> ---------------------------
>                          real         user        sys
> gfs dlm 1 node           0m0.341s     0m0.027s    0m0.314s

I've found part of the problem by running the following tests.  (I have
more modest hardware: 256MB memory, Dual Pentium III 700 MHz.)

Here's the test I ran on just a single node:

  time tar xf /tmp/linux-2.6.8.1.tar; time du -s linux-2.6.8.1/; time du -s linux-2.6.8.1/

1. lock_nolock
   tar: real 1m6.859s
   du1: real 0m45.952s
   du2: real 0m1.934s

2. lock_dlm, this is the only node mounted
   tar: real 1m20.130s
   du1: real 0m52.483s
   du2: real 1m4.533s

   Notice that the problem is not the first du, which looks normal
   compared to the nolock results; it's the second du that is
   definitely bad.

3. lock_dlm, this is the only node mounted
   * changed lock_dlm.h DROP_LOCKS_COUNT from 10,000 to 100,000
   tar: real 1m16.028s
   du1: real 0m48.636s
   du2: real 0m2.332s

   No more problem.

Commentary:

When gfs is holding over DROP_LOCKS_COUNT locks (locally), lock_dlm tells
gfs to "drop locks".  When gfs drops locks, it invalidates the cached data
they protect.

du over the linux source tree requires gfs to acquire some 16,000 locks.
Since this exceeded 10,000, lock_dlm was having gfs toss the cached data
from the previous du.  If we raise the limit to 100,000, there's no
"drop locks" callback and everything remains cached.

This "drop locks" callback is a way for the lock manager to throttle things
when it begins reaching its own limitations.  10,000 was picked pretty
arbitrarily because there's no good way for the dlm to know when it's
reaching its limitations; the main limitation is free memory on remote
nodes.

The dlm can run into a real problem if gfs holds "too many" locks.  If a
gfs node fails, it's likely that some of the locks the dlm mastered on that
node need to be remastered on the remaining nodes.  Those remaining nodes
may not have enough memory to remaster all the locks -- the dlm recovery
process eats up all the memory and hangs.

Part of a solution would be to have gfs free a bunch of locks at that
point, but that's not a near-term option.  So we're left with a tradeoff:
favor performance and increase the risk of having too little memory for
recovery, or vice versa.

Given my machines and the test I was running, 10,000 solved the recovery
problem.  256MB is obviously behind the times, though, which makes a
default of 10,000 probably too low.  I'll increase the constant and make
it configurable through /proc.

-- 
Dave Teigland  <teigland@xxxxxxxxxx>
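
To make the throttling concrete, here is a minimal userspace sketch of the
"drop locks" mechanism described above.  It is an illustration only: the
structure and function names are invented, and the real interface is the
callback lock_dlm makes into gfs once the local lock count passes
DROP_LOCKS_COUNT in lock_dlm.h.

/*
 * Minimal userspace model of the "drop locks" throttling.  Not the
 * actual gfs/lock_dlm code; types and names are invented for the
 * example.  Only the DROP_LOCKS_COUNT constant comes from lock_dlm.h.
 */
#include <stdio.h>

#define DROP_LOCKS_COUNT 10000   /* the old default being discussed */

struct lockspace {
	unsigned long held;                        /* locks held locally */
	void (*drop_locks)(struct lockspace *ls);  /* callback into the fs */
};

/* Filesystem side: dropping locks also invalidates the cached data the
 * locks protect, which is why the second 'du' slowed down. */
static void fs_drop_locks(struct lockspace *ls)
{
	printf("drop callback: releasing %lu locks, tossing cached data\n",
	       ls->held);
	ls->held = 0;
}

/* Lock-manager side: after granting a lock, throttle the filesystem if
 * it is holding more than the threshold. */
static void lm_grant_lock(struct lockspace *ls)
{
	ls->held++;
	if (ls->held > DROP_LOCKS_COUNT)
		ls->drop_locks(ls);
}

int main(void)
{
	struct lockspace ls = { 0, fs_drop_locks };
	unsigned long i;

	/* 'du' over the kernel tree needs roughly 16,000 locks, which
	 * exceeds the 10,000 threshold and triggers the callback. */
	for (i = 0; i < 16000; i++)
		lm_grant_lock(&ls);

	printf("locks still held (data still cached): %lu\n", ls.held);
	return 0;
}

Compiled and run, the callback fires once the count passes 10,000, which is
exactly the point where the cache built up by the first du gets tossed.
Raising the threshold to 100,000 means the loop never hits it.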
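
And a hypothetical sketch of what the /proc tunable could look like, in
2.6-era module style.  The proc file name (lock_dlm_drop_count), the
variable, and the new default are assumptions for illustration; this is
not the actual lock_dlm change.

/*
 * Hypothetical /proc tunable for the drop-locks threshold.  File name,
 * variable, and default are assumptions, not the real lock_dlm patch.
 */
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/proc_fs.h>
#include <asm/uaccess.h>

static unsigned long drop_locks_count = 100000;   /* assumed new default */
static struct proc_dir_entry *pde;

static int drop_count_read(char *page, char **start, off_t off,
			   int count, int *eof, void *data)
{
	*eof = 1;
	return sprintf(page, "%lu\n", drop_locks_count);
}

static int drop_count_write(struct file *file, const char __user *buffer,
			    unsigned long count, void *data)
{
	char buf[32];

	if (count == 0 || count >= sizeof(buf))
		return -EINVAL;
	if (copy_from_user(buf, buffer, count))
		return -EFAULT;
	buf[count] = '\0';
	drop_locks_count = simple_strtoul(buf, NULL, 10);
	return count;
}

static int __init drop_count_init(void)
{
	pde = create_proc_entry("lock_dlm_drop_count", 0644, NULL);
	if (!pde)
		return -ENOMEM;
	pde->read_proc = drop_count_read;
	pde->write_proc = drop_count_write;
	return 0;
}

static void __exit drop_count_exit(void)
{
	remove_proc_entry("lock_dlm_drop_count", NULL);
}

module_init(drop_count_init);
module_exit(drop_count_exit);
MODULE_LICENSE("GPL");

With something along these lines the threshold could be inspected with cat
and raised at runtime (echo 100000 > /proc/lock_dlm_drop_count) instead of
recompiling; again, the actual path and interface are only assumptions here.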