On Thursday 21 October 2004 08:06, David Teigland wrote:
> When gfs is holding over DROP_LOCKS_COUNT locks (locally), lock_dlm
> tells gfs to "drop locks". When gfs drops locks, it invalidates the
> cached data they protect. du in the linux src tree requires gfs to
> acquire some 16,000 locks. Since this exceeded 10,000, lock_dlm was
> having gfs toss the cached data from the previous du. If we raise
> the limit to 100,000, there's no "drop locks" callback and everything
> remains cached.
>
> This "drop locks" callback is a way for the lock manager to throttle
> things when it begins reaching its own limitations. 10,000 was
> picked pretty arbitrarily because there's no good way for the dlm to
> know when it's reaching its limitations. This is because the main
> limitation is free memory on remote nodes.
>
> The dlm can get into a real problem if gfs holds "too many" locks.
> If a gfs node fails, it's likely that some of the locks the dlm
> mastered on that node need to be remastered on remaining nodes.
> Those remaining nodes may not have enough memory to remaster all the
> locks -- the dlm recovery process eats up all the memory and hangs.

You need to maintain a memory pool for locks on each node and expand the
pool as the number of locks increases. Global balancing is needed to
accommodate remastering, e.g., enforce that the sum of the free lock pool
across the cluster is always enough to remaster the locks of at least the
N heaviest lock users (sketched below). With this approach, the hard
limit on the number of locks can be a large fraction of total installed
memory. If we teach the VM how to shrink the lock pool, then no hard
limit is needed at all, as for most kernel resources.

You are also going to hit PF_MEMALLOC problems, because the VM doesn't
know anything about the kernel gdlm daemons, so they never run in
PF_MEMALLOC mode. No matter how much pool a daemon keeps for its own
kmallocs, the kernel subsystems it calls (including networking) don't
know about your pools and can't use them, so every allocation they do is
a deadlock risk.

All by way of saying that userspace isn't the only victim of memory
inversion; a definitive solution is needed for both kernel and userspace.

> Part of a solution would be to have gfs free a bunch of locks at this
> point, but that's not a near-term option. So, we're left with the
> tradeoff: favoring performance and increasing risk of too little
> memory for recovery or v.v.

How about increasing performance and reducing risk at the same time?

Regards,

Daniel
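
P.S. Here is a minimal userspace C sketch of the balancing check
described above. It is only an illustration, not dlm code: the node
table, the per-lock memory cost (LOCK_COST) and the number of tolerated
heavy users (N_HEAVIEST) are made-up parameters. The point is just that
the free lock-pool memory summed over the surviving nodes must cover
remastering the locks of the N heaviest lock users.

/*
 * Illustrative check of the global balancing invariant: the free
 * lock-pool memory summed over the remaining nodes must be enough to
 * remaster the locks held by the N heaviest lock users, so dlm
 * recovery cannot run a surviving node out of memory.
 *
 * Standalone userspace sketch; all numbers are invented.
 */
#include <stdio.h>
#include <stdlib.h>

#define LOCK_COST	256	/* assumed bytes to remaster one lock */
#define N_HEAVIEST	1	/* tolerate losing the N heaviest users */

struct node {
	const char *name;
	unsigned long locks_held;	/* locks this node masters */
	unsigned long pool_free;	/* bytes free in its lock pool */
};

static int cmp_locks_desc(const void *a, const void *b)
{
	const struct node *x = a, *y = b;

	if (x->locks_held != y->locks_held)
		return x->locks_held > y->locks_held ? -1 : 1;
	return 0;
}

/*
 * Return 1 if the cluster can lose its N heaviest lock users and still
 * remaster their locks out of the remaining nodes' free pools.
 */
static int balance_ok(struct node *nodes, int count, int n_heaviest)
{
	unsigned long need = 0, have = 0;
	int i;

	qsort(nodes, count, sizeof(*nodes), cmp_locks_desc);

	for (i = 0; i < count; i++) {
		if (i < n_heaviest)
			need += nodes[i].locks_held * LOCK_COST;
		else
			have += nodes[i].pool_free;
	}
	return have >= need;
}

int main(void)
{
	/* made-up cluster state */
	struct node nodes[] = {
		{ "node1", 100000, 8 << 20 },
		{ "node2",  16000, 4 << 20 },
		{ "node3",   2000, 2 << 20 },
	};
	int count = sizeof(nodes) / sizeof(nodes[0]);

	if (balance_ok(nodes, count, N_HEAVIEST))
		printf("balanced: recovery of the heaviest user(s) fits\n");
	else
		printf("unbalanced: grow free pools or drop locks\n");
	return 0;
}

If the check fails, the options are the ones discussed above: grow the
free pools on the lightly loaded nodes, or have the lock manager throttle
gfs with the "drop locks" callback before the imbalance gets that large.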