Steve,

Thanks for the informative and detailed response -- it really helps to understand what might be happening. We're not mounting with noatime, and it sounds like that would be a good first step.

Thanks!

-- scooter

Steven Whitehouse wrote:

Hi,

On Fri, 2009-10-09 at 10:57 -0700, Scooter Morris wrote:
> Steve,
> Thanks for the prompt reply. Like Kaerka, I'm running on large-memory servers and decreasing demote_secs from 300 to 20 resulted in significant performance improvements because locks get freed much more quickly (I assume), resulting in much better response. It could certainly be that changing demote_secs was a workaround for a different bug that has now been fixed, which would be great. I'll try some tests today and see how "rm -rf" on a large directory behaves.
>
> -- scooter

The question, though, is why that should result in a better response. It doesn't really make sense, since the caching of the "locks" (really caching of data and metadata controlled by a lock) should improve the performance due to more time to write out the dirty data.

Doing an "rm -fr" is also a very different workload to that of reading all the files in the filesystem once (for backup purposes, for example), since the "rm -fr" requires writing to the fs and the backup process doesn't do any writing. How long it takes to remove a file also depends to a large extent on its size.

In both cases, however, it would improve performance if you could arrange to remove or read inodes in inode number order. Both GFS and GFS2 return inodes from getdents64 (readdir) in a pseudo-random order based on the hash of the filename. You can gain a lot of performance if these results are sorted before they are scanned. Ideally we'd return them from the fs in sorted order. Unfortunately, a design decision made a long time ago, in combination with the design of the Linux VFS, prevents us from doing that.

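[Editorial illustration, not part of the original message.] The sorting suggestion above could look roughly like the following in Python. This is only a sketch: the helper name entries_in_inode_order is made up for the example, and it assumes a flat directory on a Linux system, where os.scandir() reports the inode number from the directory entry without an extra stat() call per file.

```python
import os


def entries_in_inode_order(directory):
    """Return the entry names in `directory`, sorted by inode number.

    Hypothetical helper for illustration only; it does not recurse
    into subdirectories.
    """
    with os.scandir(directory) as it:
        # Collect (inode, name) pairs first; readdir order is effectively
        # random on GFS/GFS2 because it follows the filename hash.
        pairs = [(entry.inode(), entry.name) for entry in it]
    # Sorting the pairs orders them by inode number (name breaks ties).
    pairs.sort()
    return [name for _, name in pairs]
```

A scan or removal loop would then walk the returned names in order, for example calling os.stat() or os.unlink() on os.path.join(directory, name) for each one, instead of processing entries in the order readdir returned them.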
If there is a problem with a node caching the whole filesystem after it has been scanned, then it is still possible to solve this issue:

    echo 3 > /proc/sys/vm/drop_caches

I guess I should also point out that it is a good idea to mount with the noatime mount option if there is going to be a read-only scan of the complete filesystem on a regular basis, since that will prevent it from becoming a "write to every inode" scan. That will also make a big performance difference. Note that it's ok (in recent kernels) to mount a GFS2 filesystem more than once with different atime flags (using bind mounts) in case you have an application which requires atime, but you want to avoid it when running a backup.

There is also /proc/sys/vm/vfs_cache_pressure, which may help optimise your workload.

... and if all that fails, then the next thing to do is to use blktrace/seekwatcher to find out what's really going on on the disk, and send the results so that we can have a look and see if we can improve the disk I/O. Better still if you can combine that with a trace from the gfs2 tracepoints so we can see the locking at the same time,

Steve.

Kaerka Phillips wrote:

If in gfs2 glocks are purged based upon memory constraints, what happens if it is run on a box with large amounts of memory? e.g. RHEL5.x with 128 GB of RAM? We ended up having to move away from GFS2 due to serious performance issues with this exact setup, and our performance issues were largely centered around commands like ls or rm against gfs2 filesystems with large directory structures and millions of files in them.

In our case, something as simple as copying a whole filesystem to another filesystem would cause a load avg of 50 or more, and would take 8+ hours to complete. The same thing on NFS or ext3 would usually take 1 to 2 hours. Netbackup of 10 of those filesystems took ~40 hours to complete, so we were getting maybe 1 good backup per week, and in some cases the backup itself caused a cluster crash.

We are still using our GFS1 clusters, since as long as their network is stable, their performance is very good, but we are phasing out most of our GFS2 clusters to NFS instead.

On Fri, Oct 9, 2009 at 1:01 PM, Steven Whitehouse <swhiteho@xxxxxxxxxx> wrote:

Hi,

On Fri, 2009-10-09 at 09:55 -0700, Scooter Morris wrote:
> Hi all,
> On RHEL 5.3/5.4(?) we had changed the value of demote_secs to
> significantly improve the performance of our gfs2 filesystem for certain
> tasks (notably rm -r on large directories). I recently noticed that
> that tuning value is no longer available (part of a recent update, or
> part of 5.4?). Can someone tell me what, if anything, replaces this? Is
> it now a mount option, or is there some other way to tune this value?
>
> Thanks in advance.
>
> -- scooter
>
> --
> Linux-cluster mailing list
> Linux-cluster@xxxxxxxxxx
> https://www.redhat.com/mailman/listinfo/linux-cluster

Nothing replaces it. The glocks are disposed of automatically on an LRU basis when there is enough memory pressure to require it. You can alter the amount of memory pressure on the VFS caches (including the glocks) but not specifically the glocks themselves.

The idea is that it should be self-tuning now, adjusting itself to the conditions prevailing at the time. If there are any remaining performance issues though, we'd like to know so that they can be addressed,

Steve.