Hi Jeff,

I chose the 2G directory for testing since it is the directory with the
largest number of files in our directory tree.

[testuser@buildmgmt-000 testdir]$ i=0; for FILE in `find ./ -type f`; do ((i+=1)); done; echo $i
64423
[testuser@buildmgmt-000 testdir]$ cd ../main0/
[testuser@buildmgmt-000 main0]$ i=0; for FILE in `find ./ -type f`; do ((i+=1)); done; echo $i
164812

It does seem that the time is a linear function of the number of files,
since testdir takes around 1/3 of the time of main0; however, the caching
speedup is not a linear function, since the percentage improvement on
subsequent runs is much higher on the smaller directory. Increasing
demote_secs did not seem to have an appreciable effect.

The du command is a simplification of the use case. Our developers run
scripts which make tags in source code directories, which requires
stat'ing the files. They also use integrated development environments
which perform autocompletion of filenames, etc., so when editing a file
they literally have to go have a coffee and come back 5 minutes later
once their entire environment unfreezes. We've had similar performance
problems in the past with other recursive commands such as rm -r. Some of
this has been resolved by piping file lists through xargs, but while it's
possible for us to modify our internal scripts, we can't modify 3rd party
software.

What I am looking for is a cache speedup on the large directory that is
proportional to the speedup on the smaller directory, and I believe that
would likely resolve our issues. I'm not sure why we're not seeing the
same speedup and can only surmise that there is a limitation on the amount
of information that can be cached. I thought that increasing the
reclaim_limit might work, but so far I can't see any appreciable effect.

Thanks,

Peter
~

> -----Original Message-----
> From: linux-cluster-bounces redhat com [mailto:linux-cluster-bounces redhat com]
> On Behalf Of Peter Schobel
> Sent: Wednesday, August 11, 2010 2:04 PM
> To: linux clustering
> Subject: How does caching work in GFS1?
>
> I am having an issue with a GFS1 cluster in some use cases. Mainly,
> running du on a directory takes an unusually long time. I have the
> filesystem mounted with noatime and nodiratime, and statfs_fast is
> turned on. Running du on a 2G directory takes about 2 minutes and each
> subsequent run took about the same amount of time.

A stat() call over GFS is slow, period. How many files are in the 2GB
directory? I would expect the time to be a linear function of the number
of files, not the file sizes.

The problem with du isn't that it's reading the directory (which is quite
fast) but that it needs to stat() each file and directory it finds in
order to compute a total size. We have seen similar performance with a
GFS filesystem over which we regularly rsync entire directory trees.

> I have been trying to tweak tunables such as glock_purge and
> reclaim_limit but to no avail.

All I found that would help me is increasing demote_secs. I believe that
causes locks to be held for a longer period of time, so that the initial
directory traversal is slow, but subsequent traversals are fast. If,
however, you are running "du" on multiple cluster nodes at the same time,
I don't think it'll help at all.

> If I could get the same
> speedup on the 30G directory as I'm getting on the 2G directory I
> would be very happy and so would the users on the cluster.

Out of sheer curiosity, do your users need to literally run "du" commands
routinely, or is that just a simplification of the actual use case?
Depending on what your application does, there may be strategies in
software that would optimize your performance on GFS.

-Jeff
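As a rough sketch of the kind of software-side strategy mentioned above,
and of the xargs pattern Peter refers to: the per-file counting loop near
the top of the thread can be replaced by a single pipeline, and recursive
deletes can be batched through xargs. The commands below are generic
illustrations rather than anything taken from the thread, and
./build_output is only a placeholder path.

  # Count regular files with one pipeline instead of a per-file shell loop
  # (assumes no newlines in file names).
  find . -type f | wc -l

  # Feed a file list to rm in large batches via xargs, the pattern Peter
  # mentions using for recursive deletes; -print0/-0 handles unusual names.
  find ./build_output -type f -print0 | xargs -0 rm -f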
On Wed, Aug 11, 2010 at 11:03 AM, Peter Schobel <pschobel@xxxxxxxxxx> wrote:
> Hi,
>
> I am having an issue with a GFS1 cluster in some use cases. Mainly,
> running du on a directory takes an unusually long time. I have the
> filesystem mounted with noatime and nodiratime, and statfs_fast is
> turned on. Running du on a 2G directory takes about 2 minutes and each
> subsequent run took about the same amount of time. Following a tip
> that I got, I turned off kernel i/o scheduling (echo noop >
> /sys/block/sdc/queue/scheduler) and after I did so, I discovered that
> the initial run of du took the same amount of time but subsequent runs
> were very fast, presumably due to some glock caching benefit (see
> results below).
>
> [testuser@buildmgmt-000 testdir]$ for ((i=0;i<=3;i++)); do time du >/dev/null; done
>
> real 2m10.133s
> user 0m0.193s
> sys 0m14.579s
>
> real 0m1.948s
> user 0m0.043s
> sys 0m1.048s
>
> real 0m0.277s
> user 0m0.034s
> sys 0m0.240s
>
> real 0m0.274s
> user 0m0.033s
> sys 0m0.239s
>
> This looked very promising, but then I discovered that the same speedup
> benefit was not realized when traversing our full directory tree.
> Following are the results for a 30G directory tree on the same
> filesystem.
>
> [testuser@buildmgmt-000 main0]$ for ((i=0;i<=3;i++)); do time du >/dev/null; done
>
> real 5m41.908s
> user 0m0.596s
> sys 0m36.141s
>
> real 3m45.757s
> user 0m0.574s
> sys 0m43.868s
>
> real 3m17.756s
> user 0m0.484s
> sys 0m44.666s
>
> real 3m15.267s
> user 0m0.535s
> sys 0m45.981s
>
> I have been trying to tweak tunables such as glock_purge and
> reclaim_limit but to no avail. I assume that I am running up against
> some kind of cache size limit, but I'm not sure how to circumvent it.
> There are no other cluster nodes accessing the same test data, so there
> should not be any lock contention issues. If I could get the same
> speedup on the 30G directory as I'm getting on the 2G directory I
> would be very happy, and so would the users on the cluster. Any help
> would be appreciated.
>
> Regards,
>
> --
> Peter Schobel
> ~

--
Peter Schobel
~

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
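The tunables discussed above (demote_secs, glock_purge, reclaim_limit,
statfs_fast) are per-mount GFS1 settings normally inspected and changed
with gfs_tool. A minimal sketch follows, assuming a GFS filesystem mounted
at /mnt/gfs; the mount point and the values are illustrative examples
only, not settings recommended anywhere in the thread, and settune changes
do not survive a remount.

  # List the current tunable values for the mount.
  gfs_tool gettune /mnt/gfs

  # Example settings for the tunables discussed in this thread.
  gfs_tool settune /mnt/gfs statfs_fast 1       # fast statfs, which Peter already has enabled
  gfs_tool settune /mnt/gfs demote_secs 600     # hold cached glocks longer before demotion
  gfs_tool settune /mnt/gfs glock_purge 50      # percentage of unused glocks gfs_scand may purge
  gfs_tool settune /mnt/gfs reclaim_limit 10000 # example value only

  # The per-device I/O scheduler change Peter mentions (sdc matches his system).
  echo noop > /sys/block/sdc/queue/scheduler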