After running for several days with the larger table sizes, I don't think
they've made any difference to individual thread performance or overall
throughput.
Likewise, the following changes have had no effect on access time for
large directories (though they have improved caching and overall
performance under high load):
Increasing the dentry and inode caches to the maximum size allowed by the
kernel (about 128 million entries; this is capped at roughly 10% of
memory).
This helped caching under load, but the cached dentry data would
evaporate after a while until I added the following sysctl changes:

vm.max_reclaims_in_progress=1
vm.zone_reclaim_mode=0

(Switching to zone_reclaim_mode=0 is recommended for fileservers to
enhance dentry/inode caching.)
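
(In case anyone wants to experiment along the same lines, this is roughly
the shape of it - the values are illustrative rather than a tuned
recommendation:)

# apply at runtime
sysctl -w vm.zone_reclaim_mode=0
sysctl -w vm.max_reclaims_in_progress=1

# persist across reboots
cat >> /etc/sysctl.conf <<'EOF'
vm.zone_reclaim_mode=0
vm.max_reclaims_in_progress=1
EOF

# check whether the dentry cache is actually being retained:
# field 1 = total dentries in the cache, field 2 = unused (freeable) ones
watch -n 60 cat /proc/sys/fs/dentry-state

# (the hash tables behind these caches are sized at boot, via the
#  dhash_entries= / ihash_entries= kernel command-line parameters)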
At the end of all that, the effect is only minor, and the biggest bugbear
- unusably slow access to directories holding more than ~150 files -
hasn't been addressed.
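
(A quick way to see the symptom for yourself - the directory path below
is a placeholder, and dropping the dentry/inode caches first shows the
cold-cache cost:)

# drop cached dentries and inodes so the listing has to hit the FS
echo 2 > /proc/sys/vm/drop_caches
# time a full stat of every entry in a large directory
time ls -l /mnt/gfs2/some_large_directory > /dev/null

On the affected directories, the second command is where the slowness
shows up.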
The change which had the largest effect on this problem - switching to
lock_nolock - isn't practical in a production cluster environment (and it
defeats the purpose of using GFS2 anyway).
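
(For reference, the lock protocol can be overridden at mount time, which
is how a lock_nolock test can be done without rewriting the superblock -
the device and mountpoint below are placeholders, and it's only safe when
no other node has the filesystem mounted:)

# single-node test ONLY - this bypasses the DLM entirely
mount -t gfs2 -o lockproto=lock_nolock /dev/mapper/san-testlv /mnt/test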
iostat is showing that under heavy I/O load (1000-3000 requests/second
but only 2-3Mb/s of actual data), the kernel on one machine can sit on
read/write requests for up to 3000ms before passing them to the storage
devices, which usually respond within 2-5ms. It sits at around 300ms most
of the time, and the machine concerned only has 5 filesystems mounted.
The other two machines in the cluster, which aren't facing this kind of
treatment (100-300 requests/second), have 30 mounts each, can easily read
at 10-20Mb/s, and have read delays of 2-10ms (mostly 3-4ms).
Users report that these two machines are _fast_ when not accessing
directories with large numbers of files in them...
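
(For anyone wanting to compare numbers, the figures above come from
iostat's extended stats, something along these lines - "await" is the
average time a request spends queued plus being serviced, while "svctm"
approximates what the storage itself takes, so the gap between the two is
the queueing inside the kernel:)

# extended per-device stats every 5 seconds
iostat -x 5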