Lots of small files can certainly expose some of the performance limitations of GFS. "Hours" sounds very odd, though, so I ran a couple of sanity tests on my own test hardware.
One node mounted with lock_dlm, a directory containing 100,000 4k files, running "time ls -l | wc -l":
- dual P3 700 MHz, 256 MB, some old FC disks in a JBOD: 5 min 30 sec
- P4 2.4 GHz, 512 MB, iSCSI to a NetApp: 2 min 30 sec
Having more nodes mounted didn't change this. (Four nodes of the first kind all running this at the same time averaged about 17 minutes each.)
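In case anyone wants to reproduce this, a minimal sketch of the test is below. The mount point (/mnt/gfs), the scratch directory name, and the dd loop used to create the 4k files are my assumptions about the obvious way to set it up, not details taken from the runs above.

    # create 100,000 4k files in a scratch directory on the GFS mount
    # (/mnt/gfs/smallfiles is an assumed path, adjust to taste)
    mkdir -p /mnt/gfs/smallfiles && cd /mnt/gfs/smallfiles
    for i in $(seq 1 100000); do
        dd if=/dev/zero of=file.$i bs=4k count=1 2>/dev/null
    done
    # then time the listing
    time ls -l | wc -l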
Here are some more data points from the latest test I have tried. I was feeling emboldened by the write speed I was seeing, so I tried loading up a few more files.
The setup: 2 nodes, lock_dlm. Both are P3/866s with 1 GB of RAM apiece. One of the nodes (hudson) has two CPUs.
The process:
1) Mount disks.
2) Copy image files into subdirectory. Other node is idle.
3) hudson:/mnt/xs_media# time sh -c 'ls 100032/mls/fmls_stills | wc -l'
298407

real    7m40.726s
user    0m5.541s
sys     1m58.229s
While that's not stunningly great, I consider it acceptable performance. It's a lot of dentries to crawl through.
4) Feeling frisky, I decided to do a "real" test by unmounting and remounting the FS in order to clear caches, etc. (the exact remount commands are sketched below, after the results). The other host did not get touched in this interval.
5) hudson:/mnt/xs_media# time sh -c 'ls 100032/mls/fmls_stills | wc -l'
298407

real    74m43.284s
user    0m5.533s
sys     0m40.146s
Order of magnitude slower.
6) OK, I think, maybe that's the cost of acquiring the locks for the first time. Doing it again should be fast:
hudson:/mnt/xs_media# time sh -c 'ls 100032/mls/fmls_stills | wc -l'
298407

real    75m29.150s
user    0m5.528s
sys     0m40.724s
7) Ugh. OK, let's try it on the other node:
greenville:/mnt/xs_media# time sh -c 'ls 100032/mls/fmls_stills | wc -l'
298407

real    77m38.569s
user    0m8.850s
sys     0m35.006s
Both systems are sitting there idle now. What did I do by unmounting and remounting the GFS partition?
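For reference, the unmount/remount in step 4 was essentially the following. /mnt/xs_media is the mount point from the transcript, but the device path is a placeholder since I haven't copied it here.

    umount /mnt/xs_media
    # /dev/sdX1 is a placeholder; substitute the actual GFS device
    mount -t gfs /dev/sdX1 /mnt/xs_media
    time sh -c 'ls 100032/mls/fmls_stills | wc -l'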
For the record, that is just over 12 GB of data in those 298,407 files (roughly 42 KB per file on average). The partition is 3% full (as reported by df).
Help...?
-m