For one node, it should be pretty fast, even for large directories. Also, make sure that you're not being bitten by ls. For many people, ls by default also stats every entry in the directory (for colors, or for the extra character at the end of the name). On top of that, ls typically reads all of the entries, sorts them, and only then formats and displays them.
I do know about the issues with ls in general. In loading up the GFS partition (as I referenced earlier) I noticed some interesting behavior. A script running on one of the cluster hosts is copying one file at a time from the source drive, and I'm doing a 'strace ls' on the other host. The getdents64 syscalls are taking an average of about 1/3 of a second to return, which isn't that bad, I suppose, given the contention. What's interesting is that the copying and the getdents64 calls seem to finish at the same time, such that the two windows scroll in more-or-less lockstep. It's hard to quantify, but it seems like the nodes are spending a lot of time wrangling over the directory lock.
The two machines are clustered over their own network segment on their secondary interfaces (which is, incidentally, made difficult by cman's insistence on believing cluster.conf instead of the command line). At least, they are supposed to be. There's still a lot of network traffic on the primary interfaces. Is there a way to ensure that the cluster chatter stays on one interface or the other?
Immediately after loading the directory I did this:
hudson:/mnt/xs_media# time sh -c 'ls 100032/mls/fmls_stills | wc -l'
298407

real 7m40.726s
user 0m5.541s
sys 1m58.229s
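To separate the cost of ls itself from the cost of GFS, the same count could be taken without sorting. This is only a rough sketch, assuming a POSIX shell and an ls that supports -f (unsorted listing that also includes dot entries); count_entries is a made-up helper name, and whether -f avoids the per-entry stat depends on the ls build and on the filesystem returning entry types.

```shell
#!/bin/sh
# count_entries (hypothetical helper): count the names in a directory
# without asking ls to sort them first.
count_entries() {
    dir=$1
    # ls -f lists entries in directory order and includes "." and "..".
    # When the output is a pipe, ls prints one name per line.
    total=$(ls -f "$dir" | wc -l)
    # Drop "." and ".." from the total. Names that contain newlines
    # would be overcounted; that is ignored in this sketch.
    echo $(($total - 2))
}

# Example (path taken from the run above):
#   time count_entries 100032/mls/fmls_stills
```

Note that this count includes hidden files, which the plain ls above would skip, so the two numbers are only comparable if the directory has no dot files.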
In any event, I unmounted and remounted the GFS partition to clear state, then started another time sh -c 'ls 100032/mls/fmls_stills | wc -l' on that node. There is still another (idle) node with the disks mounted.
It's been going for about an hour now. If I strace the ls I can see it moving (the getdents64 calls are returning). top shows dlm_sendd taking up 95% of the CPU, and the load average is over 8. Based on ls's memory usage from the previous run above, on these same files, I think it is about 1/3 of the way done.
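To put an actual number on the getdents64 latency instead of eyeballing the strace window, something like the following could be used. It is a sketch under assumptions: strace is run with -T so each line ends with the elapsed time in angle brackets, and avg_getdents is a made-up helper name that averages those times with awk.

```shell
#!/bin/sh
# avg_getdents (hypothetical helper): read `strace -T` output on stdin
# and print the number of getdents64 calls and their mean elapsed time.
avg_getdents() {
    awk '
        # strace -T appends the time spent in the call as <seconds>
        /^getdents64\(/ && match($0, /<[0-9.]+>$/) {
            # strip the angle brackets around the elapsed time
            sum += substr($0, RSTART + 1, RLENGTH - 2)
            n++
        }
        END {
            if (n > 0) printf "%d calls, mean %.6f s\n", n, sum / n
            else print "no getdents64 calls seen"
        }
    '
}

# Example: strace writes to stderr, so send stderr down the pipe and
# throw away the listing itself.
#   strace -T -e trace=getdents64 ls -f 100032/mls/fmls_stills 2>&1 >/dev/null | avg_getdents
```

Running this once while the copy script is active and once while it is idle would show how much of the per-call time is lock contention rather than plain directory size.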
Despite my attempts to control the cluster interface usage, it looks like the nodes are chattering over both interfaces (sar is showing a steady 8-10kb per second for each).
I must have something screwed up here. GFS gurus, please enlighten me. Is it time to update the gfs code? I am using a fairly old version (9/19 or thereabouts).
-m