Re: [Linux-cluster] clvmd without GFS?

Michael Conrad Tadpol Tilstra wrote:
For one node, it should be pretty fast, even for large directories.
Also, you should make sure that you're not being bitten by ls.  For many
people, by default, ls is also stat-ing every entry in the directory
(for colors or the extra char at the end of the name).  As well, ls
typically reads all of the entries, sorts them, then formats and
displays them.

I do know about the issues with ls in general. In loading up the GFS partition (as I referenced earlier) I noticed some interesting behavior. I have a script that is copying one file at a time from the source drive running on one of the cluster hosts, and I'm doing a 'strace ls' on the other. The getdents64 syscalls are taking an average of about 1/3 of a second to return, which isn't that bad, I suppose, given the contention. What's interesting is that the copying and the getdents64 calls seem to finish at the same time, such that the two windows scroll in more-or-less lockstep. It's hard to quantify, but it seems like the nodes are spending a lot of time wrangling over the directory lock.

The two machines are clustered over their own network segment on their secondary interfaces (which is, incidentally, made difficult by cman's insistence on believing cluster.conf instead of the command line). At least, they are supposed to be. There's still a lot of network traffic on the primary interfaces. Is there a way to ensure that the cluster chatter stays on one interface or the other?
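My understanding is that cman binds cluster traffic to whichever interface the node name in cluster.conf resolves to, so the way to pin it to the private segment would be something like the below, with /etc/hosts entries pointing the names at the secondary addresses. (The names and addresses here are made up, and I haven't confirmed this actually fixes it.)

    # /etc/hosts on both nodes -- private segment
    192.168.100.1   hudson-clu
    192.168.100.2   node2-clu

    <!-- cluster.conf: refer to the private names -->
    <clusternodes>
        <clusternode name="hudson-clu" votes="1"/>
        <clusternode name="node2-clu" votes="1"/>
    </clusternodes>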


Immediately after loading the directory I did this:

hudson:/mnt/xs_media# time sh -c 'ls 100032/mls/fmls_stills | wc -l'
298407

real    7m40.726s
user    0m5.541s
sys     1m58.229s
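For comparison, a count that takes ls's own stat and sort overhead out of the picture (with GNU ls, -f disables both, though it also counts . and ..) would be something like:

    time sh -c '/bin/ls -f 100032/mls/fmls_stills | wc -l'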


In any event, I unmounted and remounted the GFS partition to clear state, then started another run of time sh -c 'ls 100032/mls/fmls_stills | wc -l' on that node. The other (idle) node still has the disks mounted.


It's been going for about an hour now. If I strace the ls I can see it moving (the getdents64 calls are returning). top shows dlm_sendd taking up 95% of the CPU, and the load average is over 8. Based on ls's memory usage from the previous run above, on these same files, I think it is about a third of the way done.
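In case it's useful to anyone trying to reproduce this, the per-call latency is easy to see by attaching strace with syscall timing (this assumes the ls in question is the only one running, so pidof finds it):

    strace -T -tt -e trace=getdents64 -p $(pidof ls)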

Despite my attempts to control the cluster interface usage, it looks like they are chattering over both interfaces (sar shows a steady 8-10kb per second on each).
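To pin down which interface is actually carrying the chatter, I can watch each one for traffic to the other node (eth0/eth1 and the peer addresses below are placeholders for our primary and secondary interfaces):

    tcpdump -n -i eth0 host 10.0.0.2
    tcpdump -n -i eth1 host 192.168.100.2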

I must have something screwed up here. GFS gurus, please enlighten me. Is it time to update the gfs code? I am using a fairly old version (9/19 or thereabouts).

-m

