Re: htree stability and performance issues

kwijibo@xxxxxxxxxx · Thu, 18 Dec 2003 08:26:28 -0700

My guess is that it is partly the Linux NFS implementation, or the way
it talks with the filesystem, and the filesystem itself.  I have seen
this problem without the htree patch, on plain ext3.  I was considering
using htree to help with the problem, but after this post I think I
will hold off.

I think I have a somewhat similar setup to Adam's, with Maildirs and
all.  I have a setup that goes through a Linux LVS, then to 1 of 6
FreeBSD NFS clients that run qmail, then over NFS to a Linux storage
box.  POP and webmail go through the same process on the same boxes.

As near as we can figure, the stat and directory-listing traffic from
POP and webmail was killing the storage box.  I have 32 instances of
NFS on the storage box, and the load would go up to about 32: one unit
of load for every NFS process, which I have read about somewhere
before.  All the NFS daemons would go into the DW state.  From what it
looked like, NFS was waiting on the filesystem to stat and list
people's Maildirs that had lots of files.
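
The "one unit of load per NFS process" observation fits how Linux
computes load average: it counts tasks that are runnable plus tasks in
uninterruptible sleep (state D), so 32 daemons blocked on the
filesystem show up as a load of about 32 even with an idle CPU.  Below
is a minimal sketch for checking this on a box with /proc; the function
names are made up for illustration and are not from any tool mentioned
in this thread.

```python
import os
from collections import Counter


def parse_state(stat_line):
    """Return the one-letter process state from a /proc/<pid>/stat line."""
    # The command name sits in parentheses and may itself contain spaces
    # or ')', so split after the LAST ')' instead of on plain whitespace.
    rest = stat_line[stat_line.rindex(")") + 1:].split()
    return rest[0]


def count_states(proc_root="/proc"):
    """Count processes by state letter (R, S, D, ...) under proc_root."""
    counts = Counter()
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue  # skip non-PID entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, name, "stat")) as f:
                counts[parse_state(f.read())] += 1
        except (OSError, ValueError, IndexError):
            continue  # the process exited, or the line was unreadable
    return counts
```

During one of these storms, count_states()["D"] on the storage box
would be expected to sit close to the number of NFS daemons, and the
first field of /proc/loadavg should track it.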

Another strange note is that kswapd would be using 99 percent CPU
during these NFS storms; I am not sure why, since I wasn't swapping.
All of this was happening while I had plenty of disk IO left on the
storage box.

Eventually we just started to nuke old mail out of the larger dirs to
get them down to a sane size, and things have cleared up.

Now, from what it sounds like, htree will actually make things worse in
this type of situation; is this correct?  Is there a patch somewhere,
or a filesystem out there, that is good at this stat-and-list type of
load?  Or is it just NetApp time? :)

Steve

Theodore Ts'o wrote:

On Thu, Dec 18, 2003 at 01:36:25PM +1100, Adam Cassar wrote:

What's your take on the nfs client load issues? It does run for 4-5
hours albeit at higher load (now explained by your post) however it does
eventually die with the load going stupid (180 odd). It seems that the
patch still has some nfs interoperability problems.

Was this on the nfs *client* or the nfs *server*?  

I'd really, really like to see a ps listing on the machine involved;
the output of "ps alxww" and "ps auxww" would be useful.  The question
is what processes are hung in wait, and what they're waiting on....
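
As a rough aid for reading such a listing, the sketch below tallies
the processes in uninterruptible sleep by their WCHAN column, which is
the "what they're waiting on" part of the question.  It assumes the
usual procps "ps alxww" layout with a header row containing STAT and
WCHAN; the function name is hypothetical.

```python
from collections import Counter


def blocked_by_wchan(ps_text):
    """Count D-state processes per WCHAN value in `ps alxww` output."""
    lines = [line for line in ps_text.splitlines() if line.strip()]
    if not lines:
        return Counter()
    header = lines[0].split()
    wchan_i = header.index("WCHAN")
    stat_i = header.index("STAT")
    counts = Counter()
    for line in lines[1:]:
        # COMMAND is the last column and may contain spaces, so cap the
        # number of splits at the number of header columns.
        fields = line.split(None, len(header) - 1)
        if len(fields) < len(header):
            continue  # malformed or truncated row
        if fields[stat_i].startswith("D"):
            counts[fields[wchan_i]] += 1
    return counts
```

Feeding it the saved output of "ps alxww" from the loaded machine gives
a quick summary of where the hung processes are sleeping; the full
listing is still the thing to post to the list.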

It would also be interesting to see if the LD_PRELOAD hack which I
sent you helped alleviate the load on the server?  With the LD_PRELOAD
hack, the access pattern on stat's and open's should be restored to
the original workload, so if that makes the problem go away, then
the problem was merely that NFS doesn't degrade gracefully under load.
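
The internals of the LD_PRELOAD hack are not described in this message.
One general technique for making the stat pattern friendlier to the
disk is to read the whole directory first and then stat the entries in
inode order rather than in the order readdir returned them.  The sketch
below shows that idea only; it is an assumption for illustration, not a
reproduction of the hack, and the function name is made up.

```python
import os


def stat_in_inode_order(path):
    """List `path`, then stat each entry in ascending inode order.

    Returns a list of (name, size) tuples in the order they were stat'ed.
    """
    # Read the whole directory up front; DirEntry.inode() comes from the
    # directory entry itself and needs no extra stat call on Linux.
    with os.scandir(path) as it:
        entries = [(entry.inode(), entry.name) for entry in it]
    entries.sort()

    results = []
    for _inode, name in entries:
        try:
            st = os.lstat(os.path.join(path, name))
        except FileNotFoundError:
            continue  # message was delivered elsewhere or deleted meanwhile
        results.append((name, st.st_size))
    return results
```

On a large Maildir this trades the memory for one full listing against
fewer seeks across the inode table; whether it helps over NFS depends
on the server, so it is worth measuring before and after.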

(This is not actually earth-shattering news; I've had really strange
results trying to do heavy-duty NFS over a wireless connection,
although that's more due to Linux's NFS implementation utterly failing
to deal with dropped packets.)

I believe, although I am not sure, that there are some NFS
improvements that went into 2.6 that didn't get back-ported to 2.4.
So it might be that running 2.6.0 on the clients and/or servers might
actually help.  That would be a pretty daring move, though....

Finally, can you give me a little bit more detail of exactly what is
running on the clients and server, and the rationale of why you are
trying to apparently run incoming mail processes over NFS?  (Is that
what you're doing?  If so, it sounds rather scary...)

					- Ted

_______________________________________________

Ext3-users@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/ext3-users
