it talks with the filesystem and the filesystem itself. I have seen this problem without
the htree patch and just normal ext3. I was considering using htree to help with the
problem but after this post I think I will hold off.
I think I have a somewhat similar setup to Adam's, with Maildirs and all.
I have a setup that goes through a Linux LVS, then to 1 of 6
FreeBSD NFS clients that run qmail, then over NFS to a Linux storage
box. POP and webmail go through the same process on the same boxes.
Now, as near as we can figure, the stat and listing load from POP and webmail
was killing the storage box. I have 32 instances of NFS on the storage
box and the load would go up to about 32. That's 1 load for every NFS
proc; I have read about this somewhere before. All the NFS daemons
would go into the DW state. From what it looked like, NFS was waiting
on the filesystem to stat and list people's Maildirs that had lots of files.
Another strange note is that kswapd would be using 99 percent CPU during
these NFS storms, not sure why, since I wasn't swapping. All of this was
happening while I had plenty of disk IO left on the storage box.
Eventually we just started to nuke old mail out of the larger dirs to get
them down to a sane size and things have cleared up.
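For anyone wanting to find the oversized directories before pruning, here is a minimal sketch in Python. It is not what was run on the storage box; the function names, the threshold and the assumption that each mailbox uses the usual cur/new/tmp Maildir subdirectories are all illustrative. It counts entries with os.scandir so that it does not stat every message, which is the very load being avoided.

```python
import os

# Assumption: standard Maildir layout with these three subdirectories.
MAILDIR_SUBDIRS = ("cur", "new", "tmp")


def count_entries(path):
    """Count directory entries without calling stat() on each one."""
    with os.scandir(path) as it:
        return sum(1 for _ in it)


def large_maildirs(root, threshold=10000):
    """Return (directory, entry_count) pairs at or above threshold.

    The threshold default is an arbitrary illustrative value.
    Results are sorted with the largest directory first.
    """
    results = []
    for dirpath, dirnames, _ in os.walk(root):
        for name in [d for d in dirnames if d in MAILDIR_SUBDIRS]:
            full = os.path.join(dirpath, name)
            n = count_entries(full)
            if n >= threshold:
                results.append((full, n))
            # Do not let os.walk descend into the message directories.
            dirnames.remove(name)
    results.sort(key=lambda r: r[1], reverse=True)
    return results
```

Running it on the storage box itself rather than through an NFS client keeps the extra directory listing off the wire.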
Now, from what it sounds like, htree will actually make things worse in this type of situation. Is this correct? Is there a patch somewhere, or a filesystem out there, that is good at handling this stat and list type of load? Or is it just NetApp time? :)
Steve
Theodore Ts'o wrote:
On Thu, Dec 18, 2003 at 01:36:25PM +1100, Adam Cassar wrote:
What's your take on the nfs client load issues? It does run for 4-5
hours albeit at higher load (now explained by your post) however it does
eventually die with the load going stupid (180 odd). It seems that the
patch still has some nfs interoperability problems.
Was this on the nfs *client* or the nfs *server*?
I'd really, really like to see a ps listing on the machine involved; the output of "ps alxww" and "ps auxww" would be useful. The question is what processes are hung in wait, and what they're waiting on....
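As a rough aid for reading such a listing, the sketch below picks out the processes in uninterruptible sleep (a STAT value starting with "D"), which are the ones counted toward the load average while they wait. It assumes the listing has a header row containing a STAT column, as Linux "ps alxww" and "ps auxww" output does; the function name is made up for illustration.

```python
def blocked_processes(ps_output):
    """Return one dict per process whose STAT field starts with 'D'.

    ps_output is the full text of a ps listing, header line included.
    Keys of each dict are the column names taken from the header.
    """
    lines = ps_output.strip().splitlines()
    header = lines[0].split()
    stat_col = header.index("STAT")
    ncols = len(header)
    blocked = []
    for line in lines[1:]:
        # Split into at most ncols fields so that a command line
        # containing spaces stays together in the last column.
        fields = line.split(None, ncols - 1)
        if len(fields) < ncols:
            continue  # skip malformed or truncated lines
        if fields[stat_col].startswith("D"):
            blocked.append(dict(zip(header, fields)))
    return blocked
```

With "ps alxww" output, the WCHAN entry of each returned dict names what the process is waiting on, which is the question being asked here.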
It would also be interesting to see whether the LD_PRELOAD hack which I sent you helped alleviate the load on the server. With the LD_PRELOAD hack, the access pattern on stats and opens should be restored to the original workload, so if that makes the problem go away, then the problem was merely that NFS doesn't degrade gracefully under load.
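To illustrate the idea of restoring a friendlier access pattern, here is a small Python sketch. It is not the LD_PRELOAD code referred to above. It assumes the approach is to read the whole directory first and then stat the entries in inode-number order instead of the order the directory returns them, so that the inode reads are less scattered on disk. The function name is hypothetical.

```python
import os


def stat_in_inode_order(path):
    """Stat every entry of a directory, visiting them sorted by inode.

    Returns a list of (name, os.stat_result) pairs in the order visited.
    """
    with os.scandir(path) as it:
        # DirEntry.inode() comes from the directory entry itself,
        # so this pass does not stat each file.
        entries = [(entry.inode(), entry.name) for entry in it]
    entries.sort()
    return [
        (name, os.lstat(os.path.join(path, name)))
        for _, name in entries
    ]
```

An application-level change like this only helps programs that can be modified; a preload library gets the same ordering into unmodified binaries.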
(This is not actually earth-shattering news; I've had really strange results trying to do heavy-duty NFS over a wireless connection, although that's more due to Linux's NFS implementation utterly failing to deal with dropped packets.)
I believe, although I am not sure, that there are some NFS improvements that went into 2.6 that didn't get back-ported to 2.4. So it might be that running 2.6.0 on the clients and/or servers might actually help. That would be a pretty daring move, though....
Finally, can you give me a little bit more detail of exactly what is running on the clients and server, and the rationale for why you are apparently trying to run incoming mail processes over NFS? (Is that what you're doing? If so, it sounds rather scary...)
- Ted
_______________________________________________ Ext3-users@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/ext3-users