Re: I/O wait problem with hardware raid

Eric S. Johansson wrote:
Bill Davidsen wrote:

iowait means that there is a program waiting for I/O. That's all.

I was under the impression that I/O wait created a blocking condition.

Of course when you do a copy (regardless of software) the CPU is waiting
for disk transfers. I'm not sure what you think you should debug; I/O
takes time, and if the program is blocked until the next input comes in
it will enter the iowait state. If there is no other process to use the
available CPU, that time is counted as iowait, which is essentially
available CPU cycles, similar to idle.

What exactly do you think is wrong?

As I run rsync, which increases the I/O wait, the first thing I notice is
that IMAP starts getting slow, users start experiencing failures sending
e-mail, and the initial exchange for ssh takes significantly longer.

All of these problems have both networking and file I/O in common, and I'm trying
to separate out where the problem is coming from.  I have run netcat, which showed
that the network throughput is not wonderful, but that's a different
problem for me to solve.  When I run netcat, there is no degradation of ssh,
IMAP, or SMTP response times. The problem shows up if I run cp or rsync with both
source and target on the local filesystem.   The problem is worst when I'm running one rsync within
the local filesystem and another rsync to an external rsync server.  At that
point, the system becomes very close to unusable.
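
For what it's worth, one way to see which side is saturating while the copy runs is to watch the disk queues directly; this assumes the sysstat package is installed, and the interval is only an example:

    iostat -x 5    # per-device utilization and average wait times, every 5 seconds
    vmstat 5       # the "wa" column is CPU time stuck waiting on I/O

If the array sits near 100% utilization while the network is mostly idle, the disks are the choke point.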

Of course, I can throttle back rsync and regain some usability, but I'm backing
up a couple of terabytes of information; that's a time-consuming process even with
rsync, and I would like it to run as quickly as possible. I should probably point
out that the disk array is a relatively small RAID 5 set built from six 1 TB
drives.  I never did like RAID 5, especially when it's implemented in a bit of firmware.
I can't wait for ZFS (or its equivalent) on Linux to reach production quality.
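
For reference, one way to throttle the local copy without giving it up entirely; the bandwidth figure and paths here are only placeholders, and ionice only really bites with the CFQ I/O scheduler:

    # run rsync in the idle I/O class and cap it at roughly 20 MB/s
    ionice -c3 rsync -a --bwlimit=20000 /data/source/ /backup/target/

--bwlimit takes KB/s, so 20000 is about 20 MB/s; raising or lowering it trades backup time against interactive response.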

From where I stand right now, this might be "it sucks, but it's perfectly
normal".  In a situation with heavy disk I/O, I would expect anything that
accesses the disk to run slowly. In a more naïve moment, I thought that the
GUI wouldn't be hurt by heavy disk I/O, and then I remembered that GNOME and its
kindred have lots of configuration files to read every time you move the mouse.  :-)

In any case, the people who sign my check aren't happy because they spent all this
money on an HP server and it performs no better than an ordinary PC.  I'm hoping I
can learn enough to give them a cogent explanation if I can't give them a solution.


I appreciate the help.



What I/O wait really means:

You are asking the I/O subsystem to do more than it can, and you can do this *NO* matter how fast your disks are. In this case, throttle the I/O processes somewhat to allow other things to still function.

The only real way out of this is either to throttle things, or to get faster disks and/or a faster RAID subsystem.

I would suggest that you (with the machine otherwise unused) test sequential reads and writes off the RAID unit and post the speeds of those tests. You will need to make sure the tests break the cache to get real numbers. Also post the number and type of disks that you have; that will give some idea of what the I/O system can do and should be able to do.
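
A rough way to do that with nothing but dd; the mount point and sizes below are just examples, and the test file should be well over the size of RAM (or the caches dropped first) so the numbers aren't flattered by caching:

    # sequential write: force the data to the array before dd reports a speed
    dd if=/dev/zero of=/mnt/array/ddtest bs=1M count=16384 conv=fdatasync

    # drop the page cache, then read the same file back sequentially
    echo 3 > /proc/sys/vm/drop_caches
    dd if=/mnt/array/ddtest of=/dev/null bs=1M

GNU dd prints the throughput at the end of each run; repeat a few times and take the middle value.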

If the parts are picked incorrectly, things will suck. There are several general classes of RAID controller, and some of them run a lot faster than the others (and cost more). There are at least three classes of disks (SATA, 10k-RPM SAS, 15k-RPM SAS), and the more expensive ones run much faster.

Even the best disks and RAID controllers will run at full speed on any of the dual-socket machines (from any vendor) that has enough PCI-X and/or PCI-e buses to keep the controllers supplied with data.

How much one can push is related to how many PCI-X and/or PCI-e buses there are and how much is shared between the different buses. On most desktop boards *ALL* of the PCI slots are shared (there is only about 132 MB/second between all PCI slots); this got better with PCI-e, but desktop boards usually don't have very many PCI-e slots. On the higher-end boards (usually 2+ socket, but also some single-socket enterprise boards with PCI-X) there are often several different PCI-X buses that are not shared and won't interfere with each other. And on some higher-end AMD boards there are slots/chipsets connected to *BOTH* CPUs, which increases the possible bandwidth if used correctly.
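
Some rough back-of-the-envelope numbers (assuming about 80 MB/s sequential per current 1 TB SATA drive, which is only an estimate) show why the bus layout matters as much as the disks:

    6 drives x ~80 MB/s          ≈ 480 MB/s the spindles can stream
    shared 32-bit/33 MHz PCI     ≈ 132 MB/s total for every card on the bus
    PCI-X 64-bit/133 MHz         ≈ 1 GB/s per bus
    PCI-e x8 (gen 1)             ≈ 2 GB/s per direction

On a shared legacy PCI bus the slot, not the drives, sets the ceiling.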

                                Roger
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
