On 09/11/14 09:58, Dave Johansen wrote:
On Mon, Sep 2, 2013 at 12:40 PM, Ron E <ron@xxxxxxxxxxxxxxx> wrote:
Dear List,
We have noticed a variety of reproducible conditions working with sparse
files on multiple servers under load with CentOS 6.4.
The short story is that processes that read / write sparse files with
large "holes" can generate an IO storm. Oddly, this only happens with holes
and not with the sections of the files that contain data.
We have seen extremely high IO load when, for example, copying a 40 or
80 GB sparse file that has only a few GB of data in it. Attempts to lower
the IO priority and CPU priority of these processes (ionice, nice) do not
make any measurable difference. This has been observed with processes such as:
cp
rsync
sha1sum
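
For anyone trying to reproduce this, a minimal Python sketch may help to check how sparse a file really is, and to list only the byte ranges that hold data. The helper names (make_sparse_file, sparse_report, data_extents) are made up for illustration. The extent listing assumes a kernel and filesystem that support lseek() with SEEK_DATA / SEEK_HOLE, which the stock CentOS 6.4 kernel may not provide, so treat it as a sketch rather than a fix.

```python
import errno
import os


def make_sparse_file(path, size):
    """Create a test file with 4 KiB of data followed by one large hole."""
    with open(path, "wb") as f:
        f.write(b"x" * 4096)
        f.truncate(size)  # extending via truncate leaves a hole, no data written


def sparse_report(path):
    """Return (apparent_size, allocated_bytes) for path."""
    st = os.stat(path)
    # st_blocks is counted in 512-byte units regardless of filesystem block size
    return st.st_size, st.st_blocks * 512


def data_extents(path):
    """Yield (start, end) byte ranges that contain data, skipping holes.

    Assumption: the kernel/filesystem supports SEEK_DATA and SEEK_HOLE.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            try:
                start = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as e:
                if e.errno == errno.ENXIO:  # no data at or after offset
                    break
                raise
            end = os.lseek(fd, start, os.SEEK_HOLE)
            yield start, end
            offset = end
    finally:
        os.close(fd)
```

Comparing the two numbers from sparse_report() shows how much of the file is hole. A tool that reads only the ranges from data_extents() never touches the holes at all, whereas cp, rsync or sha1sum reading the file sequentially will read through them.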
The server does have to be under some load to reproduce the necessary
conditions. The cases we have seen involve servers running 10-30 guests
under KVM. Load is within acceptable norms when the processes are run, such as
load avg 5-15 on a 24-thread (12 cores with HT enabled) server. We also verify
before starting such a process that the spindle with the file we're working
on is not being unduly hammered by another process.
These servers have one hardware RAID controller each (Dell H700 controller
with write cache enabled) and multiple RAID arrays (separate sets of
physical spindles). Interestingly, the IO storm is not limited to the array
/ spindles where the sparse file resides but affects all IO on that server.
We have looked extensively and not found any account of a similar issue.
We have seen this on configurations that are 'plain vanilla' enough to
think that this is not something specific to our environment.
We are wondering whether anyone else has seen this and has any suggestions
for gathering more data / troubleshooting. We wonder if we've found either a
RAID controller driver issue, an OS issue, or some other such thing. What
seems to point in this direction is that even with ionice -c3, which should
prevent the process from using IO unless the storage is idle, an IO storm
that appears to saturate the entire RAID bus on a given server can occur.
Did you ever figure anything out from this? I've noticed a similar sort of
issue on some of our machines, so I was curious if you found the cause of
the issue or any way to improve the situation.
Thanks,
Dave
_______________________________________________
CentOS mailing list
CentOS@xxxxxxxxxx
http://lists.centos.org/mailman/listinfo/centos
Are you sure the HDD is not simply too busy seeking around (you can
investigate via iotop)? To confirm, you could test this on an idle disk
(one not under load, such as an external USB disk).
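
As a complement to iotop, a rough Python sketch along these lines could show how busy each block device is over an interval by sampling /proc/diskstats twice. The function names are hypothetical. The field positions follow the kernel's documented diskstats layout, in which the 13th column is the time spent doing I/O in milliseconds.

```python
import time


def parse_diskstats(text):
    """Parse /proc/diskstats content into {device: counters}."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # skip blank or malformed lines
        stats[fields[2]] = {
            "reads": int(fields[3]),      # reads completed
            "writes": int(fields[7]),     # writes completed
            "in_flight": int(fields[11]), # I/Os currently in progress
            "io_ms": int(fields[12]),     # ms spent doing I/O
        }
    return stats


def busy_percent(before, after, interval_s):
    """Return {device: percent of the interval the device was busy}."""
    result = {}
    for dev, new in after.items():
        old = before.get(dev)
        if old is None:
            continue  # device appeared between the two samples
        delta_ms = new["io_ms"] - old["io_ms"]
        result[dev] = 100.0 * delta_ms / (interval_s * 1000.0)
    return result


def sample(interval_s=1.0):
    """Take two samples of /proc/diskstats and report per-device busy %."""
    with open("/proc/diskstats") as f:
        before = parse_diskstats(f.read())
    time.sleep(interval_s)
    with open("/proc/diskstats") as f:
        after = parse_diskstats(f.read())
    return busy_percent(before, after, interval_s)
```

Calling sample() while the sparse-file copy is running, and again on an otherwise idle disk, gives a per-device figure to compare, which should help tell whether only the array holding the file is saturated or every device on the controller.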