On 05/31/2017 10:18 AM, Roman Mamedov wrote:
> On Wed, 31 May 2017 10:07:50 -0400
> Joe Landman <joe.landman@xxxxxxxxx> wrote:
>> procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
>>  r  b   swpd      free  buff  cache   si   so    bi    bo    in    cs us sy id wa st
>>  3  0      0 130058176  2412 500660    0    0     0     0     3    17  0  2 98  0  0
>>  1  0      0 130057352  2412 501012    0    0     0     0 28827 69339  0  3 97  0  0
>> 3rd from right is % idle.
>> This is 95-98% idle. Is the rebuild done?
> It's a 40-core CPU with one core completely maxed out into 100% use with some
> non-multithreaded load from md. Yes, 100% use of one core on a 40-core CPU
> will show up as ~97% idle overall. Take a closer look at all the data
> presented.
Hmmm... Methinks thou dost protest too much.
The system is effectively idle apart from one CPU. There are 20 physical
cores, 40 with SMT. One fully loaded core in either context means the
machine is between 2.5% and 5% loaded. In no scenario that I've seen
would I (or anyone else) call this "loaded".
Moreover ... and this is the important part ... the interrupt rate and
context-switch rate were low. Which means that the CPU is not struggling
with the overhead of handling the "calculations", which for RAID10 are ...
well ... trivial (effectively buffer copies).
This means a single CPU was "loaded", but in the context of bio
submissions that were queued and being waited on. Not because of
calculations. That is, if you understand how Linux actually calculates
load, you understand that queued IOs play a (significant) factor. You
would see queued IOs in the vmstat line as runnable and blocked
processes (the 'r' and 'b' columns).
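To make the load-vs-CPU point concrete: Linux load average counts tasks in uninterruptible sleep ("D" state, usually blocked on I/O) as well as runnable tasks. A quick sketch, assuming the procps `ps` state codes (the thread name `md0_raid10` below is just an illustrative example):

```shell
# Tasks in "D" (uninterruptible sleep) inflate load average while
# using essentially no CPU. List them; during a slow md rebuild you'd
# expect to see md / bio-related threads here.
ps -eo state,pid,comm --no-headers | awk '$1 ~ /^D/ { print $0 }'
```

If this prints md or filesystem threads while the CPU columns in vmstat sit near idle, the "load" is queued I/O, not computation.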
This is one of the reasons I asked for this output ... vmstat is
surprisingly simple, and incredibly informative. You can get similar
information from dstat, or from 'glances -t 1' if you have that installed.
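For reference, a minimal way to watch the idle column that was quoted above, assuming the default vmstat layout where "id" is the 15th field:

```shell
# One sample per second, five samples. NR > 2 skips the two header
# rows; field 15 is "id" (% idle) in vmstat's default layout.
vmstat 1 5 | awk 'NR > 2 { print "idle:", $15 }'
```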
So, the information we have is:
1) interrupts are not wildly inappropriate
2) context switches are also reasonable
3) CPU (a single core) is not doing much calculation
What's left?
1) driver
2) hardware (HBA and/or expander)
3) disk configuration (WCE,RCD)
4) ncq
5) read-ahead (what does 'blockdev --getra /dev/sd*' report?)
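Items 3-5 above can be checked per disk in one pass. A sketch, assuming sdparm is installed and this runs as root (paths and tools: blockdev from util-linux, the queue_depth sysfs attribute for NCQ):

```shell
# Walk the checklist for each SCSI/SATA disk.
for d in /dev/sd?; do
  echo "== $d =="
  echo "read-ahead (512B sectors): $(blockdev --getra "$d")"
  sdparm --get=WCE,RCD "$d"                      # write-cache / read-cache bits
  cat "/sys/block/${d#/dev/}/device/queue_depth" # NCQ depth (1 = NCQ off)
done
```

A read-ahead of 256 sectors (128 KiB) is the usual default; a queue depth of 1 or a disabled write cache would jump out immediately here.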
In a Holmesian manner, we simply remove the impossible (based upon our
observation), and what remains, no matter how improbable, is likely a
factor.
The system has very low actual computational load, and low interrupt and
context-switch load. So ... it's not ... loaded. Then what comes next?
The list I gave.
Feel free to suggest other things.
--
Joe Landman
e: joe.landman@xxxxxxxxx
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman