Re: problem with recovered array

eyal@xxxxxxxxxxxxxx · Fri, 3 Nov 2023 10:23:19 +1100

On 03/11/2023 04.05, Roger Heflin wrote:
You need to add the -x for extended stats on iostat.  That will catch
if one of the disks has difficulty recovering bad blocks and is being
super slow.

And that super slow will come and go based on if you are touching the
bad blocks.

I did not know about '-x'. I see that with it the totals columns (kB_read, kB_wrtn) are not included :-(
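
For reference, the cumulative totals that '-x' drops can still be read from /proc/diskstats, which is where iostat gets its counters. What follows is only a minimal sketch, not something taken from this thread: the function name totals_kb is made up for illustration, and it assumes the standard diskstats layout (device name in the third field, sectors read in the sixth, sectors written in the tenth, counted in 512-byte sectors).

```python
SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units


def totals_kb(diskstats_text):
    """Return {device: (kB_read, kB_written)} from /proc/diskstats content.

    totals_kb is a hypothetical helper name, not part of any tool.
    """
    totals = {}
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) < 10:
            continue  # blank or unexpected line, skip it
        name = fields[2]
        sectors_read = int(fields[5])
        sectors_written = int(fields[9])
        totals[name] = (
            sectors_read * SECTOR_BYTES // 1024,
            sectors_written * SECTOR_BYTES // 1024,
        )
    return totals
```

On the array host, passing it open("/proc/diskstats").read() should give per-device totals since boot, comparable to the kB_read and kB_wrtn columns of the plain report.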

Here is one.

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
md127            1.88    116.72     0.00   0.00   11.27    62.19    6.31   1523.93     0.00   0.00  218.42   241.61    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    1.40   1.72
sdb              0.67     67.42    16.17  96.02   11.61   100.68    3.74    367.79    89.35  95.98    7.65    98.33    0.00      0.00     0.00   0.00    0.00     0.00    2.02    6.25    0.05   1.92
sdc              0.81     89.74    21.61  96.39   15.30   110.94    3.74    367.58    89.29  95.98    7.70    98.20    0.00      0.00     0.00   0.00    0.00     0.00    2.02    5.15    0.05   1.73
sdd              0.87    102.17    24.66  96.59   16.75   117.28    3.73    367.34    89.24  95.99   15.00    98.45    0.00      0.00     0.00   0.00    0.00     0.00    2.02    3.28    0.08   3.92
sde              0.87    101.87    24.58  96.56   19.38   116.46    3.72    367.45    89.28  96.00   16.20    98.71    0.00      0.00     0.00   0.00    0.00     0.00    2.02    3.30    0.08   3.94
sdf              0.81     90.11    21.70  96.39   16.24   110.80    3.73    367.15    89.20  95.99   14.19    98.51    0.00      0.00     0.00   0.00    0.00     0.00    2.02    3.17    0.07   3.91
sdg              0.68     67.91    16.28  95.97   12.17    99.30    3.73    367.20    89.21  95.98   13.28    98.32    0.00      0.00     0.00   0.00    0.00     0.00    2.02    3.10    0.06   3.86

Interesting to see that sd[bc] have lower w_await, aqu-sz and %util, and higher f_await, than the other members.
Even without yet understanding what these columns mean, I note that sd[bc] are model ST12000NM001G (recently replaced) while the rest are the original ST12000NM0007 (now 5 years old).
I expect this reflects different tuning in the device firmware.
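
The old-vs-new comparison above can be done mechanically rather than by eye. Below is a rough sketch that parses one 'iostat -x' device table and averages a column over a group of disks. The helper names (parse_iostat_x, column_mean) are invented for illustration, the sample rows are the member lines from this post with whitespace collapsed, and the grouping into new and old drives follows the model numbers given above. Per the iostat man page, w_await and f_await are the average times in milliseconds for write and flush requests to be served, queue time included.

```python
# Member rows copied from the report in this post (whitespace collapsed).
SAMPLE = """\
Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util
sdb 0.67 67.42 16.17 96.02 11.61 100.68 3.74 367.79 89.35 95.98 7.65 98.33 0.00 0.00 0.00 0.00 0.00 0.00 2.02 6.25 0.05 1.92
sdc 0.81 89.74 21.61 96.39 15.30 110.94 3.74 367.58 89.29 95.98 7.70 98.20 0.00 0.00 0.00 0.00 0.00 0.00 2.02 5.15 0.05 1.73
sdd 0.87 102.17 24.66 96.59 16.75 117.28 3.73 367.34 89.24 95.99 15.00 98.45 0.00 0.00 0.00 0.00 0.00 0.00 2.02 3.28 0.08 3.92
sde 0.87 101.87 24.58 96.56 19.38 116.46 3.72 367.45 89.28 96.00 16.20 98.71 0.00 0.00 0.00 0.00 0.00 0.00 2.02 3.30 0.08 3.94
sdf 0.81 90.11 21.70 96.39 16.24 110.80 3.73 367.15 89.20 95.99 14.19 98.51 0.00 0.00 0.00 0.00 0.00 0.00 2.02 3.17 0.07 3.91
sdg 0.68 67.91 16.28 95.97 12.17 99.30 3.73 367.20 89.21 95.98 13.28 98.32 0.00 0.00 0.00 0.00 0.00 0.00 2.02 3.10 0.06 3.86
"""

NEW = ["sdb", "sdc"]                 # ST12000NM001G, recently replaced
OLD = ["sdd", "sde", "sdf", "sdg"]   # original ST12000NM0007


def parse_iostat_x(text):
    """Parse one 'iostat -x' device table into {device: {column: value}}."""
    rows = {}
    header = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] in ("Device", "Device:"):
            header = fields[1:]  # column names, without the device label
            continue
        if header is None or len(fields) != len(header) + 1:
            continue  # not a data row belonging to this table
        try:
            values = [float(v) for v in fields[1:]]
        except ValueError:
            continue  # non-numeric line, ignore it
        rows[fields[0]] = dict(zip(header, values))
    return rows


def column_mean(rows, devices, column):
    """Average one column over the listed devices."""
    return sum(rows[d][column] for d in devices) / len(devices)


if __name__ == "__main__":
    rows = parse_iostat_x(SAMPLE)
    for col in ("w_await", "f_await", "aqu-sz", "%util"):
        new_avg = column_mean(rows, NEW, col)
        old_avg = column_mean(rows, OLD, col)
        print(f"{col:8s} new={new_avg:6.2f} old={old_avg:6.2f}")
```

Running the same comparison on reports taken at an interval (rather than on a single report) would show whether the gap between the two drive models is steady or comes and goes.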

I do not expect this to be relevant to the current situation.

I also need to understand the read vs write figures. I see wkB/s is identical for all members, but rkB/s is not.
I expected the reads to be spread just as evenly, but maybe md reads different disks at different times to make up for the missing one?

Still, thanks for your help.

On Thu, Nov 2, 2023 at 8:06 AM <eyal@xxxxxxxxxxxxxx> wrote:

[discussion trimmed]

--
Eyal at Home (eyal@xxxxxxxxxxxxxx)