On Thu, Nov 2, 2023 at 6:31 PM <eyal@xxxxxxxxxxxxxx> wrote:
>
> On 03/11/2023 04.05, Roger Heflin wrote:
> > You need to add the -x for extended stats on iostat. That will catch
> > if one of the disks has difficulty recovering bad blocks and is being
> > super slow.
> >
> > And that super slow will come and go based on if you are touching the
> > bad blocks.
>
> I did not know about '-x'. I see that the total columns (kB_read, kB_wrtn) are not included :-(
>
> Here is one.
>
> Device  r/s   rkB/s   rrqm/s %rrqm r_await rareq-sz w/s  wkB/s   wrqm/s %wrqm w_await wareq-sz d/s  dkB/s drqm/s %drqm d_await dareq-sz f/s  f_await aqu-sz %util
> md127   1.88  116.72  0.00   0.00  11.27   62.19    6.31 1523.93 0.00   0.00  218.42  241.61   0.00 0.00  0.00   0.00  0.00    0.00     0.00 0.00    1.40   1.72
> sdb     0.67  67.42   16.17  96.02 11.61   100.68   3.74 367.79  89.35  95.98 7.65    98.33    0.00 0.00  0.00   0.00  0.00    0.00     2.02 6.25    0.05   1.92
> sdc     0.81  89.74   21.61  96.39 15.30   110.94   3.74 367.58  89.29  95.98 7.70    98.20    0.00 0.00  0.00   0.00  0.00    0.00     2.02 5.15    0.05   1.73
> sdd     0.87  102.17  24.66  96.59 16.75   117.28   3.73 367.34  89.24  95.99 15.00   98.45    0.00 0.00  0.00   0.00  0.00    0.00     2.02 3.28    0.08   3.92
> sde     0.87  101.87  24.58  96.56 19.38   116.46   3.72 367.45  89.28  96.00 16.20   98.71    0.00 0.00  0.00   0.00  0.00    0.00     2.02 3.30    0.08   3.94
> sdf     0.81  90.11   21.70  96.39 16.24   110.80   3.73 367.15  89.20  95.99 14.19   98.51    0.00 0.00  0.00   0.00  0.00    0.00     2.02 3.17    0.07   3.91
> sdg     0.68  67.91   16.28  95.97 12.17   99.30    3.73 367.20  89.21  95.98 13.28   98.32    0.00 0.00  0.00   0.00  0.00    0.00     2.02 3.10    0.06   3.86
>
> Interesting to see that sd[bc] have lower w_await, aqu-sz and %util and higher f_await.
> Even without yet understanding what these mean, I see that sd[bc] are model ST12000NM001G (recently replaced) while the rest are the original ST12000NM0007 (now 5yo).
> I expect this shows different tuning in the device fw.
>
> I do not expect this to be relevant to the current situation.
>
> I also need to understand the r vs w numbers. I see wkB/s is identical for all members, but rkB/s is not.
> I expected these to be similar, but maybe md reads different disks at different times to make up for the missing one?
>
> Still, thanks for your help.

I would expect the reads to be slightly different. Note that MD is reading 116 kB/sec but the underlying disks are having to do about 500 kB/sec, and MD is doing 1523 kB/sec of writes while the disks are doing about 2200 kB/sec. So the reads have to be roughly 4x the real reads to recover/rebuild the data for the missing disk.

The interesting columns are r/s, rkB/s, r_await (how long a read takes in ms) and w/s, wkB/s, w_await (how long a write takes in ms), plus %util. rrqm is read requests merged; dividing kB/s by requests indicates the average io is around 4k.

The %util column is the one to watch. If a disk is having internal issues, %util will hit close to 100% at lowish read/write rates. If it gets close to 100%, that is a really bad sign. You might see what that data looks like when a disk is having issues.

You might also start using dirty_bytes and dirty_background_bytes; that makes the io suck less when your array gets slow (a sketch of the settings is at the end of this mail).

My array has mythtv stuff and security cam images. During the day I save all of that to a 500GB ssd, and then at midnight move it to the long-term spinning disk, and during that window my disks are really busy. How long that takes varies with the amount collected during the day and with whether a rebuild is running or something else is going on with the array; if there are array issues it takes longer.

I have been keeping a spare to use in emergencies.
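Since the dirty_bytes knobs came up above, here is a minimal sketch of what I mean. The byte values are only placeholders to illustrate the idea and would need tuning for your RAM size and workload; they are not numbers measured on your array:

  # Cap dirty page cache so a slow array cannot accumulate gigabytes of
  # pending writeback and then stall everything (example values only).
  sysctl -w vm.dirty_background_bytes=67108864   # start background writeback at 64 MiB
  sysctl -w vm.dirty_bytes=268435456             # block writers once 256 MiB is dirty

  # Watch the member disks while the array is busy; %util near 100% at
  # low request rates points at a disk struggling internally.
  iostat -x sdb sdc sdd sde sdf sdg 5

To keep the sysctl values across reboots you would put them in /etc/sysctl.conf (or a file under /etc/sysctl.d/), but again, treat the numbers above as starting points rather than recommendations.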