Re: problem with recovered array

On 02/11/2023 22.57, Roger Heflin wrote:
You need the iostat data from the disks under md127 to see what
mdraid is having to do internally. Get the list of sd devices and
run iostat -x | grep -E 'Device|sdX|sdY|sdZ' (putting the whole,
non-partitioned sd devices in your array into the list).
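For the members shown in the snippets below (sdb through sdg, assuming these
are the whole, non-partitioned array members), that would be something like
the following; the 60-second interval is only an assumption chosen to match
the one-minute snippets:

  iostat -x 60 | grep -E 'Device|sdb|sdc|sdd|sde|sdf|sdg'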

I do have iostat running on the components too, and the writes mostly reflect what md127 sees.
However, while there are no reads on md127, there often are some on the components,
very visible in the last two iostat minutes. Is there some RMW (read-modify-write) happening on the components?
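(The timestamped Dirty figures below presumably come from periodically sampling
/proc/meminfo; a minimal sketch of such a loop, assuming a 10-second interval to
match the timestamps and a placeholder dirty.log file:)

  while sleep 10; do
      echo "$(date '+%Y-%m-%d %H:%M:%S') $(grep '^Dirty:' /proc/meminfo)"
  done >> dirty.log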

Here is a snippet from when things were slow:

## meminfo data:
2023-11-02 23:05:07 Dirty:           1037680 kB
2023-11-02 23:05:17 Dirty:           1037288 kB
2023-11-02 23:05:27 Dirty:           1036560 kB
2023-11-02 23:05:37 Dirty:           1036064 kB
2023-11-02 23:05:47 Dirty:           1035720 kB
2023-11-02 23:05:57 Dirty:           1035732 kB
2023-11-02 23:06:07 Dirty:           1035860 kB
2023-11-02 23:06:17 Dirty:           1035708 kB
2023-11-02 23:06:27 Dirty:           1035648 kB
2023-11-02 23:06:37 Dirty:           1035388 kB
2023-11-02 23:06:47 Dirty:           1035336 kB
2023-11-02 23:06:57 Dirty:           1035248 kB
2023-11-02 23:07:07 Dirty:           1037312 kB
2023-11-02 23:07:17 Dirty:           1037172 kB
2023-11-02 23:07:27 Dirty:           1035224 kB
2023-11-02 23:07:37 Dirty:           1035152 kB
2023-11-02 23:07:47 Dirty:           1035064 kB
2023-11-02 23:07:57 Dirty:           1025316 kB
2023-11-02 23:08:07 Dirty:           1020000 kB
2023-11-02 23:08:17 Dirty:           1019628 kB
2023-11-02 23:08:27 Dirty:           1019784 kB
2023-11-02 23:08:37 Dirty:           1019284 kB
2023-11-02 23:08:47 Dirty:           1018908 kB
2023-11-02 23:08:57 Dirty:           1018700 kB

## iostat
23:05:08 Device             tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd
23:05:08 md127             2.22         0.00        14.73         0.00          0        884          0
23:05:08 sdb               1.13         1.93        12.10         0.00        116        726          0
23:05:08 sdc               1.32         0.00        16.17         0.00          0        970          0
23:05:08 sdd               1.18         0.00         5.57         0.00          0        334          0
23:05:08 sde               0.83         0.00         1.50         0.00          0         90          0
23:05:08 sdf               0.83         0.00         1.50         0.00          0         90          0
23:05:08 sdg               1.27         4.47         5.97         0.00        268        358          0

23:06:08 Device             tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd
23:06:08 md127             2.75         0.00        36.20         0.00          0       2172          0
23:06:08 sdb               1.30         4.47        22.40         0.00        268       1344          0
23:06:08 sdc               1.13         0.00        14.40         0.00          0        864          0
23:06:08 sdd               1.13         0.80        14.20         0.00         48        852          0
23:06:08 sde               0.82         8.53         9.73         0.00        512        584          0
23:06:08 sdf               0.87         8.53         7.13         0.00        512        428          0
23:06:08 sdg               0.75         8.53         1.20         0.00        512         72          0

23:07:08 Device             tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd
23:07:08 md127             2.30         0.00        10.93         0.00          0        656          0
23:07:08 sdb               1.17         2.13         5.82         0.00        128        349          0
23:07:08 sdc               1.12         0.00         5.35         0.00          0        321          0
23:07:08 sdd               1.35         4.33         9.68         0.00        260        581          0
23:07:08 sde               1.00         4.33         5.68         0.00        260        341          0
23:07:08 sdf               0.82         0.00         3.95         0.00          0        237          0
23:07:08 sdg               1.00         4.33         5.68         0.00        260        341          0

23:08:08 Device             tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd
23:08:08 md127            13.55         0.13       238.73         0.00          8      14324          0
23:08:08 sdb               1.92        36.47        72.25         0.00       2188       4335          0
23:08:08 sdc               1.78        27.67        66.38         0.00       1660       3983          0
23:08:08 sdd               2.25        47.67        95.52         0.00       2860       5731          0
23:08:08 sde               2.03        43.87        77.72         0.00       2632       4663          0
23:08:08 sdf               1.73        49.20        59.65         0.00       2952       3579          0
23:08:08 sdg               1.72        29.80        56.45         0.00       1788       3387          0

23:09:08 Device             tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd
23:09:08 md127             3.22         0.00        59.47         0.00          0       3568          0
23:09:08 sdb               1.88        26.60        36.00         0.00       1596       2160          0
23:09:08 sdc               1.82        32.53        13.60         0.00       1952        816          0
23:09:08 sdd               1.83        17.07        41.80         0.00       1024       2508          0
23:09:08 sde               1.10         9.07        33.40         0.00        544       2004          0
23:09:08 sdf               0.93        21.93         3.80         0.00       1316        228          0
23:09:08 sdg               0.90        15.20         6.73         0.00        912        404          0

The slowness of clearing dirty pages, together with what perf top showed
(ext4* calls), indicates it is creating entries for small files and
putting data into those files. Creating entries for small files requires
a lot of both read and write seeks, and seeks are expensive. At most I
would expect you to be able to rsync/copy around 30-50 files/second when
the array is healthy; once the file size gets small enough the MB/s does
not matter anymore, as the seeks dominate.
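As a rough sanity check of what 30-50 files/second implies for the ~92,000-file
tree mentioned below (my arithmetic, for illustration only):

  # 92,059 files at 30-50 files/second on a healthy array:
  echo "92059 / 50" | bc    # -> 1841 seconds, about 31 minutes
  echo "92059 / 30" | bc    # -> 3068 seconds, about 51 minutes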

My experience is that the array can handle all of this easily, and I never noticed
any hiccups.

On my array I do a nightly copy-in of the data written during the day to
an SSD (security cams). Even without any array issues that makes my
array really busy. If anything is wrong with the array, that process
takes several times as long.

What I see here is that a few seconds' worth of usual work takes a few hours to complete.
This does not look like the array is merely stressed; something more substantial is going on.

On Thu, Nov 2, 2023 at 6:34 AM <eyal@xxxxxxxxxxxxxx> wrote:

On 02/11/2023 19.34, Johannes Truschnigg wrote:
Hi list,

for the record, I do not think that any of the observations the OP made can be
explained by non-pathological phenomena/patterns of behavior. Something is
very clearly wrong with how this system behaves (the reported figures do not
at all match the expected performance of even a degraded RAID6 array in my
experience) and how data written to the filesystem apparently fails to make it
into the backing devices in acceptable time.

The whole affair reeks either of "subtle kernel bug", or maybe "subtle
hardware failure", I think.

Still, it'd be interesting to know what happens when writes to the array through
the file system are performed with O_DIRECT in effect, i.e., using `dd
oflag=direct status=progress ...` - does this yield any observable (via
`iostat` et al.) throughput to the disks beneath? What transfer speeds does `dd`
report this way with varying block sizes? Are there no concerning messages in
the debug ringbuffer (`dmesg`) at this time?
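For example, something along these lines (the target path is a placeholder for
the actual mount point of the md127 filesystem, and the block sizes are
arbitrary):

  # direct (page-cache-bypassing) writes through the filesystem; compare the reported speeds
  dd if=/dev/zero of=/mnt/md127/ddtest.bin bs=1M  count=4096  oflag=direct status=progress
  dd if=/dev/zero of=/mnt/md127/ddtest.bin bs=64k count=65536 oflag=direct status=progress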

I'm not sure how we'd best learn more about what the participating busy kernel
threads (Fedora 38 might have more convenient tools up its sleeve than
ftrace?) are spending their time on in particular, but I think that is what
needs to be known to pin down the underlying cause of the problem.

To clarify what may have been lost in this thread:

I can dd to this fs just fine. The Dirty count does not go above about 4GB (the machine has 32GB of RAM),
and it is cleared very fast. Tested writing 100GB.

The Dirty blocks rise to the max then drain quickly.
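(The test was a plain large sequential write, roughly along these lines; the
source and the path are assumptions, the path standing in for the mount point
of the filesystem on md127:)

  # ~100GB buffered sequential write; Dirty grows to its limit and then drains
  dd if=/dev/zero of=/mnt/md127/bigfile bs=1M count=102400 status=progress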

2023-11-02 21:28:27 Dirty:               236 kB
2023-11-02 21:28:37 Dirty:                 8 kB
2023-11-02 21:28:47 Dirty:                40 kB
2023-11-02 21:28:57 Dirty:                68 kB
2023-11-02 21:29:07 Dirty:               216 kB
2023-11-02 21:29:17 Dirty:               364 kB
2023-11-02 21:29:27 Dirty:                20 kB
2023-11-02 21:29:37 Dirty:                48 kB
2023-11-02 21:29:47 Dirty:                24 kB
2023-11-02 21:29:57 Dirty:                36 kB
2023-11-02 21:30:07 Dirty:                28 kB
2023-11-02 21:30:17 Dirty:                52 kB
2023-11-02 21:30:27 Dirty:                36 kB
2023-11-02 21:30:37 Dirty:           4112980 kB
2023-11-02 21:30:47 Dirty:           3772396 kB
2023-11-02 21:30:57 Dirty:           3849688 kB
2023-11-02 21:31:07 Dirty:           3761132 kB
2023-11-02 21:31:17 Dirty:           3846216 kB
2023-11-02 21:31:27 Dirty:           3855060 kB
2023-11-02 21:31:37 Dirty:           3902212 kB
2023-11-02 21:31:47 Dirty:           4173524 kB
2023-11-02 21:31:57 Dirty:           3849856 kB
2023-11-02 21:32:07 Dirty:           3940632 kB
2023-11-02 21:32:17 Dirty:           2212008 kB
2023-11-02 21:32:27 Dirty:               244 kB
2023-11-02 21:32:37 Dirty:               288 kB
2023-11-02 21:32:47 Dirty:                60 kB

rsync'ing a large tree into this same fs is different.
When it was killed, less than 3GB had been copied: mostly small files in many directories,
something like 92,059 files using 2,519MB.

Note how slowly the dirty blocks are cleared.

2023-11-02 21:36:28 Dirty:               296 kB
2023-11-02 21:36:38 Dirty:            277412 kB
2023-11-02 21:36:48 Dirty:            403928 kB
2023-11-02 21:36:58 Dirty:            606916 kB
2023-11-02 21:37:08 Dirty:            753888 kB
2023-11-02 21:37:18 Dirty:            641624 kB
2023-11-02 21:37:28 Dirty:            744560 kB
2023-11-02 21:37:38 Dirty:            946864 kB
2023-11-02 21:37:48 Dirty:           1355964 kB
2023-11-02 21:37:58 Dirty:           2365632 kB
2023-11-02 21:38:08 Dirty:           2451948 kB
### at this point the rsync was cancelled, as I saw the kthread start
2023-11-02 21:38:18 Dirty:           2451752 kB
2023-11-02 21:38:28 Dirty:           2440496 kB
2023-11-02 21:38:38 Dirty:           2440308 kB
2023-11-02 21:38:48 Dirty:           2440136 kB
2023-11-02 21:38:58 Dirty:           2440036 kB
2023-11-02 21:39:08 Dirty:           2440240 kB
2023-11-02 21:39:18 Dirty:           2405768 kB
2023-11-02 21:39:28 Dirty:           2405784 kB
2023-11-02 21:39:38 Dirty:           2406012 kB
2023-11-02 21:39:48 Dirty:           2405908 kB
2023-11-02 21:39:58 Dirty:           2405848 kB
2023-11-02 21:40:08 Dirty:           2405876 kB
2023-11-02 21:40:18 Dirty:           2405704 kB
2023-11-02 21:40:28 Dirty:           2405628 kB
2023-11-02 21:40:38 Dirty:           2405544 kB
2023-11-02 21:40:48 Dirty:           2405484 kB
2023-11-02 21:40:58 Dirty:           2405416 kB
2023-11-02 21:41:08 Dirty:           2405412 kB
2023-11-02 21:41:18 Dirty:           2405240 kB
2023-11-02 21:41:28 Dirty:           2405148 kB
2023-11-02 21:41:38 Dirty:           2405132 kB
2023-11-02 21:41:48 Dirty:           2404428 kB
2023-11-02 21:41:58 Dirty:           2405056 kB
2023-11-02 21:42:08 Dirty:           2404944 kB
2023-11-02 21:42:18 Dirty:           2404904 kB
2023-11-02 21:42:28 Dirty:           2404316 kB
2023-11-02 21:42:38 Dirty:           2395340 kB
2023-11-02 21:42:48 Dirty:           2394540 kB
2023-11-02 21:42:58 Dirty:           2394368 kB
2023-11-02 21:43:08 Dirty:           2394520 kB
2023-11-02 21:43:18 Dirty:           2394132 kB
2023-11-02 21:43:28 Dirty:           2394032 kB
2023-11-02 21:43:38 Dirty:           2394276 kB
2023-11-02 21:43:48 Dirty:           2386960 kB
2023-11-02 21:43:58 Dirty:           2387420 kB
2023-11-02 21:44:08 Dirty:           2386620 kB
2023-11-02 21:44:18 Dirty:           2386828 kB
2023-11-02 21:44:28 Dirty:           2386104 kB
2023-11-02 21:44:38 Dirty:           2386328 kB
2023-11-02 21:44:48 Dirty:           2382520 kB
2023-11-02 21:44:58 Dirty:           2382024 kB
2023-11-02 21:45:08 Dirty:           2381344 kB
2023-11-02 21:45:18 Dirty:           2380776 kB
2023-11-02 21:45:28 Dirty:           2380424 kB
2023-11-02 21:45:38 Dirty:           2379672 kB
2023-11-02 21:45:48 Dirty:           2380180 kB
2023-11-02 21:45:58 Dirty:           2373272 kB
2023-11-02 21:46:08 Dirty:           2372416 kB
2023-11-02 21:46:18 Dirty:           2372052 kB
2023-11-02 21:46:28 Dirty:           2372176 kB
... and so on

At the same time I see the kthread in top:

      PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
1491995 root      20   0       0      0      0 R  97.3   0.0  12:50.67 kworker/u16:2+flush-9:127
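(One way to see where that kworker is spending its time would be to sample its
kernel stack or profile it, e.g. along these lines, using the PID from the top
output above:)

  # sample the flush worker's kernel stack a few times (needs root)
  for i in $(seq 5); do cat /proc/1491995/stack; echo '----'; sleep 1; done
  # or profile it for a while
  perf top -g -p 1491995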

And iostat of md127 shows slow writing like this:
           Device             tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd
21:59:52 md127             3.50         0.00        30.40         0.00          0        304          0
22:00:02 md127             3.10         0.00        31.20         0.00          0        312          0
22:00:12 md127             2.00         0.00        64.40         0.00          0        644          0
22:00:22 md127             3.40         0.00        22.80         0.00          0        228          0

So, why the slow clearing of the dirty blocks?

I will leave the machine to clear it overnight...

BTW, if I try to shut down in this situation it gets stuck, and if I force a reboot via sysrq I get
an array of all spares on restart. At least this is what happened twice already.

HTH


--
Eyal at Home (eyal@xxxxxxxxxxxxxx)



