Re: problem with recovered array

Look at 'sar -d' output for all the disks in the raid6.  It may be a
disk issue (though I suspect not, given the 100% CPU shown in the raid
kworker).
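
Something along these lines should give a per-disk picture (assuming
the sysstat package is installed; the 5-second interval and 12 samples
are just an example):

    # install sysstat if it is not there already
    sudo dnf install sysstat
    # per-disk stats: 12 samples, 5 seconds apart, -p prints real device names
    sar -d -p 5 12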

Clearly something very expensive/deadlock-ish is happening because the
raid has to rebuild data from the missing disk; I am not sure what
could be wrong with it.

If there is a newer kernel version available, upgrading and rebooting
may change something.  If you are already close to the newest kernel
version, try going back a minor version (if on 6.5, go back to the
last 6.4 kernel).
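
On F38 something like this lets you see what is installed and pick an
older entry for the next boot (assuming the older kernel is still
installed; the index below is just an example, check the grubby output
first):

    # list the installed kernel packages
    dnf list --installed kernel
    # list the boot entries grub knows about
    sudo grubby --info=ALL | grep -E 'index|title'
    # make an older entry the default (use the right index from the output above)
    sudo grubby --set-default-index=1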

You might also install the perf package and run "perf top" to see
what sorts of calls the kernel is spending all of its time in.
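
For example (you will probably need root; -g just adds call graphs so
you can see who is calling what, and kernel symbols show up with a [k]
marker):

    # install perf if needed
    sudo dnf install perf
    # sample the whole system with call graphs
    sudo perf top -g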

On Mon, Oct 30, 2023 at 8:44 AM <eyal@xxxxxxxxxxxxxx> wrote:
>
> F38
>
> I know this is a bit long but I wanted to provide as much detail as I thought was needed.
>
> I have a 7-member raid6. The other day I needed to send a disk for replacement.
> I have done this before and all looked well. The array is now degraded until I get the new disk.
>
> At one point my system got into trouble and I am not sure why, but it started to have
> very slow responses to opening/closing files, or even to keystrokes. In the end I decided to reboot.
> It refused to complete the shutdown and after a while I used the sysrq feature to force it.
>
> On the restart it dropped into the emergency shell; the array had all members listed as spares.
> I tried to '--run' the array but mdadm refused with 'cannot start dirty degraded array',
> though the array was now listed in mdstat and looked as expected.
>
> Since mdadm suggested I use '--force', I did so:
>         mdadm --assemble --force /dev/md127 /dev/sd{b,c,d,e,f,g}1
>
> Q0) was I correct to use this command?
>
> 2023-10-30T01:08:25+1100 kernel: md/raid:md127: raid level 6 active with 6 out of 7 devices, algorithm 2
> 2023-10-30T01:08:25+1100 kernel: md127: detected capacity change from 0 to 117187522560
> 2023-10-30T01:08:25+1100 kernel: md: requested-resync of RAID array md127
>
> Q1) What does this last line mean?
>
> Even after the array came up I still could not mount the fs (still in the emergency shell).
> I rebooted and everything came up: the array was there, the fs was mounted, and so far
> I have not noticed any issues with the fs.
>
> However, it is not perfect. I tried to copy some data from an external (USB) disk to the array
> and it went very slowly (as in 10KB/s, while the USB disk can do 120MB/s). The copy (rsync) was running
> at 100% CPU, which is unexpected. I then stopped it. As a test, I rsync'ed the USB disk to another
> SATA disk on the server and it went fast, so the USB disk is OK.
>
> I then noticed (in 'top') that there is a kworker running at 100% CPU:
>
>      PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>   944760 root      20   0       0      0      0 R 100.0   0.0 164:00.85 kworker/u16:3+flush-9:127
>
> It has been doing this for many hours and I do not know what it is doing.
>
> Q2) what does this worker do?
>
> I also noticed that mdstat shows a high bitmap usage:
>
> Personalities : [raid6] [raid5] [raid4]
> md127 : active raid6 sde1[4] sdg1[6] sdf1[5] sdd1[7] sdb1[8] sdc1[9]
>        58593761280 blocks super 1.2 level 6, 512k chunk, algorithm 2 [7/6] [_UUUUUU]
>        bitmap: 87/88 pages [348KB], 65536KB chunk
>
> Q3) Is this OK? Should the usage go down? It does not change at all.
>
> While looking at everything, I started iostat on md127 and I see that there is a constant
> trickle of writes, about 5KB/s. There is no activity on this fs.
> Also, I see similar activity on all the members, at the same rate, so md127 does not show
> 6 times the members' activity. I guess this is just how md works?
>
> Q4) What is this write activity? Is it related to the aforementioned 'requested-resync'?
>         If this is a background thing, how can I monitor it?
>
> Q5) Finally, will the array come up (degraded) if I reboot, or will I need to coerce it to start?
>         What is the correct way to bring up a degraded array? What about the 'dirty' part?
> 'mdadm -D /dev/md127' mentions 'sync':
>      Number   Major   Minor   RaidDevice State
>         -       0        0        0      removed
>         8       8       17        1      active sync   /dev/sdb1
>         9       8       33        2      active sync   /dev/sdc1
>         7       8       49        3      active sync   /dev/sdd1
>         4       8       65        4      active sync   /dev/sde1
>         5       8       81        5      active sync   /dev/sdf1
>         6       8       97        6      active sync   /dev/sdg1
> Is this related?
>
> BTW I plan to run a 'check' at some point.
>
> TIA
>
> --
> Eyal at Home (eyal@xxxxxxxxxxxxxx)



