Re: Raid6 check performance regression 5.15 -> 5.16

On Mon, 2022-03-07 at 13:15 -0500, Larkin Lowrey wrote:
> I am seeing a 'check' speed regression between kernels 5.15 and 5.16.
> One host with a 20 drive array went from 170MB/s to 11MB/s. Another
> host 
> with a 15 drive array went from 180MB/s to 43MB/s. In both cases the 
> arrays are almost completely idle. I can flip between the two kernels
> with no other changes and observe the performance changes.

I am also seeing a huge slowdown on Debian using 5.16.0-3-amd64.
Normally my monthly scrub would take from 1am till about 10am.

This timing has been consistent for close to two years without fail.
The check speed would start in the 130MB-ish range and eventually slow
to about 90MB-ish the closer it got to finishing. The disks are WD
Reds (the non-dodgy ones), WDC WD40EFRX-68N32N0, and there are 6 of
them in raid6 (no spares). There are no abnormal smartctl figures
(such as RRER, MZER, etc.) showing, so it's not one starting to fail.
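
For reference, something along these lines shows the attributes I
mean (sdX stands in for whichever member drive you want to look at):

  # raw read / multi-zone error rates for one array member
  smartctl -A /dev/sdX | grep -E 'Raw_Read_Error_Rate|Multi_Zone_Error_Rate'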

The current speed is now down to 54,851K with at least 4 hours to go,
and it has been running from 8PM to 9AM already (I killed the weekend
run as I could see it was going to take forever and granddaughter
doesn't deal with "it's going slow" very well, so I kicked it off
manually last night instead).
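
For anyone wanting to repeat the kick-off/kill dance, it's roughly
the usual sysfs interface (md0 here is just an example array name):

  # start a check on one array
  echo check > /sys/block/md0/md/sync_action
  # abort it again
  echo idle > /sys/block/md0/md/sync_action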

The problem is not limited to hard drives. I also run 3
arrays/partitions on NVMe (set up as 3 drives, one spare, raid10-far2)
which are used for /, /var and swap; instead of taking about 2 mins
their checks are now taking in excess of 10 mins to complete.
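
If it helps anyone compare layouts, something like this prints the
relevant bits (mdX is just a placeholder for the array in question):

  mdadm --detail /dev/mdX | grep -E 'Raid Level|Layout|Raid Devices|Spare Devices'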

Before running the current mdadm check(s) the kernel was upgraded. I
try to apt-get update, apt-get dist-upgrade at the weekend but
sometimes forget, so I can't tell if a check was run under the
previous version or a version prior to that... The previous version
was 5.16.0-3-amd64 which, as far as I can tell, had no issues. (I
tend to access my computer around 9 on a Sunday and get hit once a
month by programs "hanging"/being slow, which reminds me to check
whether an mdadm check is running with cat /proc/mdstat; it usually
is, and it usually tells me I should be fine by 10-ish once I do the
mins/60.)
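
Those finish/speed figures come straight out of /proc/mdstat; a quick
way to pull just the estimate and the current speed is something like:

  grep -Eo 'finish=[^ ]+|speed=[^ ]+' /proc/mdstat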

In the time it's taken me to type this, run commands to check the
figures etc., and then check it and amend things (about 30-40 mins),
the speed is now down to 52,187K. I'm going to let it finish as I
don't like the idea of not having the monthly scrub complete, but boy
does it suck watching it get much slower than usual the closer it
gets to finishing.

> 
> Is this a known issue?

Well you and me makes two noticing an issue so...

> 
> --Larkin

Jon.



