Re: Raid6 check performance regression 5.15 -> 5.16

On Tue, Mar 8, 2022 at 2:51 PM Larkin Lowrey <llowrey@xxxxxxxxxxxxxxxxx> wrote:
>
> On Tue, Mar 8, 2022 at 3:50 PM Song Liu <song@xxxxxxxxxx> wrote:
> > On Mon, Mar 7, 2022 at 10:21 AM Larkin Lowrey <llowrey@xxxxxxxxxxxxxxxxx> wrote:
> >> I am seeing a 'check' speed regression between kernels 5.15 and 5.16.
> >> One host with a 20 drive array went from 170MB/s to 11MB/s. Another host
> >> with a 15 drive array went from 180MB/s to 43MB/s. In both cases the
> >> arrays are almost completely idle. I can flip between the two kernels
> >> with no other changes and observe the performance changes.
> >>
> >> Is this a known issue?
> >
> > I am not aware of this issue. Could you please share
> >
> >    mdadm --detail /dev/mdXXXX
> >
> > output of the array?
> >
> > Thanks,
> > Song
>
> Host A:
> # mdadm --detail /dev/md1
> /dev/md1:
>             Version : 1.2
>       Creation Time : Thu Nov 19 18:21:44 2020
>          Raid Level : raid6
>          Array Size : 126961942016 (118.24 TiB 130.01 TB)
>       Used Dev Size : 9766303232 (9.10 TiB 10.00 TB)
>        Raid Devices : 15
>       Total Devices : 15
>         Persistence : Superblock is persistent
>
>       Intent Bitmap : Internal
>
>         Update Time : Tue Mar  8 12:39:14 2022
>               State : clean
>      Active Devices : 15
>     Working Devices : 15
>      Failed Devices : 0
>       Spare Devices : 0
>
>              Layout : left-symmetric
>          Chunk Size : 512K
>
> Consistency Policy : bitmap
>
>                Name : fubar:1  (local to host fubar)
>                UUID : eaefc9b7:74af4850:69556e2e:bc05d666
>              Events : 85950
>
>      Number   Major   Minor   RaidDevice State
>         0       8        1        0      active sync   /dev/sda1
>         1       8       17        1      active sync   /dev/sdb1
>         2       8       33        2      active sync   /dev/sdc1
>         3       8       49        3      active sync   /dev/sdd1
>         4       8       65        4      active sync   /dev/sde1
>         5       8       81        5      active sync   /dev/sdf1
>        16       8       97        6      active sync   /dev/sdg1
>         7       8      113        7      active sync   /dev/sdh1
>         8       8      129        8      active sync   /dev/sdi1
>         9       8      145        9      active sync   /dev/sdj1
>        10       8      161       10      active sync   /dev/sdk1
>        11       8      177       11      active sync   /dev/sdl1
>        12       8      193       12      active sync   /dev/sdm1
>        13       8      209       13      active sync   /dev/sdn1
>        14       8      225       14      active sync   /dev/sdo1
>
> Host B:
> # mdadm --detail /dev/md1
> /dev/md1:
>             Version : 1.2
>       Creation Time : Thu Oct 10 14:18:16 2019
>          Raid Level : raid6
>          Array Size : 140650080768 (130.99 TiB 144.03 TB)
>       Used Dev Size : 7813893376 (7.28 TiB 8.00 TB)
>        Raid Devices : 20
>       Total Devices : 20
>         Persistence : Superblock is persistent
>
>       Intent Bitmap : Internal
>
>         Update Time : Tue Mar  8 17:40:48 2022
>               State : clean
>      Active Devices : 20
>     Working Devices : 20
>      Failed Devices : 0
>       Spare Devices : 0
>
>              Layout : left-symmetric
>          Chunk Size : 128K
>
> Consistency Policy : bitmap
>
>                Name : mcp:1
>                UUID : 803f5eb5:e59d4091:5b91fa17:64801e54
>              Events : 302158
>
>      Number   Major   Minor   RaidDevice State
>         0       8        1        0      active sync   /dev/sda1
>         1      65      145        1      active sync   /dev/sdz1
>         2      65      177        2      active sync   /dev/sdab1
>         3      65      209        3      active sync   /dev/sdad1
>         4       8      209        4      active sync   /dev/sdn1
>         5      65      129        5      active sync   /dev/sdy1
>         6       8      241        6      active sync   /dev/sdp1
>         7      65      241        7      active sync   /dev/sdaf1
>         8       8      161        8      active sync   /dev/sdk1
>         9       8      113        9      active sync   /dev/sdh1
>        10       8      129       10      active sync   /dev/sdi1
>        11      66       33       11      active sync   /dev/sdai1
>        12      65        1       12      active sync   /dev/sdq1
>        13       8       65       13      active sync   /dev/sde1
>        14      66       17       14      active sync   /dev/sdah1
>        15       8       49       15      active sync   /dev/sdd1
>        19      66       81       16      active sync   /dev/sdal1
>        16      66       65       17      active sync   /dev/sdak1
>        17       8      145       18      active sync   /dev/sdj1
>        18      66      129       19      active sync   /dev/sdao1
>
> The regression was introduced somewhere between these two Fedora kernels:
> 5.15.18-200 (good)
> 5.16.5-200 (bad)

Hi folks,

Sorry for the regression and thanks for sharing your array setup and
observations.
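
In case it helps with re-testing, one way to kick off a check and watch
its speed is via the md sysfs files and /proc/mdstat (md1 assumed, as in
your output):

   # start a check on md1
   echo check > /sys/block/md1/md/sync_action

   # watch progress and current speed
   cat /proc/mdstat
   cat /sys/block/md1/md/sync_speed      # current speed in K/sec, "none" when idle

   # abort the check early if needed
   echo idle > /sys/block/md1/md/sync_action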

I think I have found the fix and will send a patch for it. If you want
to try the fix sooner, you can find it at:

For 5.16:
https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/commit/?h=tmp/fix-5.16&id=872c1a638b9751061b11b64a240892c989d1c618

For 5.17:
https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/commit/?h=tmp/fix-5.17&id=c06ccb305e697d89fe99376c9036d1a2ece44c77
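
If you build your own kernels, one way to pull the fix in locally is to
cherry-pick the commit from the branch above (branch name and commit id
are taken from the 5.16 link; adjust for 5.17; the remote name here is
arbitrary):

   git remote add song-md https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git
   git fetch song-md tmp/fix-5.16
   git cherry-pick 872c1a638b9751061b11b64a240892c989d1c618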

Thanks,
Song


