Dear Linux folks,
When `mdcheck` runs on two 100 TB software RAIDs, our users complain that
they cannot open files in a reasonable time.
$ uname -a
Linux handsomejack.molgen.mpg.de 4.19.57.mx64.276 #1 SMP Wed Jul 3 15:15:22 CEST 2019 x86_64 GNU/Linux
$ more /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid6] [raid5] [raid4] [multipath]
md1 : active raid6 sdab[0] sdac[15] sdad[14] sdae[13] sdag[12] sdah[11] sdaf[10] sdai[9] sdu[8] sdt[7] sdv[6] sdw[5] sdx[4] sdy[3] sdaa[2] sdz[1]
109394518016 blocks super 1.2 level 6, 512k chunk, algorithm 2 [16/16] [UUUUUUUUUUUUUUUU]
bitmap: 0/59 pages [0KB], 65536KB chunk
md0 : active raid6 sde[0] sds[15] sdr[14] sdp[13] sdq[12] sdo[11] sdn[10] sdl[9] sdm[8] sdk[7] sdj[6] sdh[5] sdi[4] sdg[3] sdf[2] sdd[1]
109394532352 blocks super 1.2 level 6, 512k chunk, algorithm 2 [16/16] [UUUUUUUUUUUUUUUU]
bitmap: 2/59 pages [8KB], 65536KB chunk
unused devices: <none>
$ lspci -nn | grep -i RAID
03:00.0 RAID bus controller [0104]: Broadcom / LSI MegaRAID SAS-3 3108 [Invader] [1000:005d] (rev 02)
$ sysctl dev.raid.speed_limit_min
dev.raid.speed_limit_min = 1000
$ sysctl dev.raid.speed_limit_max
dev.raid.speed_limit_max = 200000
$ more /etc/cron.d/mdcheck
0 18 * * Fri root /usr/bin/mdcheck --duration "Mon 06:00"
0 18 * * Mon,Tue,Wed,Thu root /usr/bin/mdcheck --continue --duration "Tomorrow 06:00"
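(For context: as far as I understand the `mdcheck` script shipped with mdadm,
it only drives the scrub through the md sysfs interface, roughly like the
following, simplified to a single array and with the bookkeeping left out, so
the actual I/O load comes from the kernel's resync thread, not from the script
itself:

  echo check > /sys/block/md0/md/sync_action   # start, or with --continue resume, the scrub
  # ... when the --duration deadline is reached ...
  cat /sys/block/md0/md/sync_completed          # note how far it got
  echo idle > /sys/block/md0/md/sync_action     # pause until the next --continue run
)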
$ dmesg | tail -4
[Fri Mar 27 17:58:58 2020] md: data-check of RAID array md1
[Fri Mar 27 17:58:58 2020] md: data-check of RAID array md0
[Sat Mar 28 18:50:20 2020] md: md1: data-check done.
[Sat Mar 28 22:33:33 2020] md: md0: data-check done.
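For what it is worth, the elapsed times above work out to roughly

  md1: 8,001,430 MB / (24 h 51 min ≈ 89,500 s)  ≈ 89 MB/s per member disk
  md0: 8,001,430 MB / (28 h 35 min ≈ 102,900 s) ≈ 78 MB/s per member disk

(taking the 8001.43 GB Used Dev Size from `mdadm -D` below and assuming the
check reads each member disk once in full), so, if I understand the limits
correctly, the check never even reached the configured 200 MB/s ceiling.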
During that time, only four of the CPU's twelve threads are busy.
The article *Software RAID check - slow system issues* [1] recommends
lowering `dev.raid.speed_limit_max`, but the RAID should easily sustain
200 MB/s: our benchmarks show it reaching over 600 MB/s.
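Should we try it anyway, I assume the limit can be lowered either system-wide
via the sysctl or per array through sysfs, for example (50000 KB/s is just an
arbitrary example value):

  $ sudo sysctl -w dev.raid.speed_limit_max=50000              # all arrays
  $ echo 50000 | sudo tee /sys/block/md0/md/sync_speed_max     # md0 only; writing "system" reverts to the sysctl value

but that would of course stretch the check out even further.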
How do you run `mdcheck` in production without noticeably affecting the
system?
Kind regards,
Paul
[1]:
https://www.alttechnical.com/knowledge-base/linux/126-software-raid-check-slow-system-issues
PS: Details:
$ sudo mdadm -D /dev/md0
/dev/md0:
Version : 1.2
Creation Time : Mon Jul 30 11:44:29 2018
Raid Level : raid6
Array Size : 109394532352 (104326.76 GiB 112020.00 GB)
Used Dev Size : 7813895168 (7451.91 GiB 8001.43 GB)
Raid Devices : 16
Total Devices : 16
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Mon Mar 30 13:51:44 2020
State : active
Active Devices : 16
Working Devices : 16
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 512K
Consistency Policy : bitmap
Name : M8015
UUID : 0569ef24:5868e228:ca17105b:ba673204
Events : 446871
Number Major Minor RaidDevice State
0 8 64 0 active sync /dev/sde
1 8 48 1 active sync /dev/sdd
2 8 80 2 active sync /dev/sdf
3 8 96 3 active sync /dev/sdg
4 8 128 4 active sync /dev/sdi
5 8 112 5 active sync /dev/sdh
6 8 144 6 active sync /dev/sdj
7 8 160 7 active sync /dev/sdk
8 8 192 8 active sync /dev/sdm
9 8 176 9 active sync /dev/sdl
10 8 208 10 active sync /dev/sdn
11 8 224 11 active sync /dev/sdo
12 65 0 12 active sync /dev/sdq
13 8 240 13 active sync /dev/sdp
14 65 16 14 active sync /dev/sdr
15 65 32 15 active sync /dev/sds
$ sudo mdadm -D /dev/md1
/dev/md1:
Version : 1.2
Creation Time : Wed Mar 6 13:56:48 2019
Raid Level : raid6
Array Size : 109394518016 (104326.74 GiB 112019.99 GB)
Used Dev Size : 7813894144 (7451.91 GiB 8001.43 GB)
Raid Devices : 16
Total Devices : 16
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Mon Mar 30 03:49:21 2020
State : clean
Active Devices : 16
Working Devices : 16
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 512K
Consistency Policy : bitmap
Name : M8027
UUID : fdb36dce:6e2dfdaa:853cb1a1:402a9a9a
Events : 48917
Number Major Minor RaidDevice State
0 65 176 0 active sync /dev/sdab
1 65 144 1 active sync /dev/sdz
2 65 160 2 active sync /dev/sdaa
3 65 128 3 active sync /dev/sdy
4 65 112 4 active sync /dev/sdx
5 65 96 5 active sync /dev/sdw
6 65 80 6 active sync /dev/sdv
7 65 48 7 active sync /dev/sdt
8 65 64 8 active sync /dev/sdu
9 66 32 9 active sync /dev/sdai
10 65 240 10 active sync /dev/sdaf
11 66 16 11 active sync /dev/sdah
12 66 0 12 active sync /dev/sdag
13 65 224 13 active sync /dev/sdae
14 65 208 14 active sync /dev/sdad
15 65 192 15 active sync /dev/sdac
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 48 bits virtual
CPU(s): 12
On-line CPU(s) list: 0-11
Thread(s) per core: 1
Core(s) per socket: 6
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 79
Model name: Intel(R) Xeon(R) CPU E5-2603 v4 @ 1.70GHz
Stepping: 1
CPU MHz: 1698.649
CPU max MHz: 1700.0000
CPU min MHz: 1200.0000
BogoMIPS: 3396.26
Virtualization: VT-x
L1d cache: 384 KiB
L1i cache: 384 KiB
L2 cache: 3 MiB
L3 cache: 30 MiB
NUMA node0 CPU(s): 0,2,4,6,8,10
NUMA node1 CPU(s): 1,3,5,7,9,11
Vulnerability L1tf: Mitigation; PTE Inversion; VMX conditional cache flushes, SMT disabled
Vulnerability Mds: Vulnerable: Clear CPU buffers attempted, no microcode; SMT disabled
Vulnerability Meltdown: Mitigation; PTI
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1: Mitigation; __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Full generic retpoline, STIBP disabled, RSB filling
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap intel_pt xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm arat pln pts