Hi, sorry this is a long read. If you want the gist of it, look for "<KEY>" for the key points.

I'm having trouble finding information on how to troubleshoot mdraid performance issues. The latest "rathole" I'm going down: I have two identically configured mdraids, one per NUMA node, on a dual-socket AMD Rome system with "NUMA nodes per socket" set to 1 in the BIOS. Things are cranking along with a 64K block size, but there is a substantial disparity between NUMA 0's mdraid and NUMA 1's.

[root@hornet04 block]# uname -r
<KEY> 5.15.13-1.el8.elrepo.x86_64

(md127 is NUMA 0, md126 is NUMA 1.)

[root@hornet04 block]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md126 : active raid5 nvme22n1p1[10] nvme20n1p1[7] nvme21n1p1[8] nvme18n1p1[5] nvme19n1p1[6] nvme17n1p1[4] nvme15n1p1[3] nvme14n1p1[2] nvme12n1p1[0] nvme13n1p1[1]
      135007147008 blocks super 1.2 level 5, 128k chunk, algorithm 2 [10/10] [UUUUUUUUUU]
      bitmap: 0/112 pages [0KB], 65536KB chunk

md127 : active raid5 nvme9n1p1[10] nvme8n1p1[8] nvme7n1p1[7] nvme6n1p1[6] nvme5n1p1[5] nvme3n1p1[3] nvme4n1p1[4] nvme2n1p1[2] nvme1n1p1[1] nvme0n1p1[0]
      135007147008 blocks super 1.2 level 5, 128k chunk, algorithm 2 [10/10] [UUUUUUUUUU]
      bitmap: 0/112 pages [0KB], 65536KB chunk

unused devices: <none>

I'm running identical, NUMA-aware fio jobs against both arrays, but I'm getting the following in iostat (the NUMA 0 mdraid outperforms the NUMA 1 mdraid by roughly 12 GB/s):

[root@hornet04 ~]# iostat -xkz 1
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.20    0.00    3.35    0.00    0.00   96.45

Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
nvme2c2n1 72856.00 0.00 4662784.00 0.00 0.00 0.00 0.00 0.00 0.68 0.00 49.50 64.00 0.00 0.01 100.00
nvme3c3n1 73077.00 0.00 4676928.00 0.00 0.00 0.00 0.00 0.00 0.68 0.00 49.94 64.00 0.00 0.01 100.00
nvme4c4n1 73013.00 0.00 4672896.00 0.00 0.00 0.00 0.00 0.00 0.69 0.00 50.35 64.00 0.00 0.01 100.00
<KEY> nvme18c18n1 54384.00 0.00 3480576.00 0.00 0.00 0.00 0.00 0.00 144.80 0.00 7874.85 64.00 0.00 0.02 100.00
nvme5c5n1 72841.00 0.00 4661824.00 0.00 0.00 0.00 0.00 0.00 0.70 0.00 51.01 64.00 0.00 0.01 100.00
nvme7c7n1 72220.00 0.00 4622080.00 0.00 0.00 0.00 0.00 0.00 0.67 0.00 48.61 64.00 0.00 0.01 100.00
nvme22c22n1 54652.00 0.00 3497728.00 0.00 0.00 0.00 0.00 0.00 0.64 0.00 34.73 64.00 0.00 0.02 100.00
nvme12c12n1 54756.00 0.00 3504384.00 0.00 0.00 0.00 0.00 0.00 0.66 0.00 36.34 64.00 0.00 0.02 100.00
nvme14c14n1 54517.00 0.00 3489088.00 0.00 0.00 0.00 0.00 0.00 0.65 0.00 35.66 64.00 0.00 0.02 100.00
nvme6c6n1 72721.00 0.00 4654144.00 0.00 0.00 0.00 0.00 0.00 0.68 0.00 49.77 64.00 0.00 0.01 100.00
nvme21c21n1 54731.00 0.00 3502784.00 0.00 0.00 0.00 0.00 0.00 0.67 0.00 36.46 64.00 0.00 0.02 100.00
nvme9c9n1 72661.00 0.00 4650304.00 0.00 0.00 0.00 0.00 0.00 0.71 0.00 51.35 64.00 0.00 0.01 100.00
nvme17c17n1 54462.00 0.00 3485568.00 0.00 0.00 0.00 0.00 0.00 0.66 0.00 36.09 64.00 0.00 0.02 100.00
nvme20c20n1 54463.00 0.00 3485632.00 0.00 0.00 0.00 0.00 0.00 0.66 0.00 36.10 64.00 0.00 0.02 100.10
nvme13c13n1 54910.00 0.00 3514240.00 0.00 0.00 0.00 0.00 0.00 0.61 0.00 33.45 64.00 0.00 0.02 100.00
nvme8c8n1 72622.00 0.00 4647808.00 0.00 0.00 0.00 0.00 0.00 0.67 0.00 48.52 64.00 0.00 0.01 100.00
nvme15c15n1 54543.00 0.00 3490752.00 0.00 0.00 0.00 0.00 0.00 0.61 0.00 33.28 64.00 0.00 0.02 100.00
nvme0c0n1 73215.00 0.00 4685760.00 0.00 0.00 0.00 0.00 0.00 0.67 0.00 49.41 64.00 0.00 0.01 100.00
nvme19c19n1 55034.00 0.00 3522176.00 0.00 0.00 0.00 0.00 0.00 0.67 0.00 36.93 64.00 0.00 0.02 100.10
<KEY> nvme1c1n1 72672.00 0.00 4650944.00 0.00 0.00 0.00 0.00 0.00 106.98 0.00 7774.54 64.00 0.00 0.01 100.00
<KEY> md127 727871.00 0.00 46583744.00 0.00 0.00 0.00 0.00 0.00 11.30 0.00 8221.92 64.00 0.00 0.00 100.00
<KEY> md126 546553.00 0.00 34979392.00 0.00 0.00 0.00 0.00 0.00 14.99 0.00 8194.91 64.00 0.00 0.00 100.10

I started chasing the aqu-sz and r_await numbers to see whether I have a device issue or whether these are known mdraid "features". Then I went looking for the kernel workers so I could chase those, and it became apparent to me that I DON'T KNOW WHAT I'M DOING OR WHAT TO DO NEXT. Any guidance is appreciated. Given that one drive on each NUMA node is showing the bad behavior (the r_await/aqu-sz outliers above: nvme1 in md127, nvme18 in md126), I'm reluctant to point the finger at hardware.

[root@hornet04 ~]# lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              256
On-line CPU(s) list: 0-255
Thread(s) per core:  2
Core(s) per socket:  64
Socket(s):           2
NUMA node(s):        2
Vendor ID:           AuthenticAMD
BIOS Vendor ID:      Advanced Micro Devices, Inc.
CPU family:          23
Model:               49
Model name:          AMD EPYC 7742 64-Core Processor
BIOS Model name:     AMD EPYC 7742 64-Core Processor
Stepping:            0
CPU MHz:             3243.803
BogoMIPS:            4491.53
Virtualization:      AMD-V
L1d cache:           32K
L1i cache:           32K
L2 cache:            512K
L3 cache:            16384K
<KEY> NUMA node0 CPU(s): 0-63,128-191
<KEY> NUMA node1 CPU(s): 64-127,192-255
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca

<KEY> When I start doing some basic debugging (I'm not a Linux ninja by far), I see the following. What is throwing me is that the workers I suspect have to do with md, at least the ones I can find, are all running on NUMA node 1. This is catching me by surprise. Are there other workers that I'm missing?????

ps -eo pid,tid,class,rtprio,ni,pri,numa,psr,pcpu,stat,wchan:14,comm | head -1; ps -eo pid,tid,class,rtprio,ni,pri,numa,psr,pcpu,stat,wchan:14,comm | egrep 'md|raid' | grep -v systemd | grep -v mlx
  PID   TID CLS RTPRIO  NI PRI NUMA PSR %CPU STAT WCHAN          COMMAND
 1522  1522 TS       -   5  14    1 208  0.0 SN   -             ksmd
 1590  1590 TS       - -20  39    1 220  0.0 I<   -             md
 3688  3688 TS       - -20  39    1 198  0.0 I<   -             raid5wq
 3693  3693 TS       -   0  19    1 234  0.0 S    -             md126_raid5
 3694  3694 TS       -   0  19    1  95  0.0 S    -             md127_raid5
 3788  3788 TS       -   0  19    1 240  0.0 Ss   -             lsmd
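For reference, the fio invocations are NUMA-pinned with numactl and look roughly like the sketch below. Treat it as a minimal sketch, not my exact job file: the iodepth, numjobs, and runtime values are illustrative placeholders; the 64k block size and the one-array-per-node pinning match the runs above.

# Sketch only: queue depth, job count, and runtime are placeholders.
numactl --cpunodebind=0 --membind=0 \
  fio --name=node0 --filename=/dev/md127 --rw=read --bs=64k \
      --ioengine=libaio --direct=1 --iodepth=32 --numjobs=32 \
      --time_based --runtime=60 --group_reporting &
numactl --cpunodebind=1 --membind=1 \
  fio --name=node1 --filename=/dev/md126 --rw=read --bs=64k \
      --ioengine=libaio --direct=1 --iodepth=32 --numjobs=32 \
      --time_based --runtime=60 --group_reporting &
wait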
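On the worker question, the only other md-related pieces I know to look at are the shared raid5 workqueue and where the member controllers physically sit. This is the kind of checking I'm doing; the raid5wq sysfs path assumes the kernel registered that workqueue with WQ_SYSFS (it appears to on 5.15, but I'm treating that as an assumption):

# Which CPUs is the unbound raid5 workqueue allowed to use?
cat /sys/devices/virtual/workqueue/raid5wq/cpumask

# Which NUMA node does each NVMe controller actually hang off of?
for c in /sys/class/nvme/nvme*; do
    echo "$c -> node $(cat $c/device/numa_node)"
done

# Per-array stripe cache size and auxiliary worker group count
grep . /sys/block/md12[67]/md/stripe_cache_size /sys/block/md12[67]/md/group_thread_cnt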
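And the next experiment I'm considering, if someone can confirm it's a sane direction: pin each array's raid5 thread back to its own node and enable per-array worker groups. A sketch using the PIDs from the ps listing above; the group_thread_cnt value of 4 is a guess on my part, not a known-good setting:

# md126 serves NUMA 1; pin its raid5 thread to node 1 CPUs.
taskset -pc 64-127,192-255 3693    # md126_raid5
# md127 serves NUMA 0; pin its raid5 thread to node 0 CPUs.
taskset -pc 0-63,128-191 3694      # md127_raid5

# Spread stripe handling across per-array worker groups.
echo 4 > /sys/block/md126/md/group_thread_cnt
echo 4 > /sys/block/md127/md/group_thread_cnt

# Note: raid5wq is shared by both arrays, so its cpumask can't be set
# per-array; restricting it to one node would starve the other array.

Does any of that sound like the right next step, or am I chasing the wrong workers entirely?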
Jim Finlayson
U.S. Department of Defense