Hello Shaohua and all,

Was anyone able to take a look at the trace I provided? Curious if I
should just give up hope and start from scratch. Any feedback is
appreciated.

Thank you,
Cisco

On Mon, Sep 26, 2016 at 10:29 AM, Francisco Parada
<advanceandconquer@xxxxxxxxx> wrote:
> Hi all,
>
> It doesn't seem like my response from last night made it to the list:
>
>
> Hi Shaohua and all,
>
> I was finally able to upgrade my Ubuntu server to a newer version of
> the kernel and mdadm:
> ==========================
> $ uname -r; mdadm -V
> 4.8.0-rc7-custom
> mdadm - v3.4 - 28th January 2016
> ==========================
>
> I rebuilt the kernel with the debug change that Shaohua asked me to
> build it with:
> ======================================
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 5883ef0..db484ca 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -62,6 +62,9 @@
>  #include "raid0.h"
>  #include "bitmap.h"
>
> +#undef pr_debug
> +#define pr_debug trace_printk
> +
>  #define cpu_to_group(cpu) cpu_to_node(cpu)
>  #define ANY_GROUP NUMA_NO_NODE
> ======================================
>
> Here's how things look so far, nothing different yet:
> ======================================
> $ cat /proc/mdstat
> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
> md127 : inactive sdd[10](S) sdk[0](S) sdj[2](S) sdh[3](S) sde[11](S) sdg[9](S) sdf[7](S) sdb[13](S) sdc[12](S)
>       26371219608 blocks super 1.2
>
> unused devices: <none>
> ======================================
>
> Here's an event-count snapshot of my array; just keep in mind that
> "/dev/sdi" is my failed drive, so I omitted it from the examination:
> ======================================
> # mdadm -E /dev/sd[b-h,j,k] | grep Events
>          Events : 280033
>          Events : 280033
>          Events : 280033
>          Events : 280033
>          Events : 280033
>          Events : 280033
>          Events : 280011
>          Events : 280033
>          Events : 280033
> ======================================
>
> It's important to note that, since I haven't done anything yet, my
> CPU is idle:
> ======================================
> top - 20:22:00 up  5:56,  2 users,  load average: 0.04, 0.03, 0.00
> Tasks: 221 total,   1 running, 220 sleeping,   0 stopped,   0 zombie
> %Cpu(s):  1.0 us,  1.0 sy,  0.0 ni, 97.5 id,  0.5 wa,  0.0 hi,  0.0 si,  0.0 st
> KiB Mem :  1525400 total,   103836 free,   696208 used,   725356 buff/cache
> KiB Swap: 25153532 total, 25117380 free,    36152 used.   454808 avail Mem
>
>   PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
>  2093 cisco     20   0 1761112 153108  55640 S  1.0 10.0   0:12.61 gnome-shell
>  4322 root      20   0   40520   3684   3100 R  1.0  0.2   0:00.22 top
>     1 root      20   0  119692   5540   3992 S  0.0  0.4   0:02.44 systemd
>     2 root      20   0       0      0      0 S  0.0  0.0   0:00.00 kthreadd
>     3 root      20   0       0      0      0 S  0.0  0.0   0:00.09 ksoftirqd/0
>     5 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 kworker/0:
> ======================================
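>
> For anyone trying to follow along: the trace output further down comes
> straight out of the ftrace ring buffer that trace_printk() writes to.
> A minimal capture sequence looks roughly like this (a sketch, not my
> exact commands; it assumes debugfs is mounted at the usual
> /sys/kernel/debug, and the output file names are only examples):
> ======================================
> # Clear the ftrace ring buffer before the step being traced
> echo > /sys/kernel/debug/tracing/trace
>
> # ... run the mdadm step under test here ...
>
> # Save the whole buffer, then keep only the raid5/6 debug lines to shrink it
> # (file names below are just examples)
> cat /sys/kernel/debug/tracing/trace > /root/md127-trace.txt
> grep -E 'handle_stripe|analyse_stripe|reconstruct' /root/md127-trace.txt > /root/md127-trace-small.txt
> ======================================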
I stopped "/dev/md127": > ====================================== > # mdadm --stop /dev/md127 > mdadm: stopped /dev/md127 > # cat /proc/mdstat > Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] > [raid4] [raid10] > unused devices: <none> > ====================================== > > > For completion, here's the trace output after stopping the array, and > before reassembling: > ====================================== > # tracer: nop > # > # entries-in-buffer/entries-written: 0/0 #P:2 > # > # _-----=> irqs-off > # / _----=> need-resched > # | / _---=> hardirq/softirq > # || / _--=> preempt-depth > # ||| / delay > # TASK-PID CPU# |||| TIMESTAMP FUNCTION > # | | | |||| | | > /sys/kernel/debug/tracing/trace (END) > ====================================== > > > Then I reassembled the array: > ====================================== > # mdadm -Afv /dev/md127 /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf > /dev/sdg /dev/sdh /dev/sdj /dev/sdk > mdadm: looking for devices for /dev/md127 > mdadm: /dev/sdb is busy - skipping > mdadm: Merging with already-assembled /dev/md/en1 > mdadm: /dev/sdb is identified as a member of /dev/md/en1, slot 7. > mdadm: /dev/sdc is identified as a member of /dev/md/en1, slot 8. > mdadm: /dev/sdd is identified as a member of /dev/md/en1, slot 6. > mdadm: /dev/sde is identified as a member of /dev/md/en1, slot 9. > mdadm: /dev/sdf is identified as a member of /dev/md/en1, slot 4. > mdadm: /dev/sdg is identified as a member of /dev/md/en1, slot 1. > mdadm: /dev/sdh is identified as a member of /dev/md/en1, slot 3. > mdadm: /dev/sdj is identified as a member of /dev/md/en1, slot 2. > mdadm: /dev/sdk is identified as a member of /dev/md/en1, slot 0. > mdadm: Marking array /dev/md/en1 as 'clean' > mdadm: /dev/md/en1 has an active reshape - checking if critical > section needs to be restored > mdadm: No backup metadata on device-7 > mdadm: No backup metadata on device-8 > mdadm: No backup metadata on device-9 > mdadm: added /dev/sdg to /dev/md/en1 as 1 > mdadm: added /dev/sdj to /dev/md/en1 as 2 > mdadm: added /dev/sdh to /dev/md/en1 as 3 (possibly out of date) > mdadm: added /dev/sdf to /dev/md/en1 as 4 > mdadm: no uptodate device for slot 5 of /dev/md/en1 > mdadm: added /dev/sdd to /dev/md/en1 as 6 > mdadm: /dev/sdb is already in /dev/md/en1 as 7 > mdadm: added /dev/sdc to /dev/md/en1 as 8 > mdadm: added /dev/sde to /dev/md/en1 as 9 > mdadm: added /dev/sdk to /dev/md/en1 as 0 > ====================================== > > > And of course, CPU shoots to 100%: > ====================================== > top - 20:38:44 up 6:13, 3 users, load average: 5.05, 3.25, 1.41 > Tasks: 239 total, 3 running, 236 sleeping, 0 stopped, 0 zombie > %Cpu(s): 5.9 us, 52.7 sy, 0.0 ni, 0.0 id, 41.4 wa, 0.0 hi, 0.0 si, 0.0 st > KiB Mem : 1525400 total, 73124 free, 739576 used, 712700 buff/cache > KiB Swap: 25153532 total, 25111140 free, 42392 used. 
>
> Then, sure enough, the array reshape speed goes right back down to nothing:
> ======================================
> # cat /proc/mdstat
> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
> md127 : active raid6 sdk[0] sde[11] sdc[12] sdd[10] sdf[7] sdj[2] sdg[9] sdb[13]
>       14650675200 blocks super 1.2 level 6, 512k chunk, algorithm 2 [10/8] [UUU_U_UUUU]
>       [=======>.............]  reshape = 39.1% (1146348512/2930135040) finish=22057575.1min speed=1K/sec
>       bitmap: 0/22 pages [0KB], 65536KB chunk
>
> unused devices: <none>
> ======================================
>
> The trace file is gigantic, so hopefully it doesn't get trimmed in
> the email, but any help would be appreciated. Thanks in advance:
> ================
> # tracer: nop
> #
> # entries-in-buffer/entries-written: 44739/81554230   #P:2
> #
> #                              _-----=> irqs-off
> #                             / _----=> need-resched
> #                            | / _---=> hardirq/softirq
> #                            || / _--=> preempt-depth
> #                            ||| /     delay
> #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
> #              | |       |   ||||       |         |
>  md127_raid6-6423  [001] ....  22228.159918: analyse_stripe: check 7: state 0x13 read (null) write (null) written (null)
>  md127_raid6-6423  [001] ....  22228.159919: analyse_stripe: check 6: state 0xa01 read (null) write (null) written (null)
>  md127_raid6-6423  [001] ....  22228.159919: analyse_stripe: check 5: state 0x801 read (null) write (null) written (null)
>  md127_raid6-6423  [001] ....  22228.159920: analyse_stripe: check 4: state 0x811 read (null) write (null) written (null)
>  md127_raid6-6423  [001] ....  22228.159921: analyse_stripe: check 3: state 0x801 read (null) write (null) written (null)
>  md127_raid6-6423  [001] ....  22228.159921: analyse_stripe: check 2: state 0x811 read (null) write (null) written (null)
>  md127_raid6-6423  [001] ....  22228.159922: analyse_stripe: check 1: state 0xa01 read (null) write (null) written (null)
>  md127_raid6-6423  [001] ....  22228.159923: analyse_stripe: check 0: state 0x811 read (null) write (null) written (null)
>  md127_raid6-6423  [001] ....  22228.159924: handle_stripe: locked=2 uptodate=10 to_read=0 to_write=0 failed=4 failed_num=6,5
>  md127_raid6-6423  [001] ....  22228.159925: schedule_reconstruction: schedule_reconstruction: stripe 2292697672 locked: 4 ops_request: 10
>  md127_raid6-6423  [001] ....  22228.159925: raid_run_ops: ops_run_reconstruct6: stripe 2292697672
>  md127_raid6-6423  [001] ....  22228.159943: ops_complete_reconstruct: ops_complete_reconstruct: stripe 2292697672
>  md127_raid6-6423  [001] ....  22228.159944: handle_stripe: handling stripe 2292697680, state=0x1401 cnt=1, pd_idx=7, qd_idx=8, check:0, reconstruct:6
> ===========================================
>
> I trimmed it because of the failure-to-send issue. However, if
> someone needs a lengthier snip of the trace, let me know.
>
> Thanks,
>
> 'Cisco
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html