On Mon, May 18, 2020 at 4:43 AM Michal Soltys <msoltyspl@xxxxxxxxx> wrote:
>
> On 5/16/20 2:15 AM, Song Liu wrote:
> > On Thu, May 14, 2020 at 4:07 AM Michal Soltys <msoltyspl@xxxxxxxxx> wrote:
> >>
> >> On 5/13/20 6:17 PM, Song Liu wrote:
> >>>>>
> >>>>> Are these captured back to back? I am asking because they showed different
> >>>>> "Events" number.
> >>>>
> >>>> Nah, they were captured between reboots. Back to back all event fields show identical value (at 56291 now).
> >>>>>
> >>>>> Also, when mdadm -A hangs, could you please capture /proc/$(pidof mdadm)/stack ?
> >>>>>
> >>>>
> >>>> The output is empty:
> >>>>
> >>>> xs22:/☠ ps -eF fww | grep mdadm
> >>>> root 10332 9362 97 740 1884 25 12:47 pts/1 R+ 6:59 | \_ mdadm -A /dev/md/r5_big /dev/md/r1_journal_big /dev/sdi1 /dev/sdg1 /dev/sdj1 /dev/sdh1
> >>>> xs22:/☠ cd /proc/10332
> >>>> xs22:/proc/10332☠ cat stack
> >>>> xs22:/proc/10332☠
> >>>
> >>> Hmm... Could you please share the strace output of "mdadm -A" command? Like
> >>>
> >>> strace mdadm -A /dev/md/r5_big /dev/md/r1_journal_big /dev/xxx ...
> >>>
> >>> Thanks,
> >>> Song
> >>
> >> I did one more thing - ran cat on that stack in another terminal every
> >> 0.1 seconds, this is what was found:
> >>
> >> cat: /proc//stack: No such file or directory
> >> cat: /proc//stack: No such file or directory
> >> cat: /proc//stack: No such file or directory
> >> [<0>] submit_bio_wait+0x5b/0x80
> >> [<0>] r5l_recovery_read_page+0x1b6/0x200 [raid456]
> >> [<0>] r5l_recovery_verify_data_checksum+0x19/0x70 [raid456]
> >> [<0>] r5l_start+0x99e/0xe70 [raid456]
> >> [<0>] md_start.part.48+0x2e/0x50 [md_mod]
> >> [<0>] do_md_run+0x64/0x100 [md_mod]
> >> [<0>] md_ioctl+0xe7d/0x17d0 [md_mod]
> >> [<0>] blkdev_ioctl+0x4d0/0xa00
> >> [<0>] block_ioctl+0x39/0x40
> >> [<0>] do_vfs_ioctl+0xa4/0x630
> >> [<0>] ksys_ioctl+0x60/0x90
> >> [<0>] __x64_sys_ioctl+0x16/0x20
> >> [<0>] do_syscall_64+0x52/0x160
> >> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >> cat: /proc//stack: No such file or directory
> >> [<0>] submit_bio_wait+0x5b/0x80
> >> [<0>] r5l_recovery_read_page+0x1b6/0x200 [raid456]
> >> [<0>] r5l_recovery_verify_data_checksum+0x19/0x70 [raid456]
> >> [<0>] r5l_start+0x99e/0xe70 [raid456]
> >> [<0>] md_start.part.48+0x2e/0x50 [md_mod]
> >> [<0>] do_md_run+0x64/0x100 [md_mod]
> >> [<0>] md_ioctl+0xe7d/0x17d0 [md_mod]
> >> [<0>] blkdev_ioctl+0x4d0/0xa00
> >> [<0>] block_ioctl+0x39/0x40
> >> [<0>] do_vfs_ioctl+0xa4/0x630
> >> [<0>] ksys_ioctl+0x60/0x90
> >> [<0>] __x64_sys_ioctl+0x16/0x20
> >> [<0>] do_syscall_64+0x52/0x160
> >> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >> cat: /proc//stack: No such file or directory
> >> cat: /proc//stack: No such file or directory
> >> cat: /proc//stack: No such file or directory
> >> cat: /proc//stack: No such file or directory
> >>
> >> Strace output attached.
> >
> > So mdadm hangs at "ioctl(4, RUN_ARRAY", right?
> >
>
> Yes, it's consistent across attempts.
>
> > Does /proc/<pid of hanging mdadm>/stack always shows the same output above?
>
> Having peeked closer - it actually tries to do something somehow -
> attached the stack file dumped every ~0.01 seconds.
>
> After a while I can hit non-empty cat /proc/<pid>/stack every 5-15
> seconds this way.
>
> At one point "hanged" mdadm has been running this way for 24hr+ without
> any practical effect in the end.

Thanks for this information. It looks like the kernel ran into some issue
during journal recovery. Could you please try the following:
1. Check dmesg; we should see one of the following messages at some point:

     pr_info("md/raid:%s: starting from clean shutdown\n",
   or
     pr_info("md/raid:%s: recovering %d data-only stripes and %d data-parity stripes\n",

2. Try using bcc/bpftrace to trace r5l_recovery_read_page(), specifically
   its 4th argument. With bcc, it is something like:

     trace.py -M 100 'r5l_recovery_read_page() "%llx", arg4'

   The -M above limits the output to 100 lines; we may need to increase the
   limit or remove it entirely. If the system doesn't have bcc/bpftrace,
   you can also try a raw kprobe (see the sketch in the P.S. below).

Thanks,
Song
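
P.S. In case bcc/bpftrace is not available, an untested sketch of the same
trace with a raw kprobe via tracefs follows. It assumes a kernel new enough
to support the $arg4 fetch-arg syntax (4.20+) and tracefs mounted under
/sys/kernel/debug/tracing; the event name "r5l_read" and the field label
"offset" are just arbitrary names chosen here. Since r5l_recovery_read_page
shows up as a distinct symbol in the stack traces above, it should be
probeable:

  cd /sys/kernel/debug/tracing
  echo 'p:r5l_read raid456:r5l_recovery_read_page offset=$arg4:x64' >> kprobe_events
  echo 1 > events/kprobes/r5l_read/enable
  cat trace_pipe

On older kernels the 4th argument would have to be fetched from its register
instead (offset=%cx:x64 on x86_64). With bpftrace, a roughly equivalent
one-liner would be (note bpftrace counts arguments from arg0, so the 4th
argument is arg3):

  bpftrace -e 'kprobe:r5l_recovery_read_page { printf("%llx\n", arg3); }'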