On Tue, Jun 9, 2020 at 2:36 AM Michal Soltys <msoltyspl@xxxxxxxxx> wrote:
>
> On 6/5/20 2:26 PM, Michal Soltys wrote:
> > On 6/4/20 12:07 AM, Song Liu wrote:
> >>
> >> The hang happens at the expected place.
> >>
> >>> [Jun 3 09:02] INFO: task mdadm:2858 blocked for more than 120 seconds.
> >>> [  +0.060545]       Tainted: G            E     5.4.19-msl-00001-gbf39596faf12 #2
> >>> [  +0.062932] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >>
> >> Could you please try disabling the timeout message with
> >>
> >>   echo 0 > /proc/sys/kernel/hung_task_timeout_secs
> >>
> >> and, during this wait (after the message
> >> "r5c_recovery_flush_data_only_stripes before wait_event"),
> >> check whether the raid disks (not the journal disk) are taking IOs
> >> (using tools like iostat).
> >>
> >
> > No activity on the component drives.
>
> To expand on that - while there is no I/O activity whatsoever on the
> component drives (or on the journal), the CPU is of course still fully
> loaded (5 days so far):
>
> UID   PID  PPID  C   SZ  RSS PSR STIME TTY        TIME CMD
> root 8129  6755 15  740 1904  10 Jun04 pts/2  17:42:34 mdadm -A /dev/md/r5_big /dev/md/r1_journal_big /dev/sdj1 /dev/sdi1 /dev/sdg1 /dev/sdh1
> root 8147     2 84    0    0  30 Jun04 ?    4-02:09:47 [md124_raid5]

I guess the md thread is stuck on some stripe.

Does the kernel have CONFIG_DYNAMIC_DEBUG enabled? If so, could you
please try enabling some of the pr_debug() calls in handle_stripe()?

Thanks,
Song
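[For reference, a minimal sketch of how the pr_debug() calls could be enabled via the dynamic debug control file. This assumes CONFIG_DYNAMIC_DEBUG is set and debugfs is mounted at the usual /sys/kernel/debug; adjust the match to taste - the thread only asks for handle_stripe(), the raid5.c-wide variant is just an option.]

```shell
# Enable pr_debug() in handle_stripe() only (run as root):
echo 'func handle_stripe +p' > /sys/kernel/debug/dynamic_debug/control

# Or, more broadly, every pr_debug() in the raid5 driver:
echo 'file drivers/md/raid5.c +p' > /sys/kernel/debug/dynamic_debug/control

# Check which call sites are now active ("=p" means enabled):
grep raid5 /sys/kernel/debug/dynamic_debug/control

# The output then lands in the kernel log:
dmesg -w
```

The same `-p` forms disable the messages again once the stuck stripe has been identified.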