Hi,

I have an LSI SAS3008 [0] attached to a few disks. I've set up an md
raid10 on them and created an XFS file system on it. While the RAID was
still rebuilding, I rsynced approximately 2 TiB of data onto it, which
went smoothly. With the rebuild still in progress, I then started doing
some I/O, and after approximately 5 minutes all I/O stopped. This
happened repeatedly.

The backtrace shows:
| md10_resync D 0 1797 2 0x80000000
| Call Trace:
|  ? __schedule+0x3f5/0x880
|  schedule+0x32/0x80
|  raise_barrier+0xc3/0x190 [raid10]
|  ? remove_wait_queue+0x60/0x60
|  raid10_sync_request+0x989/0x1d50 [raid10]
|  ? is_mddev_idle+0x44/0x12a [md_mod]
|  ? cpumask_next+0x16/0x20
|  ? is_mddev_idle+0xcc/0x12a [md_mod]
|  md_do_sync+0xc44/0x1030 [md_mod]
|  ? remove_wait_queue+0x60/0x60
|  ? __switch_to_asm+0x34/0x70
|  ? md_thread+0x125/0x170 [md_mod]
|  md_thread+0x125/0x170 [md_mod]
|  kthread+0xf8/0x130
|  ? md_rdev_init+0xc0/0xc0 [md_mod]
|  ? kthread_create_worker_on_cpu+0x70/0x70
|  ret_from_fork+0x35/0x40
| xfsaild/md10 D 0 1841 2 0x80000000
| Call Trace:
|  ? __schedule+0x3f5/0x880
|  schedule+0x32/0x80
|  wait_barrier+0x146/0x1a0 [raid10]
|  ? remove_wait_queue+0x60/0x60
|  raid10_write_request+0x74/0x8e0 [raid10]
|  ? mempool_alloc+0x69/0x190
|  ? md_write_start+0xd0/0x210 [md_mod]
|  raid10_make_request+0xbf/0x140 [raid10]
|  md_handle_request+0x116/0x190 [md_mod]
|  md_make_request+0x72/0x170 [md_mod]
|  generic_make_request+0x1e7/0x410
|  ? submit_bio+0x6c/0x140
|  ? xfs_inode_buf_verify+0x84/0x150 [xfs]
|  submit_bio+0x6c/0x140
|  ? bio_add_page+0x48/0x60
|  _xfs_buf_ioapply+0x324/0x4d0 [xfs]
|  ? __kernel_fpu_end+0x30/0x80
|  ? xfs_buf_delwri_submit_buffers+0x17e/0x2c0 [xfs]
|  ? __xfs_buf_submit+0xe2/0x240 [xfs]
|  __xfs_buf_submit+0xe2/0x240 [xfs]
|  xfs_buf_delwri_submit_buffers+0x17e/0x2c0 [xfs]
|  ? xfsaild+0x2dc/0x830 [xfs]
|  ? xfsaild+0x2dc/0x830 [xfs]
|  xfsaild+0x2dc/0x830 [xfs]
|  ? __switch_to_asm+0x34/0x70
|  ? kthread+0xf8/0x130
|  kthread+0xf8/0x130
|  ? xfs_trans_ail_cursor_first+0x80/0x80 [xfs]
|  ? kthread_create_worker_on_cpu+0x70/0x70
|  ret_from_fork+0x35/0x40
| kworker/36:3 D 0 2057 2 0x80000000
| Workqueue: md submit_flushes [md_mod]
| Call Trace:
|  ? __schedule+0x3f5/0x880
|  schedule+0x32/0x80
|  wait_barrier+0x146/0x1a0 [raid10]
|  ? remove_wait_queue+0x60/0x60
|  raid10_write_request+0x74/0x8e0 [raid10]
|  ? mempool_alloc+0x69/0x190
|  ? md_write_start+0xd0/0x210 [md_mod]
|  ? try_to_wake_up+0x54/0x4a0
|  raid10_make_request+0xbf/0x140 [raid10]
|  md_handle_request+0x116/0x190 [md_mod]
|  md_make_request+0x72/0x170 [md_mod]
|  generic_make_request+0x1e7/0x410
|  ? raid10_write_request+0x660/0x8e0 [raid10]
|  raid10_write_request+0x660/0x8e0 [raid10]
|  ? mempool_alloc+0x69/0x190
|  ? md_write_start+0xd0/0x210 [md_mod]
|  ? __switch_to_asm+0x40/0x70
|  ? __switch_to_asm+0x34/0x70
|  ? __switch_to_asm+0x40/0x70
|  raid10_make_request+0xbf/0x140 [raid10]
|  md_handle_request+0x116/0x190 [md_mod]
|  ? __switch_to_asm+0x40/0x70
|  submit_flushes+0x21/0x40 [md_mod]
|  process_one_work+0x191/0x370
|  worker_thread+0x4f/0x3b0
|  kthread+0xf8/0x130
|  ? rescuer_thread+0x340/0x340
|  ? kthread_create_worker_on_cpu+0x70/0x70
|  ret_from_fork+0x35/0x40
| borg D 0 4097 4096 0x00000000
| Call Trace:
|  ? __schedule+0x3f5/0x880
|  ? xlog_bdstrat+0x30/0x60 [xfs]
|  schedule+0x32/0x80
|  __xfs_log_force_lsn+0x155/0x270 [xfs]
|  ? wake_up_q+0x70/0x70
|  ? xfs_file_fsync+0x100/0x230 [xfs]
|  xfs_log_force_lsn+0x91/0x120 [xfs]
|  xfs_file_fsync+0x100/0x230 [xfs]
|  do_fsync+0x38/0x60
|  __x64_sys_fsync+0x10/0x20
|  do_syscall_64+0x55/0x110
|  entry_SYSCALL_64_after_hwframe+0x44/0xa9
| RIP: 0033:0x7fb0bf0b3010
| Code: Bad RIP value.

This was the latest v4.19 from two weeks ago.
This looks to me like mpt3sas didn't wake someone up after an I/O
completed, and everything stopped. Since this machine is in production,
I didn't have much time to debug this. I replaced XFS with EXT4 and the
problem disappeared. I even restarted the md RAID rebuild to recreate
the same test scenario, and the hang did not reappear.

While it looks like an XFS problem, I believe that XFS manages to submit
enough requests to confuse mpt3sas while EXT4 doesn't.
Is this a known problem?

[0] PCI ID 1000:0097

Sebastian