Stall with RAID10 + XFS + mpt3sas

Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx> · Fri, 1 Feb 2019 17:16:36 +0100

Hi,

I have an LSI SAS3008 [0] with a few disks attached. I've set up an md
raid10 on them and created an XFS file system on it. While the raid was
still rebuilding I rsynced approx. 2TiB of data, which went smoothly.
With the raid still rebuilding, I then started doing some other I/O, and
after approximately 5 minutes all I/O stopped. This happened repeatedly.
The backtrace shows:
| md10_resync     D    0  1797      2 0x80000000
| Call Trace:
|  ? __schedule+0x3f5/0x880
|  schedule+0x32/0x80
|  raise_barrier+0xc3/0x190 [raid10]
|  ? remove_wait_queue+0x60/0x60
|  raid10_sync_request+0x989/0x1d50 [raid10]
|  ? is_mddev_idle+0x44/0x12a [md_mod]
|  ? cpumask_next+0x16/0x20
|  ? is_mddev_idle+0xcc/0x12a [md_mod]
|  md_do_sync+0xc44/0x1030 [md_mod]
|  ? remove_wait_queue+0x60/0x60
|  ? __switch_to_asm+0x34/0x70
|  ? md_thread+0x125/0x170 [md_mod]
|  md_thread+0x125/0x170 [md_mod]
|  kthread+0xf8/0x130
|  ? md_rdev_init+0xc0/0xc0 [md_mod]
|  ? kthread_create_worker_on_cpu+0x70/0x70
|  ret_from_fork+0x35/0x40
| xfsaild/md10    D    0  1841      2 0x80000000
| Call Trace:
|  ? __schedule+0x3f5/0x880
|  schedule+0x32/0x80
|  wait_barrier+0x146/0x1a0 [raid10]
|  ? remove_wait_queue+0x60/0x60
|  raid10_write_request+0x74/0x8e0 [raid10]
|  ? mempool_alloc+0x69/0x190
|  ? md_write_start+0xd0/0x210 [md_mod]
|  raid10_make_request+0xbf/0x140 [raid10]
|  md_handle_request+0x116/0x190 [md_mod]
|  md_make_request+0x72/0x170 [md_mod]
|  generic_make_request+0x1e7/0x410
|  ? submit_bio+0x6c/0x140
|  ? xfs_inode_buf_verify+0x84/0x150 [xfs]
|  submit_bio+0x6c/0x140
|  ? bio_add_page+0x48/0x60
|  _xfs_buf_ioapply+0x324/0x4d0 [xfs]
|  ? __kernel_fpu_end+0x30/0x80
|  ? xfs_buf_delwri_submit_buffers+0x17e/0x2c0 [xfs]
|  ? __xfs_buf_submit+0xe2/0x240 [xfs]
|  __xfs_buf_submit+0xe2/0x240 [xfs]
|  xfs_buf_delwri_submit_buffers+0x17e/0x2c0 [xfs]
|  ? xfsaild+0x2dc/0x830 [xfs]
|  ? xfsaild+0x2dc/0x830 [xfs]
|  xfsaild+0x2dc/0x830 [xfs]
|  ? __switch_to_asm+0x34/0x70
|  ? kthread+0xf8/0x130
|  kthread+0xf8/0x130
|  ? xfs_trans_ail_cursor_first+0x80/0x80 [xfs]
|  ? kthread_create_worker_on_cpu+0x70/0x70
|  ret_from_fork+0x35/0x40
| kworker/36:3    D    0  2057      2 0x80000000
| Workqueue: md submit_flushes [md_mod]
| Call Trace:
|  ? __schedule+0x3f5/0x880
|  schedule+0x32/0x80
|  wait_barrier+0x146/0x1a0 [raid10]
|  ? remove_wait_queue+0x60/0x60
|  raid10_write_request+0x74/0x8e0 [raid10]
|  ? mempool_alloc+0x69/0x190
|  ? md_write_start+0xd0/0x210 [md_mod]
|  ? try_to_wake_up+0x54/0x4a0
|  raid10_make_request+0xbf/0x140 [raid10]
|  md_handle_request+0x116/0x190 [md_mod]
|  md_make_request+0x72/0x170 [md_mod]
|  generic_make_request+0x1e7/0x410
|  ? raid10_write_request+0x660/0x8e0 [raid10]
|  raid10_write_request+0x660/0x8e0 [raid10]
|  ? mempool_alloc+0x69/0x190
|  ? md_write_start+0xd0/0x210 [md_mod]
|  ? __switch_to_asm+0x40/0x70
|  ? __switch_to_asm+0x34/0x70
|  ? __switch_to_asm+0x40/0x70
|  raid10_make_request+0xbf/0x140 [raid10]
|  md_handle_request+0x116/0x190 [md_mod]
|  ? __switch_to_asm+0x40/0x70
|  submit_flushes+0x21/0x40 [md_mod]
|  process_one_work+0x191/0x370
|  worker_thread+0x4f/0x3b0
|  kthread+0xf8/0x130
|  ? rescuer_thread+0x340/0x340
|  ? kthread_create_worker_on_cpu+0x70/0x70
|  ret_from_fork+0x35/0x40
| borg            D    0  4097   4096 0x00000000
| Call Trace:
|  ? __schedule+0x3f5/0x880
|  ? xlog_bdstrat+0x30/0x60 [xfs]
|  schedule+0x32/0x80
|  __xfs_log_force_lsn+0x155/0x270 [xfs]
|  ? wake_up_q+0x70/0x70
|  ? xfs_file_fsync+0x100/0x230 [xfs]
|  xfs_log_force_lsn+0x91/0x120 [xfs]
|  xfs_file_fsync+0x100/0x230 [xfs]
|  do_fsync+0x38/0x60
|  __x64_sys_fsync+0x10/0x20
|  do_syscall_64+0x55/0x110
|  entry_SYSCALL_64_after_hwframe+0x44/0xa9
| RIP: 0033:0x7fb0bf0b3010
| Code: Bad RIP value.

This was the latest v4.19 as of two weeks ago. To me it looks like
mpt3sas didn't wake someone up after an I/O completed, and then
everything stopped. Since this machine is in production I didn't have
much time to debug this. I replaced XFS with EXT4 and the problem
disappeared. I even restarted the md raid rebuild to recreate the same
test scenario: nothing.
While it looks like an XFS problem, I believe that XFS manages to submit
enough requests to confuse mpt3sas while EXT4 doesn't.
Is this a known problem?
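For anyone repeating the XFS vs. EXT4 comparison, a small fsync loop roughly mimics the borg task in the trace, which is blocked in fsync. This is a hypothetical sketch, not the original workload: the probe file name, block size, iteration count and 10-second threshold are arbitrary assumptions. It records how long each fsync() takes on the mounted array so that long pauses stand out; a true hang simply never returns from os.fsync().

```python
import os
import time
from typing import List


def fsync_latencies(directory: str, iterations: int = 100,
                    block_size: int = 1 << 20) -> List[float]:
    """Append block_size bytes and fsync() them, iterations times.

    Returns the duration of each fsync() call in seconds. The probe file
    name below is an arbitrary choice and the file is removed afterwards.
    """
    path = os.path.join(directory, "fsync-probe.dat")  # hypothetical name
    payload = os.urandom(block_size)
    latencies: List[float] = []
    try:
        with open(path, "wb") as f:
            for _ in range(iterations):
                f.write(payload)
                f.flush()  # hand Python's buffer to the kernel first
                start = time.monotonic()
                os.fsync(f.fileno())
                latencies.append(time.monotonic() - start)
    finally:
        if os.path.exists(path):
            os.remove(path)
    return latencies


def stalls(latencies: List[float], threshold_s: float = 10.0) -> List[int]:
    """Indices of fsync() calls that took at least threshold_s seconds."""
    return [i for i, t in enumerate(latencies) if t >= threshold_s]
```

Usage would be something like stalls(fsync_latencies("/mnt/md10")), where /mnt/md10 is a hypothetical mount point for the array, run once on XFS and once on EXT4 while the rebuild is in progress.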

[0] PCI ID 1000:0097

Sebastian