Kernel bug on shli_md-for-next

Hi Shaohua,
On the current HEAD of the shli_md-for-next branch I observe a kernel BUG 
when I try to create an XFS filesystem on a RAID10 array built from at 
least one SSD drive mixed with HDDs.

Steps:

  mdadm -CR /dev/md/imsm0 -eimsm -n4 /dev/sd[bcdf]
  mdadm -CR /dev/md/vol10 -l10 -n4 /dev/sd[bcdf] -z 5G --assume-clean
  mkfs.xfs -f /dev/md/vol10
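
For completeness, the discard capabilities of the member drives can be 
captured with the commands below (device names as in the steps above; 
DISC-GRAN/DISC-MAX should be non-zero only for the SSD):

  # per-drive discard limits and rotational flag
  lsblk -D /dev/sd[bcdf]
  cat /sys/block/sd[bcdf]/queue/rotational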

I rebased the shli_md-for-next branch onto v4.20-rc5, and the issue seems 
to be fixed there. I suspect that the root cause of this problem was in 
the block layer. Can you update the for-next branch to this tag?

It works there, but something is still wrong, because creating the XFS 
filesystem takes a lot of time. With hung task detection enabled, after 
120s I get warnings about a blocked mkfs task. It does not happen when 
the array is created from drives of the same type. I also tested this on 
v4.19, and the same problem occurs there.
I suspect that it is related to the TRIM feature, but I don't know why 
ignoring the trims sent to the HDDs blocks XFS creation, or why only 
RAID10 is affected. I can't reproduce it with other RAID levels.
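
A quick way to confirm or rule out the discard path is to skip the 
discard pass at mkfs time and compare (the md126 name below matches the 
traces; it may differ on other setups):

  # -K tells mkfs.xfs not to discard blocks at mkfs time; if this
  # completes quickly, the slowdown is in the discard path
  mkfs.xfs -f -K /dev/md/vol10

  # discard limits advertised by the raid10 volume
  cat /sys/block/md126/queue/discard_granularity
  cat /sys/block/md126/queue/discard_max_bytes
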
While debugging I saw that for the RAID10 array many small discard 
requests are sent to the SSD device; the other RAID levels don't send 
them.
Do you have any idea what the root cause of this performance drop is?
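
For reference, the per-member discard traffic during mkfs can be watched 
with something like this (sdX standing for the SSD member; any member 
can be traced the same way):

  # trace only discard requests hitting the member drive while
  # mkfs.xfs runs in another terminal
  blktrace -d /dev/sdX -a discard -o - | blkparse -i -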


Hung task call trace (for-next rebased to v4.20-rc5):

[ 7253.539795] INFO: task mkfs.xfs:11894 blocked for more than 120 seconds.
[ 7253.627614]       Not tainted 4.20.0-rc5+ #49
[ 7253.687335] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[ 7253.788919] mkfs.xfs        D    0 11894   9602 0x00000080
[ 7253.862554] Call Trace:
[ 7253.899604]  ? __schedule+0x291/0x840
[ 7253.951187]  schedule+0x32/0x80
[ 7253.996514]  schedule_timeout+0x1d5/0x2f0
[ 7254.052127]  ? blk_flush_plug_list+0xc7/0x240
[ 7254.111796]  io_schedule_timeout+0x19/0x40
[ 7254.168318]  wait_for_completion_io+0x117/0x180
[ 7254.230095]  ? wake_up_q+0x70/0x70
[ 7254.278341]  submit_bio_wait+0x5b/0x80
[ 7254.330777]  blkdev_issue_discard+0x76/0xc0
[ 7254.388324]  blk_ioctl_discard+0xc5/0x100
[ 7254.443644]  blkdev_ioctl+0x28e/0x9b0
[ 7254.494648]  ? __blkdev_put+0x17a/0x1e0
[ 7254.547565]  block_ioctl+0x3d/0x40
[ 7254.595257]  do_vfs_ioctl+0xa6/0x620
[ 7254.645013]  ? __fput+0x157/0x200
[ 7254.691658]  ? syscall_trace_enter+0x1c9/0x2b0
[ 7254.751884]  ksys_ioctl+0x60/0x90
[ 7254.798552]  ? exit_to_usermode_loop+0x7d/0xb8
[ 7254.858827]  __x64_sys_ioctl+0x16/0x20
[ 7254.910713]  do_syscall_64+0x4f/0x190
[ 7254.961535]  entry_SYSCALL_64_after_hwframe+0x44/0xa9



Kernel bug on current HEAD:

[ 2433.067158] ------------[ cut here ]------------
[ 2433.124047] kernel BUG at block/blk-core.c:1771!
[ 2433.181081] invalid opcode: 0000 [#1] SMP PTI
[ 2433.235100] CPU: 9 PID: 9159 Comm: md126_raid10 Not tainted 4.20.0-rc1+ #48
[ 2433.320648] Hardware name: Intel Corp. GRANTLEY/GRANTLEY, BIOS GRNDCRB1.86B.0032.D05.1404281140 04/28/2014
[ 2433.438888] RIP: 0010:__blk_put_request+0x143/0x1d0
[ 2433.500056] Code: 48 c7 c6 b0 18 05 82 48 c7 c7 2d 07 30 82 4c 89 04 24 48 8d 48 0c 31 c0 e8 ce 72 ce ff 4c 8b 04 24 48 8b 43 40 49 39 c0 74 02 <0f> 0b 8b 43 18 a9 00 00 01 00 74 02 0f 0b f6 c4 10 49 8b 7d 40 74
[ 2433.731830] RSP: 0018:ffffc90002ea7c60 EFLAGS: 00010083
[ 2433.798122] RAX: ffff88026ca62160 RBX: ffff880213e0cc00 RCX: 0000000000000006
[ 2433.887583] RDX: 0000000000000000 RSI: 0000000000000082 RDI: ffff880277655410
[ 2433.977221] RBP: ffff88026ca68000 R08: ffff880213e0cc40 R09: ffffffff82063060
[ 2434.067020] R10: 0000000000000001 R11: 0000000000aaaaaa R12: 0000000000017001
[ 2434.156990] R13: ffff88026ca68000 R14: 0000000000000000 R15: ffff88026ca68040
[ 2434.247107] FS:  0000000000000000(0000) GS:ffff880277640000(0000) knlGS:0000000000000000
[ 2434.348945] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2434.422859] CR2: 00007f9548c34000 CR3: 0000000268812001 CR4: 00000000001606e0
[ 2434.513674] Call Trace:
[ 2434.548360]  blk_queue_bio+0x16a/0x3e0
[ 2434.598792]  generic_make_request+0x320/0x390
[ 2434.656634]  flush_pending_writes+0x135/0x180 [raid10]
[ 2434.724006]  ? __switch_to_asm+0x34/0x70
[ 2434.776919]  ? __switch_to_asm+0x40/0x70
[ 2434.829887]  ? __switch_to_asm+0x34/0x70
[ 2434.882835]  ? __switch_to_asm+0x40/0x70
[ 2434.935800]  ? __switch_to_asm+0x34/0x70
[ 2434.988752]  ? __switch_to_asm+0x40/0x70
[ 2435.041716]  ? __switch_to_asm+0x34/0x70
[ 2435.094713]  raid10d+0x1bb/0x1400 [raid10]
[ 2435.149883]  ? __schedule+0x52d/0x690
[ 2435.199929]  ? __switch_to_asm+0x40/0x70
[ 2435.253208]  ? schedule+0x6a/0x80
[ 2435.299256]  ? schedule_timeout+0x3a/0x310
[ 2435.354785]  md_thread+0x103/0x160 [md_mod]
[ 2435.411427]  ? do_wait_intr_irq+0x90/0x90
[ 2435.466100]  kthread+0x115/0x120
[ 2435.511493]  ? md_rdev_init+0xc0/0xc0 [md_mod]
[ 2435.571621]  ? __kthread_cancel_work+0x80/0x80
[ 2435.631877]  ret_from_fork+0x35/0x40
[ 2435.681802] Modules linked in: raid10 md_mod nvme nvme_core
[ 2435.755914] ---[ end trace 4b9aec53ceb844af ]---



