Re: [PATCH v4] blk-mq: Fix race conditions in request timeout handling

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

On 4/12/2018 4:35 PM, tj@xxxxxxxxxx wrote:
Hello, Israel.

On Thu, Apr 12, 2018 at 11:59:11AM +0300, Israel Rukshin wrote:
On 4/12/2018 12:31 AM, tj@xxxxxxxxxx wrote:
Hey, again.

On Wed, Apr 11, 2018 at 10:07:33AM -0700, tj@xxxxxxxxxx wrote:
Hello, Israel.

On Wed, Apr 11, 2018 at 07:16:14PM +0300, Israel Rukshin wrote:
Just noticed this one, this looks interesting to me as well. Israel,
can you run your test with this patch?
Yes, I just did and it looks good.
Awesome.
Just to be sure, you tested the combined patch and saw the XXX debug
messages, right?
I am not sure I understand.
What is the combined patch?
What are the debug messages that I need to see?
There are total of three patches and I posted the combined patch + a
debug message in another post.  Can you please test the following
patch?  It'll print out something like the following (possibly many of
them).

I tested this combined patch with your debug print and I got the following call trace:

Apr 15 11:42:35 nvme nvme1: I/O 55 QID 10 timeout, reset controller
Apr 15 11:42:35 XXX blk_mq_timeout_reset_cleanup(): executed 1 missed completions Apr 15 11:42:35 XXX blk_mq_timeout_reset_cleanup(): executed 1 missed completions Apr 15 11:42:35 XXX blk_mq_timeout_reset_cleanup(): executed 1 missed completions Apr 15 11:42:35 WARNING: CPU: 1 PID: 648 at block/blk-mq.c:534 __blk_mq_complete_request+0x154/0x160 Apr 15 11:42:35 Modules linked in: nvme_rdma rdma_cm iw_cm ib_cm nvme_fabrics nfsv3 rpcsec_gss_krb5 nfsv4 nfs fscache iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi netconsole mlx4_ib ib_core mlx4_en mlx4_core xfs libcrc32c dm_mirror dm_region_hash dm_log dm_multipath scsi_dh_rdac scsi_dh_emc dm_mod dax scsi_dh_alua x86_pkg_temp_thermal coretemp kvm_intel kvm irqbypass ghash_clmulni_intel pcbc iTCO_wdt aesni_intel nfsd iTCO_vendor_support aes_x86_64 ipmi_si crypto_simd lpc_ich cryptd ipmi_devintf shpchp wmi acpi_pad glue_helper ipmi_msghandler pcspkr sg i2c_i801 mfd_core ioatdma auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2 sd_mod igb i2c_algo_bit ahci i2c_core xhci_pci crc32c_intel xhci_hcd libahci dca configfs nvme nvme_core ipv6 crc_ccitt autofs4 [last unloaded: nvme_fabrics]
Apr 15 11:42:35 CPU: 1 PID: 648 Comm: kworker/1:1H Not tainted 4.16.0+ #8
Apr 15 11:42:35 Hardware name: Supermicro SYS-6018R-WTR/X10DRW-i, BIOS 2.0 12/17/2015
Apr 15 11:42:35 Workqueue: kblockd blk_mq_timeout_work
Apr 15 11:42:35 RIP: 0010:__blk_mq_complete_request+0x154/0x160
Apr 15 11:42:35 RSP: 0018:ffffc9000407bd68 EFLAGS: 00010297
Apr 15 11:42:35 RAX: 0000000000000000 RBX: ffff8808310236c0 RCX: 0000000000000009 Apr 15 11:42:35 RDX: 0000000000000009 RSI: 0000000000000000 RDI: ffff8808310236c0 Apr 15 11:42:35 RBP: ffffe8ffffae12c0 R08: 0000000000000000 R09: 00000000000004ea Apr 15 11:42:35 R10: 000000000000006c R11: ffffc90003dfda80 R12: 0000000000000000 Apr 15 11:42:35 R13: 0000000000000010 R14: 0000000000000010 R15: 0000000000000000 Apr 15 11:42:35 FS:  0000000000000000(0000) GS:ffff88047fa40000(0000) knlGS:0000000000000000
Apr 15 11:42:35 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 15 11:42:35 CR2: 00007f6ab2db3140 CR3: 0000000001e0a006 CR4: 00000000001606e0
Apr 15 11:42:35 Call Trace:
Apr 15 11:42:35  blk_mq_complete_request+0x4b/0x90
Apr 15 11:42:35  blk_mq_timeout_reset_cleanup+0x27/0x40
Apr 15 11:42:35  bt_iter+0x43/0x50
Apr 15 11:42:35  blk_mq_queue_tag_busy_iter+0xfb/0x230
Apr 15 11:42:35  ? blk_mq_complete_request+0x90/0x90
Apr 15 11:42:35  ? blk_mq_complete_request+0x90/0x90
Apr 15 11:42:35  ? __call_rcu.constprop.72+0x170/0x1c0
Apr 15 11:42:35  blk_mq_timeout_work+0x191/0x1f0
Apr 15 11:42:35  process_one_work+0x140/0x2a0
Apr 15 11:42:35  worker_thread+0x3f/0x3c0
Apr 15 11:42:35  kthread+0xeb/0x120
Apr 15 11:42:35  ? process_one_work+0x2a0/0x2a0
Apr 15 11:42:35  ? kthread_bind+0x10/0x10
Apr 15 11:42:35  ? SyS_exit_group+0xb/0x10
Apr 15 11:42:35  ret_from_fork+0x35/0x40
Apr 15 11:42:35 Code: 00 00 00 00 00 8b 7d 40 e8 0a 20 e7 ff e9 6e ff ff ff 48 8b 35 1e ca ba 00 48 83 c7 10 48 83 c6 64 e8 71 ee e5 ff e9 77 ff ff ff <0f> 0b e9 c3 fe ff ff 0f 1f 44 00 00 41 54 45 31 e4 55 53 48 8b
Apr 15 11:42:35 ---[ end trace fbf397c4b27ea0b8 ]---
Apr 15 11:42:35 XXX blk_mq_timeout_reset_cleanup(): executed 1 missed completions Apr 15 11:42:35 BUG: unable to handle kernel NULL pointer dereference at 0000000000000018

Regards,
Israel



[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]