Hi,
On 4/12/2018 4:35 PM, tj@xxxxxxxxxx wrote:
Hello, Israel.
On Thu, Apr 12, 2018 at 11:59:11AM +0300, Israel Rukshin wrote:
On 4/12/2018 12:31 AM, tj@xxxxxxxxxx wrote:
Hey, again.
On Wed, Apr 11, 2018 at 10:07:33AM -0700, tj@xxxxxxxxxx wrote:
Hello, Israel.
On Wed, Apr 11, 2018 at 07:16:14PM +0300, Israel Rukshin wrote:
Just noticed this one, this looks interesting to me as well. Israel,
can you run your test with this patch?
Yes, I just did and it looks good.
Awesome.
Just to be sure, you tested the combined patch and saw the XXX debug
messages, right?
I am not sure I understand.
What is the combined patch?
What are the debug messages that I need to see?
There are a total of three patches, and I posted the combined patch plus a
debug message in another post. Can you please test the following
patch? It'll print out something like the following (possibly many of
them).
I tested this combined patch with your debug print and I got the
following call trace:
Apr 15 11:42:35 nvme nvme1: I/O 55 QID 10 timeout, reset controller
Apr 15 11:42:35 XXX blk_mq_timeout_reset_cleanup(): executed 1 missed completions
Apr 15 11:42:35 XXX blk_mq_timeout_reset_cleanup(): executed 1 missed completions
Apr 15 11:42:35 XXX blk_mq_timeout_reset_cleanup(): executed 1 missed completions
Apr 15 11:42:35 WARNING: CPU: 1 PID: 648 at block/blk-mq.c:534 __blk_mq_complete_request+0x154/0x160
Apr 15 11:42:35 Modules linked in: nvme_rdma rdma_cm iw_cm ib_cm
nvme_fabrics nfsv3 rpcsec_gss_krb5 nfsv4 nfs fscache iscsi_tcp
libiscsi_tcp libiscsi scsi_transport_iscsi netconsole mlx4_ib ib_core
mlx4_en mlx4_core xfs libcrc32c dm_mirror dm_region_hash dm_log
dm_multipath scsi_dh_rdac scsi_dh_emc dm_mod dax scsi_dh_alua
x86_pkg_temp_thermal coretemp kvm_intel kvm irqbypass
ghash_clmulni_intel pcbc iTCO_wdt aesni_intel nfsd iTCO_vendor_support
aes_x86_64 ipmi_si crypto_simd lpc_ich cryptd ipmi_devintf shpchp wmi
acpi_pad glue_helper ipmi_msghandler pcspkr sg i2c_i801 mfd_core ioatdma
auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2
sd_mod igb i2c_algo_bit ahci i2c_core xhci_pci crc32c_intel xhci_hcd
libahci dca configfs nvme nvme_core ipv6 crc_ccitt autofs4
[last unloaded: nvme_fabrics]
Apr 15 11:42:35 CPU: 1 PID: 648 Comm: kworker/1:1H Not tainted 4.16.0+ #8
Apr 15 11:42:35 Hardware name: Supermicro SYS-6018R-WTR/X10DRW-i, BIOS 2.0 12/17/2015
Apr 15 11:42:35 Workqueue: kblockd blk_mq_timeout_work
Apr 15 11:42:35 RIP: 0010:__blk_mq_complete_request+0x154/0x160
Apr 15 11:42:35 RSP: 0018:ffffc9000407bd68 EFLAGS: 00010297
Apr 15 11:42:35 RAX: 0000000000000000 RBX: ffff8808310236c0 RCX: 0000000000000009
Apr 15 11:42:35 RDX: 0000000000000009 RSI: 0000000000000000 RDI: ffff8808310236c0
Apr 15 11:42:35 RBP: ffffe8ffffae12c0 R08: 0000000000000000 R09: 00000000000004ea
Apr 15 11:42:35 R10: 000000000000006c R11: ffffc90003dfda80 R12: 0000000000000000
Apr 15 11:42:35 R13: 0000000000000010 R14: 0000000000000010 R15: 0000000000000000
Apr 15 11:42:35 FS: 0000000000000000(0000) GS:ffff88047fa40000(0000) knlGS:0000000000000000
Apr 15 11:42:35 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 15 11:42:35 CR2: 00007f6ab2db3140 CR3: 0000000001e0a006 CR4: 00000000001606e0
Apr 15 11:42:35 Call Trace:
Apr 15 11:42:35 blk_mq_complete_request+0x4b/0x90
Apr 15 11:42:35 blk_mq_timeout_reset_cleanup+0x27/0x40
Apr 15 11:42:35 bt_iter+0x43/0x50
Apr 15 11:42:35 blk_mq_queue_tag_busy_iter+0xfb/0x230
Apr 15 11:42:35 ? blk_mq_complete_request+0x90/0x90
Apr 15 11:42:35 ? blk_mq_complete_request+0x90/0x90
Apr 15 11:42:35 ? __call_rcu.constprop.72+0x170/0x1c0
Apr 15 11:42:35 blk_mq_timeout_work+0x191/0x1f0
Apr 15 11:42:35 process_one_work+0x140/0x2a0
Apr 15 11:42:35 worker_thread+0x3f/0x3c0
Apr 15 11:42:35 kthread+0xeb/0x120
Apr 15 11:42:35 ? process_one_work+0x2a0/0x2a0
Apr 15 11:42:35 ? kthread_bind+0x10/0x10
Apr 15 11:42:35 ? SyS_exit_group+0xb/0x10
Apr 15 11:42:35 ret_from_fork+0x35/0x40
Apr 15 11:42:35 Code: 00 00 00 00 00 8b 7d 40 e8 0a 20 e7 ff e9 6e ff ff ff 48 8b 35 1e ca ba 00 48 83 c7 10 48 83 c6 64 e8 71 ee e5 ff e9 77 ff ff ff <0f> 0b e9 c3 fe ff ff 0f 1f 44 00 00 41 54 45 31 e4 55 53 48 8b
Apr 15 11:42:35 ---[ end trace fbf397c4b27ea0b8 ]---
Apr 15 11:42:35 XXX blk_mq_timeout_reset_cleanup(): executed 1 missed completions
Apr 15 11:42:35 BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
Regards,
Israel