kernel BUG at drivers/scsi/qla2xxx/qla_target.c:3643! (or :3319)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello

I am experiencing problems using LIO FC target with vmware ESX (6.5).

Thisis the same problem as discussed in https://www.spinics.net/lists/target-devel/msg15670.html I've tried 4.11.x (up to 12) and 4.12.3(and also some kernels from 4.9 line).

My setup is 3 ESX 6.5 hosts using FC target storage with 2x QLogic 2460 (4gbit) (I also have another storage with QLE-2562 (8gbit)- same problem occurs).

I am able to reproduce the problem with almost 100%efficiency - generating large i/o load by concurrently resuming many VMs - problem always occurs within minute.

* on clean4.12.3:

BUG occurs on almost always first ABORT_TASK issued by vmware:

[...]
Jul 27 17:06:40 teststorage [ 31.336673] qla2xxx [0000:06:00.0]-ffff:12: qla24xx_do_nack_work create sess success ffff881f80d6dc00 Jul 27 17:06:40 teststorage [ 708.583638] Detected MISCOMPARE for addr: ffff881f8fb15000 buf: ffff881f7a46aa00 Jul 27 17:06:40 teststorage [ 708.583651] Target/iblock: Send MISCOMPARE check condition and sense Jul 27 17:06:42 teststorage [ 710.627503] ABORT_TASK: Found referenced qla2xxx task_tag: 1168472 Jul 27 17:06:43 teststorage [ 711.454055] ------------[ cut here ]------------ Jul 27 17:06:43 teststorage [ 711.454062] kernel BUG at drivers/scsi/qla2xxx/qla_target.c:3643!
Jul 27 17:06:43 teststorage [  711.454067] invalid opcode: 0000 [#1] SMP
Jul 27 17:06:43 teststorage [ 711.454070] Modules linked in: tcm_qla2xxx target_core_user uio target_core_pscsi target_core_file target_core_iblock iscsi_target_mod target_core_mod netconsole lm90 vfat fat intel_rapl x86_pkg_temp_thermal coretemp crct10dif_pclmul ses crc32_pclmul enclosure ghash_clmulni_intel scsi_transport_sas iTCO_wdt iTCO_vendor_support lpc_ich i2c_i801 mei_me mei tpm_tis tpm_tis_core tpm nfsd auth_rpcgss nfs_acl lockd grace dm_multipath sunrpc binfmt_misc nouveau video drm_kms_helper ttm raid1 drm qla2xxx igb e1000e mxm_wmi aacraid dca ptp crc32c_intel pps_core scsi_transport_fc i2c_algo_bit wmi Jul 27 17:06:43 teststorage [ 711.454101] CPU: 3 PID: 209 Comm: kworker/u24:7 Not tainted 4.12.3 #1 Jul 27 17:06:43 teststorage [ 711.454104] Hardware name: ASUS All Series/X99-E WS/USB 3.1, BIOS 3402 11/14/2016 Jul 27 17:06:43 teststorage [ 711.454118] Workqueue: tmr-iblock target_tmr_work [target_core_mod] Jul 27 17:06:43 teststorage [ 711.454122] task: ffff881f82173b00 task.stack: ffffc9000d770000 Jul 27 17:06:43 teststorage [ 711.454132] RIP: 0010:qlt_free_cmd+0x138/0x150 [qla2xxx] Jul 27 17:06:43 teststorage [ 711.454135] RSP: 0018:ffffc9000d773d60 EFLAGS: 00010286 Jul 27 17:06:43 teststorage [ 711.454138] RAX: 0000000000000088 RBX: ffff881f8ebb8bf8 RCX: ffffffffa019fade Jul 27 17:06:43 teststorage [ 711.454142] RDX: 000000000000e074 RSI: ffff881f76fc87c0 RDI: 0000000000004000 Jul 27 17:06:43 teststorage [ 711.454145] RBP: ffffc9000d773d80 R08: ffffffffa018c07c R09: ffff881f8ebb8bf8 Jul 27 17:06:43 teststorage [ 711.454148] R10: ffffea007e076ac0 R11: ffff881f81dab640 R12: ffff881f80d6d800 Jul 27 17:06:43 teststorage [ 711.454151] R13: ffff881f8ebb8c80 R14: 0000000000000286 R15: ffff881f8f5e2000 Jul 27 17:06:43 teststorage [ 711.454155] FS: 0000000000000000(0000) GS:ffff881f9f2c0000(0000) knlGS:0000000000000000 Jul 27 17:06:43 teststorage [ 711.454158] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Jul 27 17:06:43 teststorage [ 711.454161] CR2: 00007fff00782a68 CR3: 0000001f81e33000 CR4: 00000000003406e0 Jul 27 17:06:43 teststorage [ 711.454164] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Jul 27 17:06:43 teststorage [ 711.454167] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jul 27 17:06:43 teststorage [  711.454170] Call Trace:
Jul 27 17:06:43 teststorage [ 711.454176] tcm_qla2xxx_release_cmd+0x14/0x30 [tcm_qla2xxx] Jul 27 17:06:43 teststorage [ 711.454183] target_put_sess_cmd+0xce/0x140 [target_core_mod] Jul 27 17:06:43 teststorage [ 711.454190] core_tmr_abort_task+0x127/0x190 [target_core_mod] Jul 27 17:06:43 teststorage [ 711.454197] target_tmr_work+0x111/0x120 [target_core_mod]
Jul 27 17:06:43 teststorage [  711.454203] process_one_work+0x144/0x370
Jul 27 17:06:43 teststorage [  711.454206] worker_thread+0x4d/0x3c0
Jul 27 17:06:43 teststorage [  711.454210]  kthread+0x109/0x140
Jul 27 17:06:43 teststorage [  711.454487]  ? rescuer_thread+0x360/0x360
Jul 27 17:06:43 teststorage [  711.455029]  ? kthread_park+0x60/0x60
Jul 27 17:06:43 teststorage [  711.455579] ret_from_fork+0x25/0x30
Jul 27 17:06:43 teststorage [ 711.456122] Code: 7f a6 73 e1 0f b6 83 8c 02 00 00 e9 54 ff ff ff 48 8b bb c0 02 00 00 48 89 de e8 04 fd ff ff 0f b6 83 8c 02 00 00 e9 35 ff ff ff <0f> 0b 48 8b bb 90 02 00 00 e8 8a e2 0a e1 e9 44 ff ff ff 0f 0b Jul 27 17:06:43 teststorage [ 711.457279] RIP: qlt_free_cmd+0x138/0x150 [qla2xxx] RSP: ffffc9000d773d60 Jul 27 17:06:43 teststorage [ 711.457857] ---[ end trace 99e519b9a5b6591b ]---

8<-----------

I have also tried patch from https://www.spinics.net/lists/target-devel/msg15759.html (4db6a8145940d0bbd10265020d681961ce2d3238 - currently not available in git(?))

The results is - no BUGs, system survives a little longer, but it's a matter of minutes when kworker hangs:

Jul 26 15:21:33 teststorage [ 3588.629920] ABORT_TASK: Found referenced qla2xxx task_tag: 1202220 Jul 26 15:21:34 teststorage [ 3589.491615] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 1202220 Jul 26 15:21:34 teststorage [ 3589.491700] ABORT_TASK: Found referenced qla2xxx task_tag: 1202264 Jul 26 15:25:16 teststorage [ 3810.752583] INFO: task kworker/u24:2:2975 blocked for more than 120 seconds.
Jul 26 15:25:16 teststorage [ 3810.752672]       Not tainted 4.12.3 #1
Jul 26 15:25:16 teststorage [ 3810.752749] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jul 26 15:25:16 teststorage [ 3810.752831] kworker/u24:2 D 0 2975 2 0x00000080 Jul 26 15:25:16 teststorage [ 3810.752920] Workqueue: tmr-iblock target_tmr_work [target_core_mod]
Jul 26 15:25:16 teststorage [ 3810.753006] Call Trace:
Jul 26 15:25:16 teststorage [ 3810.753207] __schedule+0x285/0x840
Jul 26 15:25:16 teststorage [ 3810.753410] schedule+0x36/0x80
Jul 26 15:25:16 teststorage [ 3810.753623] schedule_timeout+0x242/0x2f0
Jul 26 15:25:16 teststorage [ 3810.753832]  ? radix_tree_lookup+0xd/0x10
Jul 26 15:25:16 teststorage [ 3810.754046]  ? get_work_pool+0x2d/0x40
Jul 26 15:25:16 teststorage [ 3810.754263]  ? flush_work+0x14d/0x190
Jul 26 15:25:16 teststorage [ 3810.754486] wait_for_completion+0x111/0x170
Jul 26 15:25:16 teststorage [ 3810.754717]  ? wake_up_q+0x80/0x80
Jul 26 15:25:16 teststorage [ 3810.754949] __transport_wait_for_tasks+0xa7/0x140 [target_core_mod] Jul 26 15:25:16 teststorage [ 3810.755189] transport_wait_for_tasks+0x53/0x90 [target_core_mod] Jul 26 15:25:16 teststorage [ 3810.755454] core_tmr_abort_task+0x10e/0x190 [target_core_mod] Jul 26 15:25:16 teststorage [ 3810.755717] target_tmr_work+0x111/0x120 [target_core_mod]
Jul 26 15:25:16 teststorage [ 3810.755972] process_one_work+0x144/0x370
Jul 26 15:25:16 teststorage [ 3810.756236] worker_thread+0x4d/0x3c0
Jul 26 15:25:16 teststorage [ 3810.756507] kthread+0x109/0x140
Jul 26 15:25:16 teststorage [ 3810.756785]  ? rescuer_thread+0x360/0x360
Jul 26 15:25:16 teststorage [ 3810.757060]  ? kthread_park+0x60/0x60
Jul 26 15:25:16 teststorage [ 3810.757348]  ? do_syscall_64+0x67/0x150
Jul 26 15:25:16 teststorage [ 3810.757636] ret_from_fork+0x25/0x30
Jul 26 15:25:16 teststorage [ 3810.757927] NMI backtrace for cpu 3
Jul 26 15:25:16 teststorage [ 3810.758220] CPU: 3 PID: 107 Comm: khungtaskd Not tainted 4.12.3 #1 Jul 26 15:25:16 teststorage [ 3810.758530] Hardware name: ASUS All Series/X99-E WS/USB 3.1, BIOS 3402 11/14/2016


(I have also tried 4.13-rc2, but currently I have some other problem with initiating qla2xxx there ("qla2xxx 0000:09:00.0: can't allocate MSI-X affinity masks for 2 vectors"), so I can't use FC target...)

I am willing to provide more information if needed - It's currently my out-of-production storage and so if anyone can look at this problem I can on my side experiment and play with patches, retest, etc.

Any help appreciated :)

regards,
Lukasz Engel

--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux SCSI]     [Kernel Newbies]     [Linux SCSI Target Infrastructure]     [Share Photos]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Device Mapper]

  Powered by Linux