Hi Lukasz, > On Jul 27, 2017, at 8:23 AM, Łukasz Engel <lukasz.engel@xxxxxxxxx> wrote: > > Hello > > I am experiencing problems using LIO FC target with vmware ESX (6.5). > > Thisis the same problem as discussed in https://www.spinics.net/lists/target-devel/msg15670.html > I've tried 4.11.x (up to 12) and 4.12.3(and also some kernels from 4.9 line). > > My setup is 3 ESX 6.5 hosts using FC target storage with 2x QLogic 2460 (4gbit) (I also have another storage with QLE-2562 (8gbit)- same problem occurs). > > I am able to reproduce the problem with almost 100%efficiency - generating large i/o load by concurrently resuming many VMs - problem always occurs within minute. > > * on clean4.12.3: > > BUG occurs on almost always first ABORT_TASK issued by vmware: > > [...] > Jul 27 17:06:40 teststorage [ 31.336673] qla2xxx [0000:06:00.0]-ffff:12: qla24xx_do_nack_work create sess success ffff881f80d6dc00 > Jul 27 17:06:40 teststorage [ 708.583638] Detected MISCOMPARE for addr: ffff881f8fb15000 buf: ffff881f7a46aa00 > Jul 27 17:06:40 teststorage [ 708.583651] Target/iblock: Send MISCOMPARE check condition and sense > Jul 27 17:06:42 teststorage [ 710.627503] ABORT_TASK: Found referenced qla2xxx task_tag: 1168472 > Jul 27 17:06:43 teststorage [ 711.454055] ------------[ cut here ]------------ > Jul 27 17:06:43 teststorage [ 711.454062] kernel BUG at drivers/scsi/qla2xxx/qla_target.c:3643! > Jul 27 17:06:43 teststorage [ 711.454067] invalid opcode: 0000 [#1] SMP > Jul 27 17:06:43 teststorage [ 711.454070] Modules linked in: tcm_qla2xxx target_core_user uio target_core_pscsi target_core_file target_core_iblock iscsi_target_mod target_core_mod netconsole lm90 vfat fat intel_rapl x86_pkg_temp_thermal coretemp crct10dif_pclmul ses crc32_pclmul enclosure ghash_clmulni_intel scsi_transport_sas iTCO_wdt iTCO_vendor_support lpc_ich i2c_i801 mei_me mei tpm_tis tpm_tis_core tpm nfsd auth_rpcgss nfs_acl lockd grace dm_multipath sunrpc binfmt_misc nouveau video drm_kms_helper ttm raid1 drm qla2xxx igb e1000e mxm_wmi aacraid dca ptp crc32c_intel pps_core scsi_transport_fc i2c_algo_bit wmi > Jul 27 17:06:43 teststorage [ 711.454101] CPU: 3 PID: 209 Comm: kworker/u24:7 Not tainted 4.12.3 #1 > Jul 27 17:06:43 teststorage [ 711.454104] Hardware name: ASUS All Series/X99-E WS/USB 3.1, BIOS 3402 11/14/2016 > Jul 27 17:06:43 teststorage [ 711.454118] Workqueue: tmr-iblock target_tmr_work [target_core_mod] > Jul 27 17:06:43 teststorage [ 711.454122] task: ffff881f82173b00 task.stack: ffffc9000d770000 > Jul 27 17:06:43 teststorage [ 711.454132] RIP: 0010:qlt_free_cmd+0x138/0x150 [qla2xxx] > Jul 27 17:06:43 teststorage [ 711.454135] RSP: 0018:ffffc9000d773d60 EFLAGS: 00010286 > Jul 27 17:06:43 teststorage [ 711.454138] RAX: 0000000000000088 RBX: ffff881f8ebb8bf8 RCX: ffffffffa019fade > Jul 27 17:06:43 teststorage [ 711.454142] RDX: 000000000000e074 RSI: ffff881f76fc87c0 RDI: 0000000000004000 > Jul 27 17:06:43 teststorage [ 711.454145] RBP: ffffc9000d773d80 R08: ffffffffa018c07c R09: ffff881f8ebb8bf8 > Jul 27 17:06:43 teststorage [ 711.454148] R10: ffffea007e076ac0 R11: ffff881f81dab640 R12: ffff881f80d6d800 > Jul 27 17:06:43 teststorage [ 711.454151] R13: ffff881f8ebb8c80 R14: 0000000000000286 R15: ffff881f8f5e2000 > Jul 27 17:06:43 teststorage [ 711.454155] FS: 0000000000000000(0000) GS:ffff881f9f2c0000(0000) knlGS:0000000000000000 > Jul 27 17:06:43 teststorage [ 711.454158] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > Jul 27 17:06:43 teststorage [ 711.454161] CR2: 00007fff00782a68 CR3: 0000001f81e33000 CR4: 00000000003406e0 > Jul 27 17:06:43 teststorage [ 711.454164] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > Jul 27 17:06:43 teststorage [ 711.454167] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > Jul 27 17:06:43 teststorage [ 711.454170] Call Trace: > Jul 27 17:06:43 teststorage [ 711.454176] tcm_qla2xxx_release_cmd+0x14/0x30 [tcm_qla2xxx] > Jul 27 17:06:43 teststorage [ 711.454183] target_put_sess_cmd+0xce/0x140 [target_core_mod] > Jul 27 17:06:43 teststorage [ 711.454190] core_tmr_abort_task+0x127/0x190 [target_core_mod] > Jul 27 17:06:43 teststorage [ 711.454197] target_tmr_work+0x111/0x120 [target_core_mod] > Jul 27 17:06:43 teststorage [ 711.454203] process_one_work+0x144/0x370 > Jul 27 17:06:43 teststorage [ 711.454206] worker_thread+0x4d/0x3c0 > Jul 27 17:06:43 teststorage [ 711.454210] kthread+0x109/0x140 > Jul 27 17:06:43 teststorage [ 711.454487] ? rescuer_thread+0x360/0x360 > Jul 27 17:06:43 teststorage [ 711.455029] ? kthread_park+0x60/0x60 > Jul 27 17:06:43 teststorage [ 711.455579] ret_from_fork+0x25/0x30 > Jul 27 17:06:43 teststorage [ 711.456122] Code: 7f a6 73 e1 0f b6 83 8c 02 00 00 e9 54 ff ff ff 48 8b bb c0 02 00 00 48 89 de e8 04 fd ff ff 0f b6 83 8c 02 00 00 e9 35 ff ff ff <0f> 0b 48 8b bb 90 02 00 00 e8 8a e2 0a e1 e9 44 ff ff ff 0f 0b > Jul 27 17:06:43 teststorage [ 711.457279] RIP: qlt_free_cmd+0x138/0x150 [qla2xxx] RSP: ffffc9000d773d60 > Jul 27 17:06:43 teststorage [ 711.457857] ---[ end trace 99e519b9a5b6591b ]--- > > 8<----------- > > I have also tried patch from https://www.spinics.net/lists/target-devel/msg15759.html (4db6a8145940d0bbd10265020d681961ce2d3238 - currently not available in git(?)) > > The results is - no BUGs, system survives a little longer, but it's a matter of minutes when kworker hangs: > > Jul 26 15:21:33 teststorage [ 3588.629920] ABORT_TASK: Found referenced qla2xxx task_tag: 1202220 > Jul 26 15:21:34 teststorage [ 3589.491615] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 1202220 > Jul 26 15:21:34 teststorage [ 3589.491700] ABORT_TASK: Found referenced qla2xxx task_tag: 1202264 > Jul 26 15:25:16 teststorage [ 3810.752583] INFO: task kworker/u24:2:2975 blocked for more than 120 seconds. > Jul 26 15:25:16 teststorage [ 3810.752672] Not tainted 4.12.3 #1 > Jul 26 15:25:16 teststorage [ 3810.752749] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > Jul 26 15:25:16 teststorage [ 3810.752831] kworker/u24:2 D 0 2975 2 0x00000080 > Jul 26 15:25:16 teststorage [ 3810.752920] Workqueue: tmr-iblock target_tmr_work [target_core_mod] > Jul 26 15:25:16 teststorage [ 3810.753006] Call Trace: > Jul 26 15:25:16 teststorage [ 3810.753207] __schedule+0x285/0x840 > Jul 26 15:25:16 teststorage [ 3810.753410] schedule+0x36/0x80 > Jul 26 15:25:16 teststorage [ 3810.753623] schedule_timeout+0x242/0x2f0 > Jul 26 15:25:16 teststorage [ 3810.753832] ? radix_tree_lookup+0xd/0x10 > Jul 26 15:25:16 teststorage [ 3810.754046] ? get_work_pool+0x2d/0x40 > Jul 26 15:25:16 teststorage [ 3810.754263] ? flush_work+0x14d/0x190 > Jul 26 15:25:16 teststorage [ 3810.754486] wait_for_completion+0x111/0x170 > Jul 26 15:25:16 teststorage [ 3810.754717] ? wake_up_q+0x80/0x80 > Jul 26 15:25:16 teststorage [ 3810.754949] __transport_wait_for_tasks+0xa7/0x140 [target_core_mod] > Jul 26 15:25:16 teststorage [ 3810.755189] transport_wait_for_tasks+0x53/0x90 [target_core_mod] > Jul 26 15:25:16 teststorage [ 3810.755454] core_tmr_abort_task+0x10e/0x190 [target_core_mod] > Jul 26 15:25:16 teststorage [ 3810.755717] target_tmr_work+0x111/0x120 [target_core_mod] > Jul 26 15:25:16 teststorage [ 3810.755972] process_one_work+0x144/0x370 > Jul 26 15:25:16 teststorage [ 3810.756236] worker_thread+0x4d/0x3c0 > Jul 26 15:25:16 teststorage [ 3810.756507] kthread+0x109/0x140 > Jul 26 15:25:16 teststorage [ 3810.756785] ? rescuer_thread+0x360/0x360 > Jul 26 15:25:16 teststorage [ 3810.757060] ? kthread_park+0x60/0x60 > Jul 26 15:25:16 teststorage [ 3810.757348] ? do_syscall_64+0x67/0x150 > Jul 26 15:25:16 teststorage [ 3810.757636] ret_from_fork+0x25/0x30 > Jul 26 15:25:16 teststorage [ 3810.757927] NMI backtrace for cpu 3 > Jul 26 15:25:16 teststorage [ 3810.758220] CPU: 3 PID: 107 Comm: khungtaskd Not tainted 4.12.3 #1 > Jul 26 15:25:16 teststorage [ 3810.758530] Hardware name: ASUS All Series/X99-E WS/USB 3.1, BIOS 3402 11/14/2016 > > Nic, let me know if you need anything from me/Cavium (Qlogic) for this error > (I have also tried 4.13-rc2, but currently I have some other problem with initiating qla2xxx there ("qla2xxx 0000:09:00.0: can't allocate MSI-X affinity masks for 2 vectors"), so I can't use FC target…) > Can you send me logs for this error and your setup information. We have fixed all the issues related to MSI-X affinity in 4.13.0-rc1. logs would really help us to identify if there are still any holes. > I am willing to provide more information if needed - It's currently my out-of-production storage and so if anyone can look at this problem I can on my side experiment and play with patches, retest, etc. > > Any help appreciated :) > > regards, > Lukasz Engel > > -- > To unsubscribe from this list: send the line "unsubscribe target-devel" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html Thanks, - Himanshu ��.n��������+%������w��{.n����j�����{ay�ʇڙ���f���h������_�(�階�ݢj"��������G����?���&��