Well, looks like it wasn't as stable as we thought... Here is a clip from the logs, this is the only thing other than the ABORT_TASK I could find in the system logs. Unfortunately I have no idea when it stopped responding to my hosts. My friend who was also testing this had virtually the same results (he also gets the frequent ABORT_TASK messages).

Feb 10 20:33:48 dracofiler kernel: ABORT_TASK: Found referenced qla2xxx task_tag: 1167636
Feb 10 20:33:48 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1167636
Feb 10 20:34:07 dracofiler kernel: ABORT_TASK: Found referenced qla2xxx task_tag: 1183520
Feb 10 20:34:07 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1183520
Feb 10 20:44:31 dracofiler kernel: Unknown VPD Code: 0xc9
Feb 10 20:44:33 dracofiler kernel: Unknown VPD Code: 0xc9
Feb 10 20:44:47 dracofiler kernel: Unknown VPD Code: 0xc9
Feb 10 20:46:35 dracofiler kernel: MODE SENSE: unimplemented page/subpage: 0x1c/0x02
Feb 10 20:49:18 dracofiler kernel: ABORT_TASK: Found referenced qla2xxx task_tag: 1140928
Feb 10 20:49:18 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1140928
Feb 10 20:49:19 dracofiler kernel: ABORT_TASK: Found referenced qla2xxx task_tag: 1209480
Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 1209480
Feb 10 20:49:29 dracofiler kernel: Detected MISCOMPARE for addr: ffff88062b253000 buf: ffff88062b6e7c00
Feb 10 20:49:29 dracofiler kernel: Target/iblock: Send MISCOMPARE check condition and sense
Feb 10 20:49:29 dracofiler kernel: Detected MISCOMPARE for addr: ffff880624bac000 buf: ffff88062b6e7c00
Feb 10 20:49:29 dracofiler kernel: Target/iblock: Send MISCOMPARE check condition and sense
Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1216828
Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1187260
Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1187348
Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1187392
Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1187436
Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1187480
Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1187524
Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1187304
Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1187568
Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1187656
Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1187744
Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1187788
Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1187832
Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1187920
Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1188008
Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1188052
Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1188096
Feb 10 20:51:18 dracofiler kernel: ABORT_TASK: Found referenced qla2xxx task_tag: 1202880
Feb 10 20:51:18 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1202880
Feb 10 20:51:18 dracofiler kernel: ABORT_TASK: Found referenced qla2xxx task_tag: 1202968
Feb 10 20:51:18 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1202968
Feb 10 20:51:37 dracofiler kernel: ABORT_TASK: Found referenced qla2xxx task_tag: 1204244
Feb 10 20:51:37 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1204244

"systemctl status target.service" results:

● target.service - Restore LIO kernel target configuration
   Loaded: loaded (/usr/lib/systemd/system/target.service; enabled; vendor preset: disabled)
   Active: active (exited) since Tue 2016-02-09 22:13:33 EST; 1 day 21h ago
  Process: 1628 ExecStart=/usr/bin/targetctl restore (code=exited, status=0/SUCCESS)
 Main PID: 1628 (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/target.service

Feb 09 22:12:45 dracofiler.home.lan systemd[1]: Starting Restore LIO kernel target configuration...
Feb 09 22:13:33 dracofiler.home.lan systemd[1]: Started Restore LIO kernel target configuration.

On Thu, Feb 11, 2016 at 7:02 PM, Nicholas A. Bellinger <nab@xxxxxxxxxxxxxxx> wrote:
> On Thu, 2016-02-11 at 18:49 -0500, Dan Lane wrote:
>> On Thu, Feb 11, 2016 at 2:48 AM, Nicholas A. Bellinger
>> <nab@xxxxxxxxxxxxxxx> wrote:
>> > Hello Dan,
>> >
>> > On Wed, 2016-02-10 at 21:30 -0500, Dan Lane wrote:
>> >> >
>> >> > SUCCESS!
>> >> >
>> >> > The latest changes have the filer working stable, I just benchmarked a Win7
>> >> > VM on an ESXi host and hit 400+MB/s without any crashing!
>> >> >
>> >
>> > Thanks for the update.
>> >
>> >> > I still see a fair number of the following errors in messages, I'm not sure
>> >> > if it's something to worry about or not, especially considering these
>> >> > numbers.
>> >> >
>> >> > Feb 10 21:18:28 dracofiler kernel: ABORT_TASK: Found referenced qla2xxx
>> >> > task_tag: 1176876
>> >> > Feb 10 21:18:28 dracofiler kernel: ABORT_TASK: Sending
>> >> > TMR_TASK_DOES_NOT_EXIST for ref_tag: 1176876
>> >> > Feb 10 21:18:48 dracofiler kernel: ABORT_TASK: Found referenced qla2xxx
>> >> > task_tag: 1199888
>> >> > Feb 10 21:18:48 dracofiler kernel: ABORT_TASK: Sending
>> >> > TMR_TASK_DOES_NOT_EXIST for ref_tag: 1199888
>> >> > Feb 10 21:19:07 dracofiler kernel: ABORT_TASK: Found referenced qla2xxx
>> >> > task_tag: 1212868
>> >> > Feb 10 21:19:07 dracofiler kernel: ABORT_TASK: Sending
>> >> > TMR_TASK_DOES_NOT_EXIST for ref_tag: 1212868
>> >> > Feb 10 21:20:19 dracofiler kernel: ABORT_TASK: Found referenced qla2xxx
>> >> > task_tag: 1188008
>> >> > Feb 10 21:20:19 dracofiler kernel: ABORT_TASK: Sending
>> >> > TMR_TASK_DOES_NOT_EXIST for ref_tag: 1188008
>> >> > Feb 10 21:20:19 dracofiler kernel: ABORT_TASK: Found referenced qla2xxx
>> >> > task_tag: 1188052
>> >> > Feb 10 21:20:19 dracofiler kernel: ABORT_TASK: Sending
>> >> > TMR_TASK_DOES_NOT_EXIST for ref_tag: 1188052
>> >> >
>> >> > Thanks again for all the help you provided. I helped a friend with a
>> >> > similar setup get his fixed as well and he had the same results.
>> >
>> > Keep in mind your target-pending/4.4-stable branch is still missing the
>> > active I/O remote port LUN_RESET + session disconnect bug-fix currently
>> > being tested in target-pending/master here:
>> >
>> > https://git.kernel.org/cgit/linux/kernel/git/nab/target-pending.git/commit/?id=0f4a943168f31d29a1701908931acaba518b131a
>> >
>> >> Let me know if you need any further testing of the qlogic stuff.
>> >> >
>> >
>> > Let's have look your fc_host class side stats to verify FC physical
>> > layer for tcm_qla2xxx ports are working as expected.
>> >
>> > head /sys/class/fc_host/host*/statistics/*
>> >
>>
>> Wow, lots of info... Here you go!
>> BTW, I have two QLE2462 cards in my storage box, one has two
>> connections to a single host and the other connects to two FC switches
>> in a blade chassis.
>
> So AFAICT, nothing looks out of the ordinary wrt to the stats counters.
>
> To further debug, I'd recommend looking at the stats counters on your FC
> switch and on the ESX FC host generating constant ABORT_TASKS to
> determine who is responsible for dropping packets.
>
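
For lining up the ABORT_TASK bursts against the switch and ESX counters suggested above, it may help to tally the target-side messages by minute first. The following is only a minimal sketch, not something posted in this thread: the function name summarize_abort_tasks is hypothetical, and the regular expression assumes unwrapped syslog lines in exactly the "ABORT_TASK: Sending TMR_... for ref_tag: N" shape shown in the clip at the top of this message.

```python
import re
from collections import Counter

# Assumed line shape (taken from the log clip in this message):
#   Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1216828
# Group 1 is the timestamp truncated to the minute, group 2 the TMR response,
# group 3 the ref_tag.
TMR_RE = re.compile(
    r"(\w{3}\s+\d+ \d{2}:\d{2}):\d{2} \S+ kernel: "
    r"ABORT_TASK: Sending (TMR_\w+) for ref_tag: (\d+)"
)


def summarize_abort_tasks(log_text):
    """Count ABORT_TASK responses per minute and per TMR response code.

    Hypothetical helper: returns (per_minute, per_response) as Counters.
    Lines that do not match the assumed shape are ignored.
    """
    per_minute = Counter()
    per_response = Counter()
    for minute, response, _ref_tag in TMR_RE.findall(log_text):
        per_minute[minute] += 1
        per_response[response] += 1
    return per_minute, per_response


# Example use (the log path is an assumption, adjust for your system):
#   with open("/var/log/messages") as fh:
#       per_minute, per_response = summarize_abort_tasks(fh.read())
#   for minute, count in per_minute.most_common(10):
#       print(minute, count)
```

Minutes with a large count, such as the 20:49 burst above, are the ones worth comparing against the error counters on the FC switch and the ESX host for the same time window.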