Re: target crashes with vSphere 6 hosts

Well, it looks like it wasn't as stable as we thought. Here is a clip
from the logs; other than the ABORT_TASK messages, these are the only
entries I could find in the system logs.  Unfortunately I have no idea
when the target stopped responding to my hosts.  My friend, who was
also testing this, had virtually the same results (he also gets the
frequent ABORT_TASK messages).

Feb 10 20:33:48 dracofiler kernel: ABORT_TASK: Found referenced
qla2xxx task_tag: 1167636
Feb 10 20:33:48 dracofiler kernel: ABORT_TASK: Sending
TMR_TASK_DOES_NOT_EXIST for ref_tag: 1167636
Feb 10 20:34:07 dracofiler kernel: ABORT_TASK: Found referenced
qla2xxx task_tag: 1183520
Feb 10 20:34:07 dracofiler kernel: ABORT_TASK: Sending
TMR_TASK_DOES_NOT_EXIST for ref_tag: 1183520
Feb 10 20:44:31 dracofiler kernel: Unknown VPD Code: 0xc9
Feb 10 20:44:33 dracofiler kernel: Unknown VPD Code: 0xc9
Feb 10 20:44:47 dracofiler kernel: Unknown VPD Code: 0xc9
Feb 10 20:46:35 dracofiler kernel: MODE SENSE: unimplemented
page/subpage: 0x1c/0x02
Feb 10 20:49:18 dracofiler kernel: ABORT_TASK: Found referenced
qla2xxx task_tag: 1140928
Feb 10 20:49:18 dracofiler kernel: ABORT_TASK: Sending
TMR_TASK_DOES_NOT_EXIST for ref_tag: 1140928
Feb 10 20:49:19 dracofiler kernel: ABORT_TASK: Found referenced
qla2xxx task_tag: 1209480
Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending
TMR_FUNCTION_COMPLETE for ref_tag: 1209480
Feb 10 20:49:29 dracofiler kernel: Detected MISCOMPARE for addr:
ffff88062b253000 buf: ffff88062b6e7c00
Feb 10 20:49:29 dracofiler kernel: Target/iblock: Send MISCOMPARE
check condition and sense
Feb 10 20:49:29 dracofiler kernel: Detected MISCOMPARE for addr:
ffff880624bac000 buf: ffff88062b6e7c00
Feb 10 20:49:29 dracofiler kernel: Target/iblock: Send MISCOMPARE
check condition and sense
Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending
TMR_TASK_DOES_NOT_EXIST for ref_tag: 1216828
Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending
TMR_TASK_DOES_NOT_EXIST for ref_tag: 1187260
Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending
TMR_TASK_DOES_NOT_EXIST for ref_tag: 1187348
Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending
TMR_TASK_DOES_NOT_EXIST for ref_tag: 1187392
Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending
TMR_TASK_DOES_NOT_EXIST for ref_tag: 1187436
Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending
TMR_TASK_DOES_NOT_EXIST for ref_tag: 1187480
Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending
TMR_TASK_DOES_NOT_EXIST for ref_tag: 1187524
Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending
TMR_TASK_DOES_NOT_EXIST for ref_tag: 1187304
Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending
TMR_TASK_DOES_NOT_EXIST for ref_tag: 1187568
Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending
TMR_TASK_DOES_NOT_EXIST for ref_tag: 1187656
Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending
TMR_TASK_DOES_NOT_EXIST for ref_tag: 1187744
Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending
TMR_TASK_DOES_NOT_EXIST for ref_tag: 1187788
Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending
TMR_TASK_DOES_NOT_EXIST for ref_tag: 1187832
Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending
TMR_TASK_DOES_NOT_EXIST for ref_tag: 1187920
Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending
TMR_TASK_DOES_NOT_EXIST for ref_tag: 1188008
Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending
TMR_TASK_DOES_NOT_EXIST for ref_tag: 1188052
Feb 10 20:49:29 dracofiler kernel: ABORT_TASK: Sending
TMR_TASK_DOES_NOT_EXIST for ref_tag: 1188096
Feb 10 20:51:18 dracofiler kernel: ABORT_TASK: Found referenced
qla2xxx task_tag: 1202880
Feb 10 20:51:18 dracofiler kernel: ABORT_TASK: Sending
TMR_TASK_DOES_NOT_EXIST for ref_tag: 1202880
Feb 10 20:51:18 dracofiler kernel: ABORT_TASK: Found referenced
qla2xxx task_tag: 1202968
Feb 10 20:51:18 dracofiler kernel: ABORT_TASK: Sending
TMR_TASK_DOES_NOT_EXIST for ref_tag: 1202968
Feb 10 20:51:37 dracofiler kernel: ABORT_TASK: Found referenced
qla2xxx task_tag: 1204244
Feb 10 20:51:37 dracofiler kernel: ABORT_TASK: Sending
TMR_TASK_DOES_NOT_EXIST for ref_tag: 1204244
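
Since I can't tell when the target stopped responding, a per-minute
tally of the ABORT_TASK responses might help narrow down the window.
This is only a rough sketch: it assumes the log lines are unwrapped
(as they would be in the real log file rather than in this mail), and
the function name summarize_aborts is made up for illustration.

```python
import re
from collections import Counter

# Matches the "ABORT_TASK: Sending TMR_..." lines shown above, assuming
# each message sits on a single line (no mail-client wrapping).
TMR_RE = re.compile(
    r"^(?P<minute>\w{3}\s+\d+\s+\d{2}:\d{2}):\d{2}\s+\S+\s+kernel:\s+"
    r"ABORT_TASK: Sending (?P<resp>TMR_\w+) for ref_tag: (?P<tag>\d+)"
)


def summarize_aborts(lines):
    """Count ABORT_TASK responses per (minute, response type) pair."""
    counts = Counter()
    for line in lines:
        m = TMR_RE.match(line)
        if m:
            counts[(m.group("minute"), m.group("resp"))] += 1
    return counts
```

Feeding it an open log file (for example the system log the clip above
came from) and printing the sorted items should make bursts like the
one at 20:49 stand out against the isolated aborts.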

"systemctl status target.service" results:
● target.service - Restore LIO kernel target configuration
   Loaded: loaded (/usr/lib/systemd/system/target.service; enabled;
vendor preset: disabled)
   Active: active (exited) since Tue 2016-02-09 22:13:33 EST; 1 day 21h ago
  Process: 1628 ExecStart=/usr/bin/targetctl restore (code=exited,
status=0/SUCCESS)
 Main PID: 1628 (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/target.service

Feb 09 22:12:45 dracofiler.home.lan systemd[1]: Starting Restore LIO
kernel target configuration...
Feb 09 22:13:33 dracofiler.home.lan systemd[1]: Started Restore LIO
kernel target configuration.

On Thu, Feb 11, 2016 at 7:02 PM, Nicholas A. Bellinger
<nab@xxxxxxxxxxxxxxx> wrote:
> On Thu, 2016-02-11 at 18:49 -0500, Dan Lane wrote:
>> On Thu, Feb 11, 2016 at 2:48 AM, Nicholas A. Bellinger
>> <nab@xxxxxxxxxxxxxxx> wrote:
>> > Hello Dan,
>> >
>> > On Wed, 2016-02-10 at 21:30 -0500, Dan Lane wrote:
>> >> >
>> >> > SUCCESS!
>> >> >
>> >> > The latest changes have the filer working stable, I just benchmarked a Win7
>> >> > VM on an ESXi host and hit 400+MB/s without any crashing!
>> >> >
>> >
>> > Thanks for the update.
>> >
>> >> > I still see a fair number of the following errors in messages, I'm not sure
>> >> > if it's something to worry about or not, especially considering these
>> >> > numbers.
>> >> >
>> >> > Feb 10 21:18:28 dracofiler kernel: ABORT_TASK: Found referenced qla2xxx
>> >> > task_tag: 1176876
>> >> > Feb 10 21:18:28 dracofiler kernel: ABORT_TASK: Sending
>> >> > TMR_TASK_DOES_NOT_EXIST for ref_tag: 1176876
>> >> > Feb 10 21:18:48 dracofiler kernel: ABORT_TASK: Found referenced qla2xxx
>> >> > task_tag: 1199888
>> >> > Feb 10 21:18:48 dracofiler kernel: ABORT_TASK: Sending
>> >> > TMR_TASK_DOES_NOT_EXIST for ref_tag: 1199888
>> >> > Feb 10 21:19:07 dracofiler kernel: ABORT_TASK: Found referenced qla2xxx
>> >> > task_tag: 1212868
>> >> > Feb 10 21:19:07 dracofiler kernel: ABORT_TASK: Sending
>> >> > TMR_TASK_DOES_NOT_EXIST for ref_tag: 1212868
>> >> > Feb 10 21:20:19 dracofiler kernel: ABORT_TASK: Found referenced qla2xxx
>> >> > task_tag: 1188008
>> >> > Feb 10 21:20:19 dracofiler kernel: ABORT_TASK: Sending
>> >> > TMR_TASK_DOES_NOT_EXIST for ref_tag: 1188008
>> >> > Feb 10 21:20:19 dracofiler kernel: ABORT_TASK: Found referenced qla2xxx
>> >> > task_tag: 1188052
>> >> > Feb 10 21:20:19 dracofiler kernel: ABORT_TASK: Sending
>> >> > TMR_TASK_DOES_NOT_EXIST for ref_tag: 1188052
>> >> >
>> >> > Thanks again for all the help you provided.  I helped a friend with a
>> >> > similar setup get his fixed as well and he had the same results.
>> >
>> > Keep in mind your target-pending/4.4-stable branch is still missing the
>> > active I/O remote port LUN_RESET + session disconnect bug-fix currently
>> > being tested in target-pending/master here:
>> >
>> > https://git.kernel.org/cgit/linux/kernel/git/nab/target-pending.git/commit/?id=0f4a943168f31d29a1701908931acaba518b131a
>> >
>> >>  Let me know if you need any further testing of the qlogic stuff.
>> >> >
>> >
>> > Let's have a look at your fc_host class side stats to verify that the
>> > FC physical layer for the tcm_qla2xxx ports is working as expected.
>> >
>> >     head /sys/class/fc_host/host*/statistics/*
>> >
>>
>> Wow, lots of info... Here you go!
>> BTW, I have two QLE2462 cards in my storage box, one has two
>> connections to a single host and the other connects to two FC switches
>> in a blade chassis.
>
> So AFAICT, nothing looks out of the ordinary wrt the stats counters.
>
> To further debug, I'd recommend looking at the stats counters on your FC
> switch and on the ESX FC host generating constant ABORT_TASKS to
> determine who is responsible for dropping packets.
>
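
To see whether any of the target-side fc_host counters move during an
abort burst, two snapshots of the same sysfs files can be compared.
This is a minimal sketch under a few assumptions: that each statistics
file holds a single integer (hex or decimal), that unreadable files can
simply be skipped, and the helper names read_fc_host_stats and
changed_counters are invented for this example.

```python
import glob


def read_fc_host_stats(pattern="/sys/class/fc_host/host*/statistics/*"):
    """Return {path: integer value} for every readable statistics file."""
    stats = {}
    for path in glob.glob(pattern):
        try:
            with open(path) as fh:
                # Base 0 accepts both "0x1f" style and plain decimal text.
                stats[path] = int(fh.read().strip(), 0)
        except (OSError, ValueError):
            # Skip files that cannot be read or do not hold one integer.
            continue
    return stats


def changed_counters(before, after):
    """Return {path: delta} for counters whose value differs."""
    return {
        path: after[path] - before[path]
        for path in after
        if path in before and after[path] != before[path]
    }
```

Taking one snapshot before a benchmark run and one after the next round
of ABORT_TASK messages, then printing changed_counters(before, after),
would show which counters (if any) changed on the target side.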