Re: target crashes with vSphere 6 hosts

Roland,
     Thanks for the reply. I've always been a bit confused about the
whole idea of applying upstream commits. Target is part of the kernel,
right? Since I'm using Fedora, how would I do that? Would I download
the Fedora kernel source code, patch it, and compile my own kernel?

Here are all the logs I have leading up to the crash:

Jan 24 09:57:19 dracofiler kernel: TARGET_CORE[qla2xxx]: Detected
NON_EXISTENT_LUN Access for 0x000000f6
Jan 24 09:57:19 dracofiler kernel: TARGET_CORE[qla2xxx]: Detected
NON_EXISTENT_LUN Access for 0x000000f7
Jan 24 09:57:19 dracofiler kernel: TARGET_CORE[qla2xxx]: Detected
NON_EXISTENT_LUN Access for 0x000000f8
Jan 24 09:57:19 dracofiler kernel: TARGET_CORE[qla2xxx]: Detected
NON_EXISTENT_LUN Access for 0x000000f9
Jan 24 09:57:19 dracofiler kernel: TARGET_CORE[qla2xxx]: Detected
NON_EXISTENT_LUN Access for 0x000000fa
Jan 24 09:57:19 dracofiler kernel: TARGET_CORE[qla2xxx]: Detected
NON_EXISTENT_LUN Access for 0x000000fb
Jan 24 09:57:19 dracofiler kernel: TARGET_CORE[qla2xxx]: Detected
NON_EXISTENT_LUN Access for 0x000000fc
Jan 24 09:57:19 dracofiler kernel: TARGET_CORE[qla2xxx]: Detected
NON_EXISTENT_LUN Access for 0x000000fd
Jan 24 09:57:19 dracofiler kernel: TARGET_CORE[qla2xxx]: Detected
NON_EXISTENT_LUN Access for 0x000000fe
Jan 24 09:57:19 dracofiler kernel: TARGET_CORE[qla2xxx]: Detected
NON_EXISTENT_LUN Access for 0x000000ff
Jan 24 10:00:29 dracofiler kernel: MODE SENSE: unimplemented
page/subpage: 0x1c/0x02
Jan 24 10:01:21 dracofiler kernel: ABORT_TASK: Found referenced
qla2xxx task_tag: 1144976
Jan 24 10:01:21 dracofiler kernel: ABORT_TASK: ref_tag: 1144976
already complete, skipping
Jan 24 10:01:21 dracofiler kernel: ABORT_TASK: Sending
TMR_TASK_DOES_NOT_EXIST for ref_tag: 1144976
Jan 24 10:01:21 dracofiler kernel: ABORT_TASK: Found referenced
qla2xxx task_tag: 1145020
Jan 24 10:01:21 dracofiler kernel: ABORT_TASK: ref_tag: 1145020
already complete, skipping
Jan 24 10:01:21 dracofiler kernel: ABORT_TASK: Sending
TMR_TASK_DOES_NOT_EXIST for ref_tag: 1145020
Jan 24 10:01:41 dracofiler kernel: ABORT_TASK: Found referenced
qla2xxx task_tag: 1146120
Jan 24 10:01:41 dracofiler kernel: ABORT_TASK: ref_tag: 1146120
already complete, skipping
Jan 24 10:01:41 dracofiler kernel: ABORT_TASK: Sending
TMR_TASK_DOES_NOT_EXIST for ref_tag: 1146120
Jan 24 10:01:41 dracofiler kernel: ABORT_TASK: Found referenced
qla2xxx task_tag: 1146076
Jan 24 10:01:41 dracofiler kernel: ABORT_TASK: ref_tag: 1146076
already complete, skipping
Jan 24 10:01:41 dracofiler kernel: ABORT_TASK: Sending
TMR_TASK_DOES_NOT_EXIST for ref_tag: 1146076
Jan 24 10:01:48 dracofiler kernel: Detected MISCOMPARE for addr:
ffff880616856000 buf: ffff880626af5000
Jan 24 10:01:48 dracofiler kernel: Target/iblock: Send MISCOMPARE
check condition and sense
Jan 24 10:01:48 dracofiler kernel: Detected MISCOMPARE for addr:
ffff880629bfc000 buf: ffff880626af5e00
Jan 24 10:01:48 dracofiler kernel: Target/iblock: Send MISCOMPARE
check condition and sense
Jan 24 10:01:56 dracofiler kernel: ABORT_TASK: Found referenced
qla2xxx task_tag: 1196456
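
For anyone who wants to tally the aborts in a log like the one above, here is a minimal sketch. It assumes each entry begins with a syslog timestamp and that the mail client wrapped long entries onto a second line, as shown here. The function names (rejoin, aborts_per_second) are made up for illustration and are not part of any target or qla2xxx tooling.

```python
import re
from collections import Counter

# Assumption: every log entry starts with a syslog timestamp such as
# "Jan 24 10:01:21"; any other line is a mail-wrapped continuation.
ENTRY_START = re.compile(r"^([A-Z][a-z]{2} +\d+ \d{2}:\d{2}:\d{2}) ")
ABORT_FOUND = re.compile(r"ABORT_TASK: Found referenced (\S+) task_tag: (\d+)")


def rejoin(lines):
    """Merge mail-wrapped continuation lines back into whole log entries."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if ENTRY_START.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += " " + line
    return entries


def aborts_per_second(lines):
    """Count 'ABORT_TASK: Found referenced' entries per timestamp."""
    counts = Counter()
    for entry in rejoin(lines):
        stamp = ENTRY_START.match(entry)
        if stamp and ABORT_FOUND.search(entry):
            counts[stamp.group(1)] += 1
    return counts


# A few entries copied from the log above, still wrapped.
sample = """\
Jan 24 10:01:21 dracofiler kernel: ABORT_TASK: Found referenced
qla2xxx task_tag: 1144976
Jan 24 10:01:21 dracofiler kernel: ABORT_TASK: ref_tag: 1144976
already complete, skipping
Jan 24 10:01:21 dracofiler kernel: ABORT_TASK: Found referenced
qla2xxx task_tag: 1145020
Jan 24 10:01:41 dracofiler kernel: ABORT_TASK: Found referenced
qla2xxx task_tag: 1146120
""".splitlines()

for stamp, count in sorted(aborts_per_second(sample).items()):
    print(stamp, count)
```

Feeding it the full journal instead of the sample would show how the abort rate builds up before the crash.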

Thanks,
Dan

On Sun, Jan 24, 2016 at 8:11 PM, Roland Dreier <roland@xxxxxxxxxxxxxxx> wrote:
>> I have tried a large number of other hosts and they all act the same
>> way regardless of hardware.  ESXi <6 is no problem, but 6 and newer
>> crash the filer very quickly.
>
> You're crashing because of
>
> Jan 24 10:02:09 dracofiler kernel: kernel BUG at
> drivers/scsi/qla2xxx/qla_target.c:3105!
>
> which is the BUG_ON in
>
> void qlt_free_cmd(struct qla_tgt_cmd *cmd)
> {
>         struct qla_tgt_sess *sess = cmd->sess;
>
>         ql_dbg(ql_dbg_tgt, cmd->vha, 0xe074,
>             "%s: se_cmd[%p] ox_id %04x\n",
>             __func__, &cmd->se_cmd,
>             be16_to_cpu(cmd->atio.u.isp24.fcp_hdr.ox_id));
>
>         BUG_ON(cmd->cmd_in_wq);
>
> It seems we're freeing a command before we process it.
>
> What logging do you have from target or qla2xxx before you hit the
> crash?  I'm wondering why the initiator is aborting commands (although
> we still shouldn't crash even if it does abort commands).
>
> You could try applying upstream commit 193b50b9d54a ("qla2xxx: Replace
> QLA_TGT_STATE_ABORTED with a bit.") which seems like it might be
> related, though I'm not sure whether it really will help.
>
>  - R.