Re: target crashes with vSphere 6 hosts

Hi Dan,

(Adding Quinn + Giri to CC)

On Mon, 2016-01-25 at 16:08 -0500, Dan Lane wrote:
> Update: If it matters, I tried loading a host with ESXi 5.5 u3b and
> that also crashed the filer.

Thanks for your bug report.

Note this bug is not specific to ESXi 6.0. The scenario occurs when
backend driver exports are unable to keep up with the host workload,
which triggers internal ESX ABORT_TASK + LUN_RESET across multiple
local+remote ports.

The following WIP series is for addressing this bug:

http://www.spinics.net/lists/target-devel/msg11691.html

I've been verifying this on iscsi-target exports over the last few weeks,
and the specific bug you're hitting is AFAICT not qla2xxx driver specific.

>   Still waiting on an answer about the
> applying of upstream commits to an OS like fedora or any other ideas
> about the cause of this.
> 

So you'll want to build a v4.4 kernel until these patches are merged for
v4.5-rc, and eventually back-ported into v4.2.y stable.

Note the recent qla2xxx target fixes from v4.5-rc1 are something you'll
want too.

A v4.4 based '$ORIGIN $BRANCH' with everything is here:

  git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending.git 4.4-stable

If you're not familiar with git cloning + building kernel source, have a look at:

http://kernelnewbies.org/KernelBuild
https://fedoraproject.org/wiki/BuildingUpstreamKernel

So do a fresh 'git clone' of linux-stable.git and then:

  git checkout --track -b linux-4.4.y origin/linux-4.4.y

to switch to a new local branch, and then:

  git pull '$ORIGIN $BRANCH'

to do a remote merge using the full target-pending.git 4.4-stable path
from above.
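
Put together, the clone + checkout + pull steps above might look roughly
like the following sketch. The linux-stable.git URL and the explicit
origin/linux-4.4.y start point are assumptions (the mail only names the
repository and the local branch), and run() and fetch_and_merge() are
hypothetical helpers. By default the script only prints the commands;
set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Sketch only: prints the git commands by default; set DRY_RUN=0 to execute them.
set -eu

# Assumption: the usual kernel.org location of linux-stable.git (not given in the mail).
STABLE_URL="git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git"
# From the mail: the target-pending.git tree and its 4.4-stable branch.
PENDING_URL="git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending.git"
PENDING_BRANCH="4.4-stable"
LOCAL_BRANCH="linux-4.4.y"
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

fetch_and_merge() {
    # Fresh clone of the stable tree into ./linux-stable.
    run git clone "$STABLE_URL" linux-stable
    # New local branch tracking the v4.4.y stable branch
    # (assumption: it is published as origin/linux-4.4.y).
    run git -C linux-stable checkout --track -b "$LOCAL_BRANCH" "origin/$LOCAL_BRANCH"
    # Remote merge of the target-pending.git 4.4-stable branch.
    run git -C linux-stable pull "$PENDING_URL" "$PENDING_BRANCH"
}

fetch_and_merge
```

The dry-run default is just a guard against kicking off a full kernel
clone by accident while reading along.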

You can do a 'make defconfig' or use the local 4.2.y
/boot/config-$VERSION, and build vmlinux + modules + initrd
from there.
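
As a rough sketch of that build step: the script below reuses an
existing /boot/config file if one is present and falls back to
'make defconfig' otherwise. Taking $VERSION from 'uname -r', the
linux-stable source path, and the run() and build_kernel() helpers are
all assumptions for illustration; whether 'make install' also generates
the initrd depends on the distribution's installkernel script. It
prints the commands by default; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Sketch only: prints the build commands by default; set DRY_RUN=0 to execute them.
set -eu

DRY_RUN="${DRY_RUN:-1}"
SRC="${SRC:-linux-stable}"   # hypothetical path to the merged kernel tree

# Hypothetical helper: echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

build_kernel() {
    cfg="$1"
    jobs="$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)"

    if [ -f "$cfg" ]; then
        # Start from the distro config and accept defaults for new options.
        run cp "$cfg" "$SRC/.config"
        run make -C "$SRC" olddefconfig
    else
        run make -C "$SRC" defconfig
    fi

    # Build the kernel image and modules.
    run make -C "$SRC" -j"$jobs"
    # The next two steps normally need root.
    run make -C "$SRC" modules_install
    run make -C "$SRC" install
}

# Assumption: $VERSION in the mail corresponds to the running kernel's version.
build_kernel "/boot/config-$(uname -r)"
```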

Please let the list know your progress.

> On Sun, Jan 24, 2016 at 8:11 PM, Roland Dreier <roland@xxxxxxxxxxxxxxx> wrote:
> >> I have tried a large number of other hosts and they all act the same
> >> way regardless of hardware.  ESXi <6 is no problem, but 6 and newer
> >> crash the filer very quickly.
> >
> > You're crashing because of
> >
> > Jan 24 10:02:09 dracofiler kernel: kernel BUG at
> > drivers/scsi/qla2xxx/qla_target.c:3105!
> >
> > which is the BUG_ON in
> >
> > void qlt_free_cmd(struct qla_tgt_cmd *cmd)
> > {
> >         struct qla_tgt_sess *sess = cmd->sess;
> >
> >         ql_dbg(ql_dbg_tgt, cmd->vha, 0xe074,
> >             "%s: se_cmd[%p] ox_id %04x\n",
> >             __func__, &cmd->se_cmd,
> >             be16_to_cpu(cmd->atio.u.isp24.fcp_hdr.ox_id));
> >
> >         BUG_ON(cmd->cmd_in_wq);
> >
> > It seems we're freeing a command before we process it.
> >
> > what logging do you have from target or qla2xxx before you hit the
> > crash?  I'm wondering why the initiator is aborting commands (although
> > we still shouldn't crash even if it does abort commands).
> >
> > You could try applying upstream commit 193b50b9d54a ("qla2xxx: Replace
> > QLA_TGT_STATE_ABORTED with a bit.") which seems like it might be
> > related, though I'm not sure whether it really will help.
> >
> >  - R.
> --
> To unsubscribe from this list: send the line "unsubscribe target-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

