Re: target crashes with vSphere 6 hosts

Thanks for all the information Nicholas, there's only one part that I
think is really in question - the idea that the backend can't keep up
with the workload.  You may remember I had similar problems in the
past that I contacted the mailing list for.  After talking about the
problem quite a bit I decided I needed a better backend, so I built
out this new system that should have absolutely no problem keeping up.
I have 20x 10k enterprise Raptor drives in RAID6 with an SSD acting as
read and write cache via LSI CacheCade.

BTW, I get the "ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for
ref_tag" error with vSphere 5, but LIO doesn't crash in that case.
Also, I seem to get the error regardless of the system load.

I really appreciate the time you put into the kernel update information!

Thanks,
Dan

On Tue, Jan 26, 2016 at 1:55 AM, Nicholas A. Bellinger
<nab@xxxxxxxxxxxxxxx> wrote:
> Hi Dan,
>
> (Adding Quinn + Giri to CC)
>
> On Mon, 2016-01-25 at 16:08 -0500, Dan Lane wrote:
>> Update: If it matters, I tried loading a host with ESXi 5.5 u3b and
>> that also crashed the filer.
>
> Thanks for your bug report.
>
> Note this bug is not specific to ESXi 6.0, and the scenario occurs when
> backend driver exports are unable to keep up with host workload,
> resulting in internal ESX ABORT_TASK + LUN_RESET to trigger across
> multiple local+remote ports.
>
> The following WIP series is for addressing this bug:
>
> http://www.spinics.net/lists/target-devel/msg11691.html
>
> I've been verifying this on iscsi-target exports over the last few weeks,
> and the specific bug you're hitting is AFAICT not qla2xxx driver specific.
>
>>   Still waiting on an answer about the
>> applying of upstream commits to an OS like fedora or any other ideas
>> about the cause of this.
>>
>
> So you'll want to build a v4.4 kernel until these patches are merged for
> v4.5-rc, and eventually back-ported into v4.2.y stable.
>
> Note the recent qla2xxx target fixes from v4.5-rc1 are something you'll
> want too.
>
> A v4.4 based '$ORIGIN $BRANCH' with everything is here:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending.git 4.4-stable
>
> If you're not familiar with git cloning + building kernel source, have a look at:
>
> http://kernelnewbies.org/KernelBuild
> https://fedoraproject.org/wiki/BuildingUpstreamKernel
>
> So do a fresh 'git clone' of linux-stable.git and then:
>
>   git checkout --track -b linux-4.4.y origin/linux-4.4.y
>
> to switch to a new local branch, and then:
>
>   git pull '$ORIGIN $BRANCH'
>
> to do a remote merge using the full target-pending.git 4.4-stable path
> from above.
>
> You can do a 'make defconfig' or use the local 4.2.y
> /boot/config-$VERSION, and build vmlinux + modules + initrd
> from there.
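
For reference, the steps quoted above can be sketched as a single shell
script. This is only a sketch under assumptions: the linux-stable clone URL,
the origin/linux-4.4.y start point for the local branch, the olddefconfig
step, and the final install command are not spelled out in the quoted
message, and the run/build_steps helper names are made up for illustration.
By default the script only prints the commands (dry run).

```shell
#!/bin/sh
# Dry-run sketch of the clone + merge + build steps described above.
# DRY_RUN=1 (default) prints each command; DRY_RUN=0 executes it.
DRY_RUN="${DRY_RUN:-1}"

# Assumption: clone URL for linux-stable.git (not given in the message).
STABLE_URL="git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git"
# From the message: the target-pending tree and its 4.4-stable branch.
PENDING_URL="git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending.git"
PENDING_BRANCH="4.4-stable"

# Hypothetical helper: print the command, run it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Hypothetical helper: the whole sequence, stopping at the first failure.
build_steps() {
    run git clone "$STABLE_URL" linux-stable &&
    run cd linux-stable &&
    # Assumption: the local branch starts from origin/linux-4.4.y.
    run git checkout --track -b linux-4.4.y origin/linux-4.4.y &&
    # Remote merge of the target-pending 4.4-stable branch.
    run git pull "$PENDING_URL" "$PENDING_BRANCH" &&
    # Reuse the running kernel's config, accepting defaults for new options.
    run cp "/boot/config-$(uname -r)" .config &&
    run make olddefconfig &&
    run make -j"$(nproc)" &&
    # Assumption: the distro's install hook generates the initrd.
    run sudo make modules_install install
}

build_steps
```

Running it with DRY_RUN=0 would perform the steps for real; the printed
command list is worth reviewing first, since the config path and install
step vary between distributions.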
>
> Please let the list know your progress.
>
>> On Sun, Jan 24, 2016 at 8:11 PM, Roland Dreier <roland@xxxxxxxxxxxxxxx> wrote:
>> >> I have tried a large number of other hosts and they all act the same
>> >> way regardless of hardware.  ESXi <6 is no problem, but 6 and newer
>> >> crash the filer very quickly.
>> >
>> > You're crashing because of
>> >
>> > Jan 24 10:02:09 dracofiler kernel: kernel BUG at
>> > drivers/scsi/qla2xxx/qla_target.c:3105!
>> >
>> > which is the BUG_ON in
>> >
>> > void qlt_free_cmd(struct qla_tgt_cmd *cmd)
>> > {
>> >         struct qla_tgt_sess *sess = cmd->sess;
>> >
>> >         ql_dbg(ql_dbg_tgt, cmd->vha, 0xe074,
>> >             "%s: se_cmd[%p] ox_id %04x\n",
>> >             __func__, &cmd->se_cmd,
>> >             be16_to_cpu(cmd->atio.u.isp24.fcp_hdr.ox_id));
>> >
>> >         BUG_ON(cmd->cmd_in_wq);
>> >
>> > It seems we're freeing a command before we process it.
>> >
>> > What logging do you have from target or qla2xxx before you hit the
>> > crash?  I'm wondering why the initiator is aborting commands (although
>> > we still shouldn't crash even if it does abort commands).
>> >
>> > You could try applying upstream commit 193b50b9d54a ("qla2xxx: Replace
>> > QLA_TGT_STATE_ABORTED with a bit.") which seems like it might be
>> > related, though I'm not sure whether it really will help.
>> >
>> >  - R.
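
One possible way to collect the logging asked about above is to filter the
kernel log for target and qla2xxx messages. This is a sketch only: the
filter_target_logs name and the pattern list are assumptions chosen from the
strings that appear in this thread, not an official list of messages.

```shell
#!/bin/sh
# Hypothetical helper: keep only kernel log lines that mention the qla2xxx
# driver, task-management events, or a kernel BUG. Reads from stdin.
filter_target_logs() {
    grep -E 'qla2xxx|qla_target|ABORT_TASK|LUN_RESET|TMR_|kernel BUG'
}

# Example usage (not executed here; assumes a persistent systemd journal so
# the boot before the crash is still available):
#   journalctl -k -b -1 | filter_target_logs
```

On systems without a persistent journal, the same filter could be fed from
a saved syslog file instead.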
--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


