Thanks for all the information, Nicholas. There's only one part that I
think is really in question: the idea that the backend can't keep up
with the workload. You may remember I had similar problems in the past
that I contacted the mailing list about. After talking about the problem
quite a bit I decided I needed a better backend, so I built out this new
system that should have absolutely no problem keeping up. I have 20x
10k enterprise raptor drives in RAID6 with an SSD acting as read and
write cache via LSI Cachecade.

BTW, I get the "ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag"
error with vsphere5, but LIO doesn't crash in that case. Also, I seem
to get the error regardless of the system load.

I really appreciate the time you put into the kernel update information!

Thanks,
Dan

On Tue, Jan 26, 2016 at 1:55 AM, Nicholas A. Bellinger <nab@xxxxxxxxxxxxxxx> wrote:
> Hi Dan,
>
> (Adding Quinn + Giri CC')
>
> On Mon, 2016-01-25 at 16:08 -0500, Dan Lane wrote:
>> Update: If it matters, I tried loading a host with ESXi 5.5 u3b and
>> that also crashed the filer.
>
> Thanks for your bug report.
>
> Note this bug is not specific to ESXi 6.0, and the scenario occurs when
> backend driver exports are unable to keep up with host workload,
> resulting in internal ESX ABORT_TASK + LUN_RESET to trigger across
> multiple local+remote ports.
>
> The following WIP series is for addressing this bug:
>
> http://www.spinics.net/lists/target-devel/msg11691.html
>
> I've been verifying this on iscsi-target exports over the last weeks,
> and the specific bug you're hitting is AFAICT not qla2xxx driver specific.
>
>> Still waiting on an answer about the
>> applying of upstream commits to an OS like fedora or any other ideas
>> about the cause of this.
>>
>
> So you'll want to build a v4.4 kernel until these patches are merged for
> v4.5-rc, and eventually back-ported into v4.2.y stable.
>
> Note the recent qla2xxx target fixes from v4.5-rc1 are something you'll
> want too.
>
> A v4.4 based '$ORIGIN $BRANCH' with everything is here:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending.git 4.4-stable
>
> If you're not familiar with git cloning + building kernel source, have a look at:
>
> http://kernelnewbies.org/KernelBuild
> https://fedoraproject.org/wiki/BuildingUpstreamKernel
>
> So do a fresh 'git clone' of linux-stable.git and then:
>
>    git checkout --track -b linux-4.4.y
>
> to switch to a new local branch, and then:
>
>    git pull '$ORIGIN $BRANCH'
>
> to do a remote merge using the full target-pending.git 4.4-stable path
> from above.
>
> You can do a 'make defconfig' or use the local 4.2.y
> /boot/config-$VERSION, and build vmlinux + modules + initrd
> from there.
>
> Please let the list know your progress.
>
>> On Sun, Jan 24, 2016 at 8:11 PM, Roland Dreier <roland@xxxxxxxxxxxxxxx> wrote:
>> >> I have tried a large number of other hosts and they all act the same
>> >> way regardless of hardware. ESXi <6 is no problem, but 6 and newer
>> >> crash the filer very quickly.
>> >
>> > You're crashing because of
>> >
>> > Jan 24 10:02:09 dracofiler kernel: kernel BUG at
>> > drivers/scsi/qla2xxx/qla_target.c:3105!
>> >
>> > which is the BUG_ON in
>> >
>> > void qlt_free_cmd(struct qla_tgt_cmd *cmd)
>> > {
>> >         struct qla_tgt_sess *sess = cmd->sess;
>> >
>> >         ql_dbg(ql_dbg_tgt, cmd->vha, 0xe074,
>> >             "%s: se_cmd[%p] ox_id %04x\n",
>> >             __func__, &cmd->se_cmd,
>> >             be16_to_cpu(cmd->atio.u.isp24.fcp_hdr.ox_id));
>> >
>> >         BUG_ON(cmd->cmd_in_wq);
>> >
>> > It seems we're freeing a command before we process it.
>> >
>> > What logging do you have from target or qla2xxx before you hit the
>> > crash? I'm wondering why the initiator is aborting commands (although
>> > we still shouldn't crash even if it does abort commands).
>> >
>> > You could try applying upstream commit 193b50b9d54a ("qla2xxx: Replace
>> > QLA_TGT_STATE_ABORTED with a bit.") which seems like it might be
>> > related, though I'm not sure whether it really will help.
>> >
>> > - R.
>> --
>> To unsubscribe from this list: send the line "unsubscribe target-devel" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
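
For reference, the build steps quoted in this thread (fresh clone of linux-stable.git, new local 4.4.y branch, remote merge of target-pending.git 4.4-stable, build with the existing /boot config) might look roughly like the script below. This is only a sketch under assumptions, not something posted to the list: the linux-stable.git URL, the origin/linux-4.4.y start point, the use of 'make olddefconfig', and the run/build_plan helper names are all assumptions or hypothetical, and the install steps assume the distribution's installkernel hook takes care of the initrd. It defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Sketch of the build steps quoted in the thread. DRY_RUN=1 (the default)
# only prints each command; set DRY_RUN=0 to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

# Assumption: usual kernel.org location of linux-stable.git (not given in the thread).
STABLE_URL="git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git"
# From the thread: the '$ORIGIN $BRANCH' pair for the target-pending tree.
TARGET_URL="git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending.git"
TARGET_BRANCH="4.4-stable"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Steps, in order:
#   1. fresh clone of linux-stable.git
#   2. new local branch (assumption: it tracks origin/linux-4.4.y)
#   3. remote merge of target-pending.git 4.4-stable
#   4. reuse the running kernel's config; olddefconfig fills in new options
#   5. build, then install modules and kernel (install steps normally need root)
build_plan() {
    run git clone "$STABLE_URL" linux-stable &&
    run cd linux-stable &&
    run git checkout --track -b linux-4.4.y origin/linux-4.4.y &&
    run git pull "$TARGET_URL" "$TARGET_BRANCH" &&
    run cp "/boot/config-$(uname -r)" .config &&
    run make olddefconfig &&
    run make -j"$(getconf _NPROCESSORS_ONLN)" &&
    run make modules_install &&
    run make install
}

build_plan
```

Running it as-is prints the nine commands prefixed with "+"; running it with DRY_RUN=0 executes them, stopping at the first failure.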