Re: Ubuntu 16.04 breaks FC target with ABORT_TASK regardless of kernel

On Mon, Jan 2, 2017 at 6:36 AM, DAVID S <shraderdm@xxxxxxxxx> wrote:
> On Mon, Jan 2, 2017 at 6:03 AM, Bart Van Assche
> <Bart.VanAssche@xxxxxxxxxxx> wrote:
>> On Tue, 2016-12-13 at 07:02 -0500, Scott L. Lykens wrote:
>>> I have been running an Ubuntu 14.04 FC target for about 15 months now with
>>> a custom compiled 4.2 kernel including a patch for PR from here.
>>>
>>> I recently decided to upgrade to Ubuntu 16.04 to keep the FC target in line
>>> with my other Ubuntu machines. This was and continues to be very painful.
>>>
>>> Neglecting other somewhat braindead problems in having a mixed 14.04 and
>>> 16.04 cluster (corosync won’t talk between them so it is very difficult to
>>> stage the upgrade and test properly), I’ve found that 16.04 fails with any
>>> of my custom 4.2 kernel from above, the distro’s 4.4 kernel, or the most
>>> recently compiled 4.9 from the kernel-ppa. In every case the log shows:
>>>
>>> ABORT_TASK: Found referenced qla2xxx task_tag: 1133800
>>> ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1133800
>>>
>>> repeating as long as I (don’t really) desire to let it run.
>>>
>>> This also breaks the Hyper-V cluster that is using this FC target - that is
>>> to say that it basically enters what appears to be an i/o deadlock and
>>> won’t break free unless either it is rebooted or the FC target disappears
>>> from the fabric.
>>>
>>> I suspect this is related to the distro itself, as 16.04 fails with
>>> any kernel while 14.04 appears to work with the 4.2 and 4.4 kernels. The
>>> caveat is that I did not try the 4.9 kernel with 14.04 to determine
>>> whether it works, but I presume that it would.
>>>
>>> I’m hoping for some guidance as to what could also possibly have changed
>>> between 14.04 and 16.04 to cause the error above so that it can be
>>> reported to the right people to get it resolved.
>>
>> Hello Scott,
>>
>> Since there are two unknowns, it would help a lot if you could narrow down
>> this further. Can you check whether the FCoE target functionality for Ubuntu
>> 14.04 + kernel v4.9 works fine, and if not, bisect which kernel commit
>> introduced the regression?
>>
>> Thanks,
>>
>> Bart.
>
> Hi Bart/all,
>
> This is the exact same issue that Dan Lane and I attempted to help the
> mailing list troubleshoot months ago. I still see these same errors in
> my FC targets that present storage to ESXi hosts. I now use local
> storage for everything critical because the crashes caused by this are
> unpredictable. The storage disappears from any hypervisors and will
> not reappear without a reboot.
>
> In my case I was using a custom compiled Fedora 4.5 kernel with a
> patch from here as well. I don't think many people are using targetcli
> for FC targets for hypervisors, so this didn't get much attention at
> the time.
>
> If this issue gets resurrected, please let me know, and I'll be glad
> to do some testing if it will help out.
>
> Also, the reference to kernel 4.2 working makes sense here, as I
> believe the last kernel that worked properly without these errors was
> 4.3.
>
> You can reference the following link for details about prior testing:
> http://www.spinics.net/lists/target-devel/msg12919.html
>
> Thanks!
> David

All,

Sorry for spamming the mailing list, but I just wanted to give example
output of what I see when tailing my logs right now:

Jan  2 06:48:57 storage kernel: ABORT_TASK: Found referenced qla2xxx task_tag: 1211020
Jan  2 06:48:57 storage kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1211020
Jan  2 06:49:17 storage kernel: ABORT_TASK: Found referenced qla2xxx task_tag: 1144140
Jan  2 06:49:17 storage kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1144140
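
For anyone who wants to check their own logs for the same pattern, here is
a minimal sketch in Python that pairs each "Found referenced" message with
the matching TMR_TASK_DOES_NOT_EXIST message by tag. The function name
summarize_aborts and the SAMPLE text are made up for illustration, and the
regular expressions only assume the two message shapes quoted in this
thread; other kernel versions may word these messages differently.

```python
import re
from collections import Counter

# Patterns for the two ABORT_TASK message shapes quoted in this thread.
FOUND_RE = re.compile(r"ABORT_TASK: Found referenced (\S+) task_tag: (\d+)")
SENT_RE = re.compile(
    r"ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: (\d+)"
)


def summarize_aborts(log_text):
    """Count ABORT_TASK messages per tag (hypothetical helper)."""
    # Collapse all whitespace so that messages wrapped across lines
    # (for example by a mail client) still match the patterns.
    flat = re.sub(r"\s+", " ", log_text)
    found = Counter(int(tag) for _fabric, tag in FOUND_RE.findall(flat))
    missing = Counter(int(tag) for tag in SENT_RE.findall(flat))
    # Tags that were first "found" and then answered with
    # TMR_TASK_DOES_NOT_EXIST, which is the pattern reported here.
    both = sorted(tag for tag in found if tag in missing)
    return {"found": found, "does_not_exist": missing, "both": both}


# Sample text copied from the log excerpt above, including the line wrapping.
SAMPLE = """
Jan  2 06:48:57 storage kernel: ABORT_TASK: Found referenced qla2xxx
task_tag: 1211020
Jan  2 06:48:57 storage kernel: ABORT_TASK: Sending
TMR_TASK_DOES_NOT_EXIST for ref_tag: 1211020
Jan  2 06:49:17 storage kernel: ABORT_TASK: Found referenced qla2xxx
task_tag: 1144140
Jan  2 06:49:17 storage kernel: ABORT_TASK: Sending
TMR_TASK_DOES_NOT_EXIST for ref_tag: 1144140
"""

if __name__ == "__main__":
    summary = summarize_aborts(SAMPLE)
    for tag in summary["both"]:
        print(tag, summary["found"][tag], summary["does_not_exist"][tag])
```

To run it against a real log, pass the contents of the kernel log file
(wherever your distro keeps it) to summarize_aborts instead of SAMPLE.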

Also, Bart, you mentioned FCoE in your reply to Scott, but I believe
he is using plain FC. The issue appears to affect only hypervisors
that use FC targets, which further supports the original hypothesis
that the cause is the way targetcli handles VAAI/ODX. It also suggests
that the issue isn't specific to one hypervisor, since the original
problems reported by me and a few others were with ESXi, and Scott's
hosts are Hyper-V.

Again, please let me know if I can assist in testing new patches, etc.

Thanks,
David