On Mon, Jan 2, 2017 at 6:36 AM, DAVID S <shraderdm@xxxxxxxxx> wrote:
> On Mon, Jan 2, 2017 at 6:03 AM, Bart Van Assche
> <Bart.VanAssche@xxxxxxxxxxx> wrote:
>> On Tue, 2016-12-13 at 07:02 -0500, Scott L. Lykens wrote:
>>> I have been running an Ubuntu 14.04 FC target for about 15 months now with
>>> a custom compiled 4.2 kernel including a patch for PR from here.
>>>
>>> I recently decided to upgrade to Ubuntu 16.04 to keep the FC target inline
>>> with my other Ubuntu machines. This was and continues to be very painful.
>>>
>>> Neglecting other somewhat braindead problems in having a mixed 14.04 and
>>> 16.04 cluster (corosync won’t talk between them so it is very difficult to
>>> stage the upgrade and test properly), I’ve found that 16.04 with any of my
>>> custom 4.2 kernel from above, the distro’s 4.4 kernel, or the most
>>> recently compiled 4.9 from the kernel-pap all will fail with:
>>>
>>> ABORT_TASK: Found referenced qla2xxx task_tag: 1133800
>>> ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1133800
>>>
>>> repeating as long as I (don’t really) desire to let it run.
>>>
>>> This also breaks the Hyper-V cluster that is using this FC target - that is
>>> to say that it basically enters what appears to be an i/o deadlock and
>>> won’t break free unless either it is rebooted or the FC target disappears
>>> from the fabric.
>>>
>>> I suspect this is probably related to the distro itself as 16.04 fails with
>>> any kernel and 14.04 appears to work with the 4.2 and 4.4 kernels with
>>> the caveat that I did not try the 4.9 kernel with 14.04 to determine if
>>> it worked but I presume that it would.
>>>
>>> I’m hoping for some guidance as to what could also possibly have changed
>>> between 14.04 and 16.04 to cause the error above so that it can be
>>> reported to the right people to get it resolved.
>>
>> Hello Scott,
>>
>> Since there are two unknowns, it would help a lot if you could narrow down
>> this further.
>> Can you check whether the FCoE target functionality for Ubuntu
>> 14.04 + kernel v4.9 works fine, and if not, bisect which kernel commit
>> introduced the regression?
>>
>> Thanks,
>>
>> Bart.
>
> Hi Bart/all,
>
> This is the exact same issue that Dan Lane and I attempted to help the
> mailing list troubleshoot months ago. I still see these same errors in
> my FC targets that present storage to ESXi hosts. I now use local
> storage for everything critical because the crashes caused by this are
> unpredictable. The storage disappears from any hypervisors and will
> not reappear without a reboot.
>
> In my case I was using a custom compiled Fedora 4.5 kernel with a
> patch from here as well. I don't think many people are using targetcli
> for FC targets for hypervisors, so this didn't get much attention at
> the time.
>
> If this issue gets resurrected, please let me know, and I'll be glad
> to do some testing if it will help out.
>
> Also, the reference to kernel 4.2 working makes sense here, as I
> believe the last kernel that worked properly without these errors was
> 4.3.
>
> You can reference the following link for details about prior testing:
> http://www.spinics.net/lists/target-devel/msg12919.html
>
> Thanks!
> David

All,

Sorry for spamming the mailing list, but I just wanted to give example
output of what I see when tailing my logs right now:

Jan 2 06:48:57 storage kernel: ABORT_TASK: Found referenced qla2xxx task_tag: 1211020
Jan 2 06:48:57 storage kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1211020
Jan 2 06:49:17 storage kernel: ABORT_TASK: Found referenced qla2xxx task_tag: 1144140
Jan 2 06:49:17 storage kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1144140

Also, Bart, you mentioned FCoE in your reply to Scott, but I believe he
is probably using straight FC.
This issue appears to affect only hypervisors that use FC targets, which
seems to further support the original hypothesis that the problem lies in
the way targetcli handles VAAI/ODX. This also seems to confirm that the
issue isn't tied to a specific hypervisor, since the original problems
reported by me and a few others were with ESXi, and Scott's hosts are
Hyper-V.

Again, please let me know if I can assist in testing new patches, etc.

Thanks,
David
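
For anyone who wants to quantify these messages before reporting or
bisecting, the log lines quoted above could be tallied with something like
the following. This is only a minimal Python sketch: the regular
expressions are inferred from the four sample lines in this thread, and the
function name summarize_abort_tasks is made up for illustration. Nothing
here comes from the kernel or from targetcli.

```python
import re
from collections import Counter

# Patterns inferred from the sample syslog lines quoted in this thread
# (an assumption; other kernel versions may word these messages differently).
FOUND_RE = re.compile(r"ABORT_TASK: Found referenced (\w+) task_tag: (\d+)")
SENT_RE = re.compile(r"ABORT_TASK: Sending (\w+) for ref_tag: (\d+)")


def summarize_abort_tasks(lines):
    """Count ABORT_TASK messages per tag (hypothetical helper, not a kernel API)."""
    found = Counter()      # task_tag -> number of "Found referenced" lines
    responses = Counter()  # (response name, ref_tag) -> number of "Sending" lines
    for line in lines:
        m = FOUND_RE.search(line)
        if m:
            found[m.group(2)] += 1
            continue
        m = SENT_RE.search(line)
        if m:
            responses[(m.group(1), m.group(2))] += 1
    return found, responses


if __name__ == "__main__":
    # The four lines posted in this message, used as sample input.
    sample = [
        "Jan 2 06:48:57 storage kernel: ABORT_TASK: Found referenced qla2xxx task_tag: 1211020",
        "Jan 2 06:48:57 storage kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1211020",
        "Jan 2 06:49:17 storage kernel: ABORT_TASK: Found referenced qla2xxx task_tag: 1144140",
        "Jan 2 06:49:17 storage kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1144140",
    ]
    found, responses = summarize_abort_tasks(sample)
    for tag, count in sorted(found.items()):
        print(f"task_tag {tag}: found {count} time(s)")
    for (name, tag), count in sorted(responses.items()):
        print(f"ref_tag {tag}: {name} sent {count} time(s)")
```

To run it against a real log, pass an open file object instead of the
sample list; the log path depends on the distro's syslog configuration.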