Re: SCSI qla2xxx: tcm_qla2xxx target server code regression

On Wed, 2018-10-17 at 16:42 -0400, Laurence Oberman wrote:
> On Mon, 2018-10-15 at 08:51 -0400, Laurence Oberman wrote:
> > On Sat, 2018-10-13 at 10:42 -0400, Laurence Oberman wrote:
> > > On Fri, 2018-10-12 at 17:51 -0700, Bart Van Assche wrote:
> > > > On 10/12/18 1:36 PM, Laurence Oberman wrote:
> > > >  > While I have for the longest time used 4.5 as a base for my F/C jammer
> > > >  > that I use every day here in our lab I recently added more jammer code
> > > >  > so I decided to test this all on latest upstream.
> > > >  >
> > > >  > Booting the target server on my 4.5 kernel with jammer code is
> > > >  > flawless and serves LUNS with no issues and handles the jamming also
> > > >  > fine.
> > > >  >
> > > >  > However just building a 4.19.0_rc7+-1 (I left the jammer stuff out)
> > > >  > it's pretty broken.
> > > > 
> > > > A large number of patches went upstream between these two kernel
> > > > versions for both the QLogic initiator and target drivers. From the logs
> > > > it seems like you were using QLogic hardware at both the initiator and
> > > > target side? If so, which kernel version was running at the initiator
> > > > side during these tests? 4.5, 4.19-rc7+ or yet another version?
> > > > 
> > > > Thanks,
> > > > 
> > > > Bart.
> > > > 
> > > > 
> > > 
> > > I had only replied to Bart; this was my reply, now sent as a reply-all.
> > > 
> > > Hi Bart
> > > Thank you for always being helpful.
> > > 
> > > At the moment I am using RHEL 7.5 for the initiator (based on kernel
> > > 3.10 but of course with lots of backports).
> > > The exact same initiator is working fine with the 4.5 target, and I would
> > > not expect the target to require the same kernel level.
> > > Of course I will try latest upstream on the initiator later and reply
> > > back.
> > > 
> > > I was thinking the target should adhere to the standards and support
> > > many types of kernels within reason for the initiator.
> > > 
> > > Thanks
> > 
> > Changed the Subject to match regression
> > 
> > I tested the following:
> > Target 
> > upstream 4.19_rc4
> > 
> > Initiator
> > qla2xxx RHEL7.5 
> > lpfc RHEL7.5
> > qla2xxx Upstream 4.19_rc4 matching initiator
> > 
> > All 3 are unstable and fail, with tag errors and aborts
> > 
> > I am not sure when the issue started so will work on a bisect
> > 
> > Oct 15 07:29:52 ml150 kernel: print_req_error: I/O error, dev sdj, sector 128
> > Oct 15 07:29:52 ml150 kernel: print_req_error: I/O error, dev sds, sector 16
> > ..
> > ..
> > Oct 15 07:30:11 ml150 kernel: sd 5:0:0:0: [sdi] tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
> > Oct 15 07:30:11 ml150 kernel: sd 5:0:0:0: [sdi] tag#0 CDB: Read(10) 28 00 00 00 03 00 00 01 00 00
> > Oct 15 07:30:16 ml150 kernel: sd 5:0:0:0: [sdi] tag#1 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
> > Oct 15 07:30:16 ml150 kernel: sd 5:0:0:0: [sdi] tag#1 CDB: Read(10) 28 00 00 00 00 40 00 00 38 00
> > Oct 15 07:30:17 ml150 kernel: sd 5:0:1:0: [sdp] tag#0 FAILED Result: 
> 
> Following up here
> 
> I started with 4.9 and it was stable
> 
> I then tested 4.10 and it is immediately broken, logging the errors below
> when restoring the target configuration.
> This is a different failure from the one on latest upstream, which accepts
> the configuration but then fails to service I/O.
> 
> This is logged on the target when restoring the target configuration
> for 4.10
> 
> Perhaps in this case it's a target config mismatch. I did not manually
> add targets on 4.10 but I can certainly try it.
> Note that 4.18, for example, takes the config with no complaints, so I
> doubt it's a mismatch.
> 
> [   95.668210] qla2xxx [0000:07:00.0]-5030:0: Error entry - invalid handle/queue (1c01).
> [   95.706773] qla2xxx [0000:07:00.0]-5030:0: Error entry - invalid handle/queue (0002).
> [   95.745912] qla2xxx [0000:07:00.0]-5030:0: Error entry - invalid handle/queue (5838).
> 
> This comes from
> 
> /**
>  * qla2x00_error_entry() - Process an error entry.
>  * @ha: SCSI driver HA context
>  * @pkt: Entry pointer
>  */
> static void
> qla2x00_error_entry(scsi_qla_host_t *vha, struct rsp_que *rsp, sts_entry_t *pkt)
> {
>         srb_t *sp;
>         struct qla_hw_data *ha = vha->hw;
>         const char func[] = "ERROR-IOCB";
>         uint16_t que = MSW(pkt->handle);
>         struct req_que *req = NULL;
>         int res = DID_ERROR << 16;
> 
>         ql_dbg(ql_dbg_async, vha, 0x502a,
>             "type of error status in response: 0x%x\n", pkt->entry_status);
> 
>         if (que >= ha->max_req_queues || !ha->req_q_map[que])
>                 goto fatal;
> 
>         req = ha->req_q_map[que];
> 
>         if (pkt->entry_status & RF_BUSY)
>                 res = DID_BUS_BUSY << 16;
> 
>         if (pkt->entry_type == NOTIFY_ACK_TYPE &&
>             pkt->handle == QLA_TGT_SKIP_HANDLE)
>                 return;
> 
>         sp = qla2x00_get_sp_from_handle(vha, func, req, pkt);
>         if (sp) {
>                 sp->done(ha, sp, res);
>                 return;
>         }
> fatal:
>         ql_log(ql_log_warn, vha, 0x5030,
>             "Error entry - invalid handle/queue (%04x).\n", que);
> }
> 
> 
> I started the bisect, which was a challenge because of the MSI-X probe
> failure bug (commit 17e5fc5 scsi: qla2xxx: fix MSI-X vector affinity),
> but in the end I got down to this commit
> 
> commit d74595278f4ab192af66d9e60a9087464638beee
> Author: Michael Hernandez <michael.hernandez@xxxxxxxxxx>
> Date:   Mon Dec 12 14:40:07 2016 -0800
> 
>     scsi: qla2xxx: Add multiple queue pair functionality.
> 
> It's not possible to do a simple revert here.
> 
> This is puzzling me, as I wonder why I seem to be the only one seeing
> this.
> 
> The qla2xxx changes between 4.9 and 4.10 are these
> 
> [loberman@ml150 linux]$ git log --oneline v4.9..v4.10 drivers/scsi/qla2xxx
> 13ebfd0 Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
> ad3efdb Merge remote-tracking branch 'mkp-scsi/4.10/scsi-fixes' into fixes
> 2780f3c scsi: qla2xxx: Avoid that issuing a LIP triggers a kernel crash
> 27873de scsi: qla2xxx: Fix a recently introduced memory leak
> 5116226 Merge branch 'scsi-target-for-v4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/bvanassche/linux
> 300af14 qla2xxx: Disable out-of-order processing by default in firmware
> 4f06073 qla2xxx: Fix erroneous invalid handle message
> 200ffb1 qla2xxx: Reduce exess wait during chip reset
> 5f35509 qla2xxx: Terminate exchange if corrupted
> fc1ffd6 qla2xxx: Fix crash due to null pointer access
> 8d3c9c2 qla2xxx: Collect additional information to debug fw dump
> c0f6462 qla2xxx: Reset reserved field in firmware options to 0
> 2a47c68 qla2xxx: Set tcm_qla2xxx version to automatically track qla2xxx version
> 1cbb915 qla2xxx: Include ATIO queue in firmware dump when in target mode
> bb1181c qla2xxx: Fix wrong IOCB type assumption
> 91f42b3 qla2xxx: Avoid that building with W=1 triggers complaints about set-but-not-used variables
> 61778a1 qla2xxx: Move two arrays from header files to .c files
> ca82582 qla2xxx: Declare an array with file scope static
> c2a5d94 qla2xxx: Fix indentation
> 2f5a3145 Merge remote-tracking branch 'mkp-scsi/4.10/scsi-fixes' into fixes
> 98624c4 scsi: qla2xxx: remove irq_affinity_notifier
> 17e5fc5 scsi: qla2xxx: fix MSI-X vector affinity
> c3c4239 scsi: qla2xxx: Fix apparent cut-n-paste error.
> c7702b8 scsi: qla2xxx: Get mutex lock before checking optrom_state
> 7c0f6ba Replace <asm/uaccess.h> with <linux/uaccess.h> globally
> f290cba Merge tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
> d5db84a8 Merge branch 'scsi-target-for-v4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/bvanassche/linux
> 093df73 scsi: qla2xxx: Fix Target mode handling with Multiqueue changes.
> 5601236 scsi: qla2xxx: Add Block Multi Queue functionality.
> d745952 scsi: qla2xxx: Add multiple queue pair functionality.
> 4fa1834 scsi: qla2xxx: Utilize pci_alloc_irq_vectors/pci_free_irq_vectors calls.
> 77ddb94 scsi: qla2xxx: Only allow operational MBX to proceed during RESET.
> a829a84 Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
> 09ce66a qla2xxx: Add an #include directive
> 0654816 scsi: fc: use bsg_job_done
> 75cc8cf scsi: change FC drivers to use 'struct bsg_job'
> 1d69b12 scsi: fc: provide fc_bsg_to_rport() helper
> cd21c60 scsi: fc: provide fc_bsg_to_shost() helper
> 1abaede scsi: fc: Export fc_bsg_jobdone and use it in FC drivers
> 01e0e15 scsi: don't use fc_bsg_job::request and fc_bsg_job::reply directly
> 
> We may have multiple issues at play here, i.e. one between 4.9 and 4.10
> and the actual I/O tag failure on later kernels.
> 
> I will try 4.15 now, as for some reason I seem to remember that somewhat
> working here. It's been a while though.
> 
> The next puzzle is that Himanshu says it's been working for him, but I am
> testing actual SAN connectivity between a physically separate initiator
> and target.
> 
> Thanks
> Laurence

Hello Himanshu

Still busy here

The last good test for me was 4.15 target and 4.15 initiator.
4.17 and higher are broken, in addition to the separate issue that
appeared between 4.9 and 4.10.

If 4.16 is broken as well, my bisect will be between 4.15 and 4.16; I
will keep you updated.

Slow going with other work I need to get done.


Regards
Laurence



