Re: SCSI qla2xxx: tcm_qla2xxx target server code regression

On Mon, 2018-10-15 at 08:51 -0400, Laurence Oberman wrote:
> On Sat, 2018-10-13 at 10:42 -0400, Laurence Oberman wrote:
> > On Fri, 2018-10-12 at 17:51 -0700, Bart Van Assche wrote:
> > > On 10/12/18 1:36 PM, Laurence Oberman wrote:
> > >  > While I have for the longest time used 4.5 as a base for my F/C jammer
> > >  > that I use every day here in our lab I recently added more jammer code
> > >  > so I decided to test this all on latest upstream.
> > >  >
> > >  > Booting the target server on my 4.5 kernel with jammer code is
> > >  > flawless and serves LUNS with no issues and handles the jamming also
> > >  > fine.
> > >  >
> > >  > However just building a 4.19.0_rc7+-1 (I left the jammer stuff out)
> > >  > its pretty broken.
> > > 
> > > A large number of patches went upstream between these two kernel
> > > versions for both the QLogic initiator and target drivers. From the logs
> > > it seems like you were using QLogic hardware at both the initiator and
> > > target side? If so, which kernel version was running at the initiator
> > > side during these tests? 4.5, 4.19-rc7+ or yet another version?
> > > 
> > > Thanks,
> > > 
> > > Bart.
> > > 
> > > 
> > 
> > I had only replied to Bart, this was my reply, reply all now
> > 
> > Hi Bart
> > Thank you for always being helpful.
> > 
> > I am using at the moment RHEL 7.5 for the initiator (based on kernel
> > 3.10 but of course lots of backports)
> > The exact same initiator is working fine with the 4.5 and I would not
> > expect the target to require the same kernel level.
> > Of course I will try latest upstream on the initiator later and reply
> > back.
> > 
> > I was thinking the target should adhere to the standards and support
> > many types of kernels within reason for the initiator.
> > 
> > Thanks
> 
> Changed the Subject to match regression
> 
> I tested the following:
> Target 
> upstream 4.19_rc4
> 
> Initiator
> qla2xxx RHEL7.5 
> lpfc RHEL7.5
> qla2xxx Upstream 4.19_rc4 matching initiator
> 
> All 3 are unstable and fail, with tag errors and aborts
> 
> I am not sure when the issue started so will work on a bisect
> 
> Oct 15 07:29:52 ml150 kernel: print_req_error: I/O error, dev sdj, sector 128
> Oct 15 07:29:52 ml150 kernel: print_req_error: I/O error, dev sds, sector 16
> ..
> ..
> Oct 15 07:30:11 ml150 kernel: sd 5:0:0:0: [sdi] tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
> Oct 15 07:30:11 ml150 kernel: sd 5:0:0:0: [sdi] tag#0 CDB: Read(10) 28 00 00 00 03 00 00 01 00 00
> Oct 15 07:30:16 ml150 kernel: sd 5:0:0:0: [sdi] tag#1 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
> Oct 15 07:30:16 ml150 kernel: sd 5:0:0:0: [sdi] tag#1 CDB: Read(10) 28 00 00 00 00 40 00 00 38 00
> Oct 15 07:30:17 ml150 kernel: sd 5:0:1:0: [sdp] tag#0 FAILED Result: 
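
For reference, the two complete Read(10) CDBs in the log above can be
decoded by hand. The following is a minimal sketch, assuming the standard
SBC READ(10) layout (opcode 0x28, big-endian LBA in bytes 2..5, big-endian
transfer length in bytes 7..8). The names read10_fields and decode_read10
are illustrative and do not come from the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define READ_10_OPCODE 0x28  /* SBC READ(10) operation code */

/* Decoded fields of a READ(10) CDB (illustrative struct, not a kernel type). */
struct read10_fields {
    uint32_t lba;     /* logical block address, CDB bytes 2..5, big-endian */
    uint16_t blocks;  /* transfer length in blocks, CDB bytes 7..8, big-endian */
};

/* Returns 0 on success, -1 if the CDB is not a READ(10). */
static int decode_read10(const uint8_t cdb[10], struct read10_fields *out)
{
    assert(cdb != NULL && out != NULL);

    if (cdb[0] != READ_10_OPCODE)
        return -1;

    out->lba = ((uint32_t)cdb[2] << 24) | ((uint32_t)cdb[3] << 16) |
               ((uint32_t)cdb[4] << 8)  |  (uint32_t)cdb[5];
    out->blocks = (uint16_t)(((uint16_t)cdb[7] << 8) | cdb[8]);
    return 0;
}
```

Under that layout, the tag#0 CDB on sdi is a read of 256 blocks at LBA 768
and the tag#1 CDB is a read of 56 blocks at LBA 64, so the failing commands
are ordinary small reads rather than anything unusual.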

Following up here.

I started with 4.9 and it was stable.

I then tested 4.10 and it is immediately broken: it logs the errors below
when restoring the target configuration.
This is a different failure from the one on latest upstream, which takes
the configuration but then fails to service I/O.

Perhaps in this case it's a target config mismatch. I did not manually
add targets on 4.10 but I can certainly try it.
Note that 4.18, for example, takes the config with no complaints, so I
doubt it's a mismatch.

This is logged on the target when restoring the target configuration
on 4.10:

[   95.668210] qla2xxx [0000:07:00.0]-5030:0: Error entry - invalid handle/queue (1c01).
[   95.706773] qla2xxx [0000:07:00.0]-5030:0: Error entry - invalid handle/queue (0002).
[   95.745912] qla2xxx [0000:07:00.0]-5030:0: Error entry - invalid handle/queue (5838).

This comes from

/**
 * qla2x00_error_entry() - Process an error entry.
 * @ha: SCSI driver HA context
 * @pkt: Entry pointer
 */
static void
qla2x00_error_entry(scsi_qla_host_t *vha, struct rsp_que *rsp, sts_entry_t *pkt)
{
        srb_t *sp;
        struct qla_hw_data *ha = vha->hw;
        const char func[] = "ERROR-IOCB";
        uint16_t que = MSW(pkt->handle);
        struct req_que *req = NULL;
        int res = DID_ERROR << 16;

        ql_dbg(ql_dbg_async, vha, 0x502a,
            "type of error status in response: 0x%x\n", pkt->entry_status);

        if (que >= ha->max_req_queues || !ha->req_q_map[que])
                goto fatal;

        req = ha->req_q_map[que];

        if (pkt->entry_status & RF_BUSY)
                res = DID_BUS_BUSY << 16;

        if (pkt->entry_type == NOTIFY_ACK_TYPE &&
            pkt->handle == QLA_TGT_SKIP_HANDLE)
                return;

        sp = qla2x00_get_sp_from_handle(vha, func, req, pkt);
        if (sp) {
                sp->done(ha, sp, res);
                return;
        }
fatal:
        ql_log(ql_log_warn, vha, 0x5030,
            "Error entry - invalid handle/queue (%04x).\n", que);
}
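
Note that the value printed in the 0x5030 message is que, the upper 16
bits of pkt->handle, not the full handle. The sketch below restates that
check in userspace so the logged values can be tried against it. It
assumes MSW() takes the upper 16 bits of a 32-bit value, which is what
the excerpt implies. struct fake_hw and queue_is_valid() are hypothetical
stand-ins for the driver structures, and any handle fed to them is made
up, since the log does not show the lower 16 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace restatement of the word-splitting helper (assumed semantics). */
#define MSW(x) ((uint16_t)((uint32_t)(x) >> 16))

/* Hypothetical stand-in for the two qla_hw_data fields the check reads. */
struct fake_hw {
    uint16_t max_req_queues;        /* number of slots in req_q_map */
    const void *const *req_q_map;   /* NULL slot means no such request queue */
};

/*
 * Mirrors the test in qla2x00_error_entry(): the queue index is the upper
 * half of the handle, and it must be in range and map to a real queue.
 */
static int queue_is_valid(const struct fake_hw *ha, uint32_t handle)
{
    uint16_t que = MSW(handle);

    assert(ha != NULL && ha->req_q_map != NULL);
    return que < ha->max_req_queues && ha->req_q_map[que] != NULL;
}
```

With only a handful of request queues configured, queue indexes such as
0x1c01 or 0x5838 can never pass this check, which would be consistent
with the handle in the error entry not being one the driver issued on
that queue.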


I started the bisect, which was a challenge because of the MSI-X probe
failure bug (commit 17e5fc5 scsi: qla2xxx: fix MSI-X vector affinity),
but in the end I got down to this commit:

commit d74595278f4ab192af66d9e60a9087464638beee
Author: Michael Hernandez <michael.hernandez@xxxxxxxxxx>
Date:   Mon Dec 12 14:40:07 2016 -0800

    scsi: qla2xxx: Add multiple queue pair functionality.

It's not possible to do a simple revert here.

This is puzzling me, as I wonder why I seem to be the only one seeing
this.

The qla2xxx changes between 4.9 and 4.10 are these

[loberman@ml150 linux]$ git log --oneline v4.9..v4.10 drivers/scsi/qla2xxx
13ebfd0 Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
ad3efdb Merge remote-tracking branch 'mkp-scsi/4.10/scsi-fixes' into fixes
2780f3c scsi: qla2xxx: Avoid that issuing a LIP triggers a kernel crash
27873de scsi: qla2xxx: Fix a recently introduced memory leak
5116226 Merge branch 'scsi-target-for-v4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/bvanassche/linux
300af14 qla2xxx: Disable out-of-order processing by default in firmware
4f06073 qla2xxx: Fix erroneous invalid handle message
200ffb1 qla2xxx: Reduce exess wait during chip reset
5f35509 qla2xxx: Terminate exchange if corrupted
fc1ffd6 qla2xxx: Fix crash due to null pointer access
8d3c9c2 qla2xxx: Collect additional information to debug fw dump
c0f6462 qla2xxx: Reset reserved field in firmware options to 0
2a47c68 qla2xxx: Set tcm_qla2xxx version to automatically track qla2xxx version
1cbb915 qla2xxx: Include ATIO queue in firmware dump when in target mode
bb1181c qla2xxx: Fix wrong IOCB type assumption
91f42b3 qla2xxx: Avoid that building with W=1 triggers complaints about set-but-not-used variables
61778a1 qla2xxx: Move two arrays from header files to .c files
ca82582 qla2xxx: Declare an array with file scope static
c2a5d94 qla2xxx: Fix indentation
2f5a3145 Merge remote-tracking branch 'mkp-scsi/4.10/scsi-fixes' into fixes
98624c4 scsi: qla2xxx: remove irq_affinity_notifier
17e5fc5 scsi: qla2xxx: fix MSI-X vector affinity
c3c4239 scsi: qla2xxx: Fix apparent cut-n-paste error.
c7702b8 scsi: qla2xxx: Get mutex lock before checking optrom_state
7c0f6ba Replace <asm/uaccess.h> with <linux/uaccess.h> globally
f290cba Merge tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
d5db84a8 Merge branch 'scsi-target-for-v4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/bvanassche/linux
093df73 scsi: qla2xxx: Fix Target mode handling with Multiqueue changes.
5601236 scsi: qla2xxx: Add Block Multi Queue functionality.
d745952 scsi: qla2xxx: Add multiple queue pair functionality.
4fa1834 scsi: qla2xxx: Utilize pci_alloc_irq_vectors/pci_free_irq_vectors calls.
77ddb94 scsi: qla2xxx: Only allow operational MBX to proceed during RESET.
a829a84 Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
09ce66a qla2xxx: Add an #include directive
0654816 scsi: fc: use bsg_job_done
75cc8cf scsi: change FC drivers to use 'struct bsg_job'
1d69b12 scsi: fc: provide fc_bsg_to_rport() helper
cd21c60 scsi: fc: provide fc_bsg_to_shost() helper
1abaede scsi: fc: Export fc_bsg_jobdone and use it in FC drivers
01e0e15 scsi: don't use fc_bsg_job::request and fc_bsg_job::reply directly

We may have multiple issues at play here, i.e. one introduced between 4.9
and 4.10, and the actual I/O tag failure seen on later kernels.

I will try 4.15 now, as for some reason I seem to remember that somewhat
working here. It's been a while though.

The next puzzle is that Himanshu says it's been working for him, but I am
testing actual SAN connectivity between a physically separate initiator
and target.

Thanks
Laurence


