On Mon, 2018-10-15 at 08:51 -0400, Laurence Oberman wrote:
> On Sat, 2018-10-13 at 10:42 -0400, Laurence Oberman wrote:
> > On Fri, 2018-10-12 at 17:51 -0700, Bart Van Assche wrote:
> > > On 10/12/18 1:36 PM, Laurence Oberman wrote:
> > > > While I have for the longest time used 4.5 as a base for my F/C
> > > > jammer that I use every day here in our lab, I recently added more
> > > > jammer code, so I decided to test this all on latest upstream.
> > > >
> > > > Booting the target server on my 4.5 kernel with jammer code is
> > > > flawless: it serves LUNs with no issues and also handles the jamming
> > > > fine.
> > > >
> > > > However, just building a 4.19.0_rc7+-1 (I left the jammer stuff out),
> > > > it's pretty broken.
> > >
> > > A large number of patches went upstream between these two kernel
> > > versions for both the QLogic initiator and target drivers. From the
> > > logs it seems like you were using QLogic hardware at both the
> > > initiator and target side? If so, which kernel version was running at
> > > the initiator side during these tests? 4.5, 4.19-rc7+ or yet another
> > > version?
> > >
> > > Thanks,
> > >
> > > Bart.
> >
> > I had only replied to Bart; this was my reply. Replying to all now.
> >
> > Hi Bart,
> > Thank you for always being helpful.
> >
> > At the moment I am using RHEL 7.5 for the initiator (based on kernel
> > 3.10, but of course with lots of backports).
> > The exact same initiator works fine with 4.5, and I would not expect
> > the target to require the same kernel level.
> > Of course I will try latest upstream on the initiator later and reply
> > back.
> >
> > I was thinking the target should adhere to the standards and support
> > many types of kernels within reason for the initiator.
> >
> > Thanks
>
> Changed the Subject to reflect the regression.
>
> I tested the following:
> Target
>   upstream 4.19_rc4
>
> Initiator
>   qla2xxx RHEL7.5
>   lpfc RHEL7.5
>   qla2xxx upstream 4.19_rc4 (matching initiator)
>
> All 3 are unstable and fail, with tag errors and aborts.
>
> I am not sure when the issue started, so I will work on a bisect.
>
> Oct 15 07:29:52 ml150 kernel: print_req_error: I/O error, dev sdj, sector 128
> Oct 15 07:29:52 ml150 kernel: print_req_error: I/O error, dev sds, sector 16
> ..
> ..
> Oct 15 07:30:11 ml150 kernel: sd 5:0:0:0: [sdi] tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
> Oct 15 07:30:11 ml150 kernel: sd 5:0:0:0: [sdi] tag#0 CDB: Read(10) 28 00 00 00 03 00 00 01 00 00
> Oct 15 07:30:16 ml150 kernel: sd 5:0:0:0: [sdi] tag#1 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
> Oct 15 07:30:16 ml150 kernel: sd 5:0:0:0: [sdi] tag#1 CDB: Read(10) 28 00 00 00 00 40 00 00 38 00
> Oct 15 07:30:17 ml150 kernel: sd 5:0:1:0: [sdp] tag#0 FAILED Result:

Following up here: I started with 4.9 and it was stable. I then tested 4.10, and it is immediately broken when restoring the target configuration. This is a different failure from latest upstream, which accepts the configuration but then fails while servicing I/O.

Perhaps in this case it's a target config mismatch. I did not manually add targets on 4.10, but I can certainly try that. Note that 4.18, for example, takes the config with no complaints, so I doubt it's a mismatch.

This is logged on the target when restoring the target configuration on 4.10:

[   95.668210] qla2xxx [0000:07:00.0]-5030:0: Error entry - invalid handle/queue (1c01).
[   95.706773] qla2xxx [0000:07:00.0]-5030:0: Error entry - invalid handle/queue (0002).
[   95.745912] qla2xxx [0000:07:00.0]-5030:0: Error entry - invalid handle/queue (5838).

This comes from:

/**
 * qla2x00_error_entry() - Process an error entry.
 * @ha: SCSI driver HA context
 * @pkt: Entry pointer
 */
static void
qla2x00_error_entry(scsi_qla_host_t *vha, struct rsp_que *rsp, sts_entry_t *pkt)
{
	srb_t *sp;
	struct qla_hw_data *ha = vha->hw;
	const char func[] = "ERROR-IOCB";
	uint16_t que = MSW(pkt->handle);
	struct req_que *req = NULL;
	int res = DID_ERROR << 16;

	ql_dbg(ql_dbg_async, vha, 0x502a,
	    "type of error status in response: 0x%x\n", pkt->entry_status);

	if (que >= ha->max_req_queues || !ha->req_q_map[que])
		goto fatal;

	req = ha->req_q_map[que];

	if (pkt->entry_status & RF_BUSY)
		res = DID_BUS_BUSY << 16;

	if (pkt->entry_type == NOTIFY_ACK_TYPE &&
	    pkt->handle == QLA_TGT_SKIP_HANDLE)
		return;

	sp = qla2x00_get_sp_from_handle(vha, func, req, pkt);
	if (sp) {
		sp->done(ha, sp, res);
		return;
	}
fatal:
	ql_log(ql_log_warn, vha, 0x5030,
	    "Error entry - invalid handle/queue (%04x).\n", que);
}

I started the bisect, which was a challenge because of the MSI-X probe failure bug (see commit 17e5fc5, "scsi: qla2xxx: fix MSI-X vector affinity"), but in the end I got down to this commit:

commit d74595278f4ab192af66d9e60a9087464638beee
Author: Michael Hernandez <michael.hernandez@xxxxxxxxxx>
Date:   Mon Dec 12 14:40:07 2016 -0800

    scsi: qla2xxx: Add multiple queue pair functionality.

It's not possible to do a simple revert here.

This is puzzling me, as I wonder why I seem to be the only one seeing this.
The qla2xxx changes between 4.9 and 4.10 are these:

[loberman@ml150 linux]$ git log --oneline v4.9..v4.10 drivers/scsi/qla2xxx
13ebfd0 Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
ad3efdb Merge remote-tracking branch 'mkp-scsi/4.10/scsi-fixes' into fixes
2780f3c scsi: qla2xxx: Avoid that issuing a LIP triggers a kernel crash
27873de scsi: qla2xxx: Fix a recently introduced memory leak
5116226 Merge branch 'scsi-target-for-v4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/bvanassche/linux
300af14 qla2xxx: Disable out-of-order processing by default in firmware
4f06073 qla2xxx: Fix erroneous invalid handle message
200ffb1 qla2xxx: Reduce exess wait during chip reset
5f35509 qla2xxx: Terminate exchange if corrupted
fc1ffd6 qla2xxx: Fix crash due to null pointer access
8d3c9c2 qla2xxx: Collect additional information to debug fw dump
c0f6462 qla2xxx: Reset reserved field in firmware options to 0
2a47c68 qla2xxx: Set tcm_qla2xxx version to automatically track qla2xxx version
1cbb915 qla2xxx: Include ATIO queue in firmware dump when in target mode
bb1181c qla2xxx: Fix wrong IOCB type assumption
91f42b3 qla2xxx: Avoid that building with W=1 triggers complaints about set-but-not-used variables
61778a1 qla2xxx: Move two arrays from header files to .c files
ca82582 qla2xxx: Declare an array with file scope static
c2a5d94 qla2xxx: Fix indentation
2f5a3145 Merge remote-tracking branch 'mkp-scsi/4.10/scsi-fixes' into fixes
98624c4 scsi: qla2xxx: remove irq_affinity_notifier
17e5fc5 scsi: qla2xxx: fix MSI-X vector affinity
c3c4239 scsi: qla2xxx: Fix apparent cut-n-paste error.
c7702b8 scsi: qla2xxx: Get mutex lock before checking optrom_state
7c0f6ba Replace <asm/uaccess.h> with <linux/uaccess.h> globally
f290cba Merge tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
d5db84a8 Merge branch 'scsi-target-for-v4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/bvanassche/linux
093df73 scsi: qla2xxx: Fix Target mode handling with Multiqueue changes.
5601236 scsi: qla2xxx: Add Block Multi Queue functionality.
d745952 scsi: qla2xxx: Add multiple queue pair functionality.
4fa1834 scsi: qla2xxx: Utilize pci_alloc_irq_vectors/pci_free_irq_vectors calls.
77ddb94 scsi: qla2xxx: Only allow operational MBX to proceed during RESET.
a829a84 Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
09ce66a qla2xxx: Add an #include directive
0654816 scsi: fc: use bsg_job_done
75cc8cf scsi: change FC drivers to use 'struct bsg_job'
1d69b12 scsi: fc: provide fc_bsg_to_rport() helper
cd21c60 scsi: fc: provide fc_bsg_to_shost() helper
1abaede scsi: fc: Export fc_bsg_jobdone and use it in FC drivers
01e0e15 scsi: don't use fc_bsg_job::request and fc_bsg_job::reply directly

We may have multiple issues at play here: one introduced between 4.9 and 4.10, and the actual I/O tag failure on later kernels.

I will try 4.15 now, as for some reason I seem to remember that somewhat working here. It's been a while, though.

The next puzzle is that Himanshu says it has been working for him, but I am testing actual SAN connectivity between a physically separate initiator and target.

Thanks
Laurence