On Fri, 2020-05-15 at 10:04 -0400, Laurence Oberman wrote:
> On Fri, 2020-05-15 at 15:52 +0200, Hannes Reinecke wrote:
> > On 5/15/20 3:06 PM, Laurence Oberman wrote:
> > > On Fri, 2020-05-15 at 13:26 +0200, Hannes Reinecke wrote:
> > > > The fnic driver assigns an ioreq structure to each command, and
> > > > severs this assignment once scsi_done() has been called and the
> > > > command has been completed.
> > > > So when traversing commands to terminate outstanding I/O we should
> > > > not call scsi_done() on commands which do not have a corresponding
> > > > ioreq structure; these commands have either never entered the driver
> > > > or have already been completed.
> > > >
> > > > Signed-off-by: Hannes Reinecke <hare@xxxxxxxx>
> > > > ---
> > > >  drivers/scsi/fnic/fnic_scsi.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/scsi/fnic/fnic_scsi.c b/drivers/scsi/fnic/fnic_scsi.c
> > > > index 27535c90b248..8d2798cbd30f 100644
> > > > --- a/drivers/scsi/fnic/fnic_scsi.c
> > > > +++ b/drivers/scsi/fnic/fnic_scsi.c
> > > > @@ -1401,7 +1401,7 @@ static void fnic_cleanup_io(struct fnic *fnic, int exclude_id)
> > > >  		}
> > > >  		if (!io_req) {
> > > >  			spin_unlock_irqrestore(io_lock, flags);
> > > > -			goto cleanup_scsi_cmd;
> > > > +			continue;
> > > >  		}
> > > >
> > > >  		CMD_SP(sc) = NULL;
> > >
> > > Hi Hannes,
> > > Thanks for this patch, but can you share what the impact was of this
> > > issue.
> > > What did you see in logs/behavior?
> > >
> >
> > Unmap the LUNs from the array, and reboot the machine.
> > Causing a nice kernel oops in fnic_terminate_rport_io:
> >
> > [ 41.904013] rport-3:0-2: blocked FC remote port time out: removing rport
> > [ 41.911625] BUG: kernel NULL pointer dereference, address: 0000000000000040
> > [ 41.919408] #PF: supervisor read access in kernel mode
> > [ 41.919409] #PF: error_code(0x0000) - not-present page
> > [ 41.919411] PGD 0 P4D 0
> > [ 41.919416] Oops: 0000 [#1] SMP PTI
> > [ 41.919420] CPU: 1 PID: 219 Comm: kworker/1:1 Kdump: loaded Tainted: G X 5.3.18-16-default #1 SLE15-SP2 (unreleased)
> > [ 41.919421] Hardware name: Cisco Systems Inc UCSC-C220-M3S/UCSC-C220-M3S, BIOS C220M3.3.0.4e.0.1106191007 11/06/2019
> > [ 41.919433] Workqueue: fc_wq_3 fc_rport_final_delete [scsi_transport_fc]
> > [ 41.919443] RIP: 0010:fnic_terminate_rport_io+0x2db/0x6c0 [fnic]
> > [ 41.919446] Code: 3c c2 e8 48 00 95 f5 48 85 c0 49 89 c5 74 2c 48 05 20 01 00 00 48 89 44 24 10 74 1f 49 8b 85 58 01 00 00 48 8b 80 c0 01 00 00 <48> 8b 78 40 e8 1c 0f e4 ff 85 c0 0f 85 b2 fd ff ff 4c 89 e6 48 89
> > [ 41.919448] RSP: 0018:ffffa521c164bde0 EFLAGS: 00010082
> > [ 41.919450] RAX: 0000000000000000 RBX: ffff8c33633587c8 RCX: ffff8c3363358bc0
> > [ 41.919452] RDX: ffff8c336347bc80 RSI: 0000000000000080 RDI: ffff8c33632dd8c0
> > [ 41.919453] RBP: ffff8c3363359208 R08: 00335f71775f6366 R09: 8080808080808080
> > [ 41.919455] R10: ffffa521c0087dc8 R11: fefefefefefefeff R12: 0000000000000246
> > [ 41.919456] R13: ffff8c33633e8100 R14: ffff8c24470a4000 R15: 0000000000000080
> > [ 41.919459] FS: 0000000000000000(0000) GS:ffff8c33bfa40000(0000) knlGS:0000000000000000
> > [ 41.919461] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 41.919466] CR2: 0000000000000040 CR3: 000000011340a003 CR4: 00000000000606e0
> > [ 42.066910] Call Trace:
> > [ 42.066929] fc_terminate_rport_io+0x51/0x70 [scsi_transport_fc]
> > [ 42.066935] fc_rport_final_delete+0x53/0x1e0 [scsi_transport_fc]
> > [ 42.066943] process_one_work+0x1f4/0x3e0
> > [ 42.066947] worker_thread+0x2d/0x3e0
> > [ 42.066951] ? process_one_work+0x3e0/0x3e0
> > [ 42.066954] kthread+0x10d/0x130
> > [ 42.066957] ? kthread_park+0xa0/0xa0
> > [ 42.066961] ret_from_fork+0x35/0x40
> >
> > Cheers,
> >
> > Hannes
>
> Awesome, Ok Thank you!
> I looked at the patch and it looks correct to me at least.
> I will add a Review but prob best to have the Cisco fnic folks also
> review.
> Thank you for catching this.
>
> Reviewed-by: Laurence Oberman <loberman@xxxxxxxxxx>

It seems that the Cisco people have missed this patch. I have added
them to the recipient list now.

Regards
Martin
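
For anyone reading the thread later, the rule the patch enforces can be
modelled outside the kernel. The following is only a minimal user-space
sketch under stated assumptions, not the fnic code: struct cmd,
struct io_req, complete_cmd() and cleanup_io() are hypothetical stand-ins
for the SCSI command, the ioreq, scsi_done() and fnic_cleanup_io(), and
locking is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's per-command ioreq. */
struct io_req {
	int tag;
};

/* Hypothetical stand-in for a SCSI command. */
struct cmd {
	struct io_req *sp;	/* models CMD_SP(sc); NULL once severed */
	int done_calls;		/* how many times completion has run */
};

/* Models scsi_done(): must run at most once per command. */
static void complete_cmd(struct cmd *c)
{
	c->done_calls++;
}

/*
 * Terminate outstanding commands. Returns the number of commands
 * completed by this call.
 */
static int cleanup_io(struct cmd *cmds, size_t n)
{
	int completed = 0;

	assert(cmds != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		struct cmd *c = &cmds[i];

		/*
		 * No ioreq: the command never entered the driver or was
		 * already completed, so skip it rather than completing it.
		 */
		if (!c->sp)
			continue;

		c->sp = NULL;	/* sever the association first */
		complete_cmd(c);
		completed++;
	}
	return completed;
}
```

In this model a second cleanup_io() pass over the same array completes
nothing, because every command it could touch has already had its
association severed.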