On Wed, 2014-06-04 at 11:45 -0700, Jun Wu wrote: > The test setup includes one host and one target. The target exposes 10 > hard drives (or 10 LUNs) on one fcoe port. The single initiator runs > 10 fio processes simultaneously to the 10 target drives through fcoe > vn2vn. This is a simple configuration that other people may also want > to try. > Still seq_send could fail on aborted exchange as I explained below with stress from 10 fio sessions from same host, thanks for clarifying. > >Exchange 0x6e4 is aborted and then target still sending frame, while > >later should not occur but first setting up abort with 0 msec timeout > >doesn not look correct either and it is different that 8000 ms on > >initiator side. > > Should target stop sending frame after abort? I still see a lot of 0 > msec messages on target side. First target doesn't sends aborts and 0 msec prints are on receipt of abort. > Is this something that should be > addressed? > May be but will not help for your case since once exchange aborted then subsequent seq_send will fail on aborted exchange. > >Reducing retries could narrow down to early aborts is the cause here, > >can you try with REC disabled on initiator side for that using this > >change ? > > By disabling REC have you confirmed that the early aborts is the > cause? Can you count instance before and after through full log for *same duration run* ? I guess should be less as we talked before in context of increasing timeouts below. > Is the abort caused by 0 msec timeout? No target doesn't initiate abort and these prints are on receipt of abort handling and once exchange aborted then seq_send will fail with noisy prints. //Vasu > Thanks, > > Jun > > On Mon, Jun 2, 2014 at 5:21 PM, Vasu Dev <vasu.dev@xxxxxxxxxxxxxxx> wrote: > > On Mon, 2014-06-02 at 16:55 -0700, Jun Wu wrote: > >> Hi Vasu, > >> > >> Is there anything else we can do to collect more information or adjust > >> the timeout value? > > > > Increased Timeouts will help in reducing seq_send failure prints and > > must be the case with REC disabled patch try since that skipped REC > > completely and with that defaulted to longer SCSI retry timeouts. > > Anycase seq_send failure prints are just noise here since seq_send fails > > only for aborted exchanges and that is very likely with tcm fc abort > > handling, especially in your oversubscribed target setup with 1 target > > taking workloads from 10 hosts. I mean seq_send fails only due to > > following aborted exchanges check since thereafter fcoe_xmit always > > returns success. > > > > if (ep->esb_stat & (ESB_ST_COMPLETE | ESB_ST_ABNORMAL)) { > > fc_frame_free(fp); > > goto out; > > } > > > > So I think it is okay to get rid off send failure prints same as tcm fc > > has it on other places of callings seq_send with this patch:- > > > > diff --git a/drivers/target/tcm_fc/tfc_io.c > > b/drivers/target/tcm_fc/tfc_io.c > > index a48b96d..0425fc0 100644 > > --- a/drivers/target/tcm_fc/tfc_io.c > > +++ b/drivers/target/tcm_fc/tfc_io.c > > @@ -200,15 +200,7 @@ int ft_queue_data_in(struct se_cmd *se_cmd) > > f_ctl |= FC_FC_END_SEQ; > > fc_fill_fc_hdr(fp, FC_RCTL_DD_SOL_DATA, ep->did, > > ep->sid, > > FC_TYPE_FCP, f_ctl, fh_off); > > - error = lport->tt.seq_send(lport, seq, fp); > > - if (error) { > > - /* XXX For now, initiator will retry */ > > - pr_err_ratelimited("%s: Failed to send frame %p, > > " > > - "xid <0x%x>, remaining % > > zu, " > > - "lso_max <0x%x>\n", > > - __func__, fp, ep->xid, > > - remaining, > > lport->lso_max); > > - } > > + lport->tt.seq_send(lport, seq, fp); > > } > > return ft_queue_status(se_cmd); > > } > > > >> Thanks, > >> > >> Jun > >> > >> On Tue, May 27, 2014 at 12:44 PM, Jun Wu <jwu@xxxxxxxxxxxx> wrote: > >> > We do not have DCB PFC so the traffic is managed by PAUSE frames by default. > >> > The kernel has been recompiled to include your change. I ran the same > >> > test after the kernel change, i.e. running 10 fio sessions from > >> > initiator to 10 drives on target. The target machine hung after a few > >> > minutes. > >> > > > > > Target could be either really overloaded with your workloads in 10 hosts > > to single target setup or you it really stuck, for later case enabling > > hung task detection could help under kernel hacking as that would dump > > stack trace of hung task. If it is the earlier case then I don't know > > how target back end handle the case of sustained over subscribed target > > bandwidth, may be Nick could shed some info on that and possibly tcm_fc > > has related missing feedback into target core for that. > > > > If your workloads exceed demand intermittently then adjusting timeout as > > you mentioned above or increasing all Tx/Rx path resources could help in > > absorbing such intermittent incidents and in turn not leading to host IO > > timeouts/failures. > > > > //Vasu > > > >> > On the initiator side the messages look like: > >> > 4453 May 27 10:44:14 poc1 kernel: [ 1726.288288] host7: fcp: 00061e: > >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > >> > 4454 May 27 10:44:14 poc1 kernel: [ 1726.292240] host7: fcp: 00061e: > >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > >> > 4455 May 27 10:44:14 poc1 kernel: [ 1726.295246] host7: fcp: 00061e: > >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > >> > 4456 May 27 10:44:14 poc1 kernel: [ 1726.296245] host7: fcp: 00061e: > >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > >> > 4457 May 27 10:44:14 poc1 kernel: [ 1726.298249] host7: fcp: 00061e: > >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > >> > 4458 May 27 10:44:14 poc1 kernel: [ 1726.300288] host7: fcp: 00061e: > >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > >> > 4459 May 27 10:44:14 poc1 kernel: [ 1726.303268] host7: fcp: 00061e: > >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > >> > 4460 May 27 10:44:14 poc1 kernel: [ 1726.303273] host7: fcp: 00061e: > >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > >> > 4461 May 27 10:44:14 poc1 kernel: [ 1726.304297] host7: fcp: 00061e: > >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > >> > 4462 May 27 10:44:14 poc1 kernel: [ 1726.306257] host7: fcp: 00061e: > >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > >> > 4463 May 27 10:44:14 poc1 kernel: [ 1726.307293] host7: fcp: 00061e: > >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > >> > 4464 May 27 10:44:14 poc1 kernel: [ 1726.307298] host7: fcp: 00061e: > >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > >> > 4465 May 27 10:44:14 poc1 kernel: [ 1726.307301] host7: fcp: 00061e: > >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > >> > 4466 May 27 10:44:14 poc1 kernel: [ 1726.310317] host7: fcp: 00061e: > >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > >> > 4467 May 27 10:44:14 poc1 kernel: [ 1726.313257] host7: fcp: 00061e: > >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > >> > 4468 May 27 10:44:14 poc1 kernel: [ 1726.315297] host7: fcp: 00061e: > >> > xid 01cd-0a84: DDP I/O in fc_fcp_recv_data set ERROR > >> > 4469 May 27 10:44:14 poc1 kernel: [ 1726.315302] host7: xid 1cd: > >> > f_ctl 90000 seq 1 > >> > 4470 May 27 10:44:14 poc1 kernel: [ 1726.315666] host7: xid 1cd: BLS > >> > rctl 85 - BLS reject received > >> > 4471 May 27 10:44:14 poc1 kernel: [ 1726.316295] host7: fcp: 00061e: > >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > >> > 4472 May 27 10:44:14 poc1 kernel: [ 1726.317260] host7: fcp: 00061e: > >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > >> > 4473 May 27 10:44:14 poc1 kernel: [ 1726.317266] host7: fcp: 00061e: > >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > >> > 4474 May 27 10:44:14 poc1 kernel: [ 1726.317417] host7: fcp: 00061e: > >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > >> > 4475 May 27 10:44:14 poc1 kernel: [ 1726.320303] host7: fcp: 00061e: > >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > >> > 4476 May 27 10:44:14 poc1 kernel: [ 1726.321301] host7: fcp: 00061e: > >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > >> > 4477 May 27 10:44:14 poc1 kernel: [ 1726.323264] host7: fcp: 00061e: > >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > >> > 4478 May 27 10:44:14 poc1 kernel: [ 1726.323299] host7: fcp: 00061e: > >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > >> > 4479 May 27 10:44:14 poc1 kernel: [ 1726.324307] host7: fcp: 00061e: > >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > >> > 4480 May 27 10:44:14 poc1 kernel: [ 1726.327271] host7: fcp: 00061e: > >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > >> > 4481 May 27 10:44:14 poc1 kernel: [ 1726.327301] host7: fcp: 00061e: > >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > >> > 4482 May 27 10:44:14 poc1 kernel: [ 1726.329274] host7: fcp: 00061e: > >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > >> > 4483 May 27 10:44:14 poc1 kernel: [ 1726.330306] host7: fcp: 00061e: > >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > >> > 4484 May 27 10:44:14 poc1 kernel: [ 1726.331273] host7: fcp: 00061e: > >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > >> > 4485 May 27 10:44:14 poc1 kernel: [ 1726.333270] host7: fcp: 00061e: > >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > >> > 4486 May 27 10:44:14 poc1 kernel: [ 1726.333313] host7: fcp: 00061e: > >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > >> > 4487 May 27 10:44:14 poc1 kernel: [ 1726.338276] host7: fcp: 00061e: > >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > >> > 4488 May 27 10:44:14 poc1 kernel: [ 1726.338316] host7: fcp: 00061e: > >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > >> > 4489 May 27 10:44:14 poc1 kernel: [ 1726.340320] host7: fcp: 00061e: > >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > >> > 4490 May 27 10:44:14 poc1 kernel: [ 1726.341313] host7: fcp: 00061e: > >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > >> > 4491 May 27 10:44:14 poc1 kernel: [ 1726.342286] host7: fcp: 00061e: > >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > >> > 4492 May 27 10:44:14 poc1 kernel: [ 1726.343312] host7: fcp: 00061e: > >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > >> > 4493 May 27 10:44:14 poc1 kernel: [ 1726.346324] host7: fcp: 00061e: > >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > >> > 4494 May 27 10:44:14 poc1 kernel: [ 1726.347283] host7: fcp: 00061e: > >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > >> > 4495 May 27 10:44:14 poc1 kernel: [ 1726.350323] host7: fcp: 00061e: > >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > >> > 4496 May 27 10:44:14 poc1 kernel: [ 1726.350326] host7: fcp: 00061e: > >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > >> > 4497 May 27 10:44:14 poc1 kernel: [ 1726.352288] host7: fcp: 00061e: > >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > >> > 4498 May 27 10:44:14 poc1 kernel: [ 1726.352289] host7: fcp: 00061e: > >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > >> > 4499 May 27 10:44:14 poc1 kernel: [ 1726.352942] host7: fcp: 00061e: > >> > xid 01ee-0c48: DDP I/O in fc_fcp_recv_data set ERROR > >> > 4500 May 27 10:44:14 poc1 kernel: [ 1726.352946] host7: xid 1ee: > >> > f_ctl 90000 seq 1 > >> > 4501 May 27 10:44:14 poc1 kernel: [ 1726.353288] host7: fcp: 00061e: > >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > >> > 4502 May 27 10:44:14 poc1 kernel: [ 1726.353320] host7: fcp: 00061e: > >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > >> > > >> > I see much less Exchange timer messages than last time. Here are all > >> > the instances of them: > >> > > >> > 6521 May 27 10:44:46 poc1 kernel: [ 1757.762301] host7: fcp: 00061e: > >> > xid 0181-090d: DDP I/O in fc_fcp_recv_data set ERROR > >> > 6522 May 27 10:44:46 poc1 kernel: [ 1757.762306] host7: xid 181: > >> > f_ctl 90000 seq 1 > >> > 6523 May 27 10:44:46 poc1 kernel: [ 1757.763257] host7: fcp: 00061e: > >> > xid 0005-0327: DDP I/O in fc_fcp_recv_data set ERROR > >> > 6524 May 27 10:44:46 poc1 kernel: [ 1757.763261] host7: xid 5: > >> > f_ctl 90000 seq 1 > >> > 6525 May 27 10:44:46 poc1 kernel: [ 1757.763347] host7: fcp: 00061e: > >> > xid 0060-0dce: DDP I/O in fc_fcp_recv_data set ERROR > >> > 6526 May 27 10:44:46 poc1 kernel: [ 1757.763351] host7: xid 60: > >> > f_ctl 90000 seq 1 > >> > 6527 May 27 10:44:46 poc1 kernel: [ 1757.763617] host7: xid 12c: > >> > exch: BLS rctl 84 - BLS accept > >> > 6528 May 27 10:44:46 poc1 kernel: [ 1757.763627] host7: fcp: 00061e: > >> > Returning DID_ERROR to scsi-ml due to FC_CMD_ABORTED > >> > 6529 May 27 10:44:46 poc1 kernel: [ 1757.763632] host7: xid 12c: > >> > Exchange timer armed : 10000 msecs > >> > 6530 May 27 10:44:46 poc1 kernel: [ 1757.763813] host7: fcp: 00061e: > >> > xid 00cc-090d: DDP I/O in fc_fcp_recv_data set ERROR > >> > 6531 May 27 10:44:46 poc1 kernel: [ 1757.763817] host7: xid cc: > >> > f_ctl 90000 seq 1 > >> > 6532 May 27 10:44:46 poc1 kernel: [ 1757.763983] host7: xid 60: BLS > >> > rctl 85 - BLS reject received > >> > 6533 May 27 10:44:46 poc1 kernel: [ 1757.765447] host7: xid 14d: BLS > >> > rctl 85 - BLS reject received > >> > 6534 May 27 10:44:46 poc1 kernel: [ 1757.766258] host7: xid 18b: BLS > >> > rctl 85 - BLS reject received > >> > 6535 May 27 10:44:46 poc1 kernel: [ 1757.767116] host7: xid ed: BLS > >> > rctl 85 - BLS reject received > >> > 6536 May 27 10:44:46 poc1 kernel: [ 1757.767144] host7: xid 21: BLS > >> > rctl 85 - BLS reject received > >> > 6537 May 27 10:44:46 poc1 kernel: [ 1757.768161] host7: xid 122: BLS > >> > rctl 85 - BLS reject received > >> > 6538 May 27 10:44:46 poc1 kernel: [ 1757.768184] host7: xid 45: BLS > >> > rctl 85 - BLS reject received > >> > 6539 May 27 10:44:46 poc1 kernel: [ 1757.768437] host7: xid 16f: BLS > >> > rctl 85 - BLS reject received > >> > 6540 May 27 10:44:46 poc1 kernel: [ 1757.768880] host7: xid 14c: BLS > >> > rctl 85 - BLS reject received > >> > 6541 May 27 10:44:46 poc1 kernel: [ 1757.772813] host7: xid 8a: BLS > >> > rctl 85 - BLS reject received > >> > 6542 May 27 10:44:46 poc1 kernel: [ 1757.778209] host7: xid 181: BLS > >> > rctl 85 - BLS reject received > >> > 6543 May 27 10:44:46 poc1 kernel: [ 1757.778226] host7: xid 16b: BLS > >> > rctl 85 - BLS reject received > >> > 6544 May 27 10:44:46 poc1 kernel: [ 1757.778905] host7: xid cc: BLS > >> > rctl 85 - BLS reject received > >> > 6545 May 27 10:44:56 poc1 kernel: [ 1767.793465] host7: xid 12c: > >> > Exchange timed out > >> > 6546 May 27 10:44:56 poc1 kernel: [ 1767.793478] host7: xid e65: > >> > Exchange timer armed : 2000 msecs > >> > 6547 May 27 10:44:56 poc1 kernel: [ 1767.793861] host7: xid e65: > >> > Exchange timer canceled > >> > 6548 May 27 10:44:56 poc1 kernel: [ 1767.793866] host7: xid 12c: > >> > LS_RJT for RRQ > >> > 6549 May 27 10:45:16 poc1 kernel: [ 1788.033856] host7: scsi: > >> > Resetting rport (00061e) > >> > 6550 May 27 10:45:16 poc1 kernel: [ 1788.034071] host7: scsi: lun > >> > reset to lun 0 completed > >> > > >> > and > >> > > >> > 7468 May 27 10:45:17 poc1 kernel: [ 1789.288870] host7: fcp: 00061e: > >> > xid 06ae-0985: fsp=ffff880c0e176700:lso:blen=1000 lso_max=0x10000 > >> > t_blen=1000 > >> > 7469 May 27 10:45:17 poc1 kernel: [ 1789.288874] host7: xid 6ae: > >> > f_ctl 90000 seq 1 > >> > 7470 May 27 10:45:17 poc1 kernel: [ 1789.291411] host7: fcp: 00061e: > >> > xid 068e-07a3: fsp=ffff880c0e177400:lso:blen=1000 lso_max=0x10000 > >> > t_blen=1000 > >> > 7471 May 27 10:45:17 poc1 kernel: [ 1789.291415] host7: xid 68e: > >> > f_ctl 90000 seq 1 > >> > 7472 May 27 10:45:17 poc1 kernel: [ 1789.291425] host7: fcp: 00061e: > >> > xid 0c2e-0b04: fsp=ffff880c1cc4c800:lso:blen=1000 lso_max=0x10000 > >> > t_blen=1000 > >> > 7473 May 27 10:45:17 poc1 kernel: [ 1789.291428] host7: xid c2e: > >> > f_ctl 90000 seq 1 > >> > 7474 May 27 10:45:17 poc1 kernel: [ 1789.296282] host7: xid 1c4: BLS > >> > rctl 85 - BLS reject received > >> > 7475 May 27 10:45:17 poc1 kernel: [ 1789.296284] host7: xid 8c: BLS > >> > rctl 85 - BLS reject received > >> > 7476 May 27 10:45:17 poc1 kernel: [ 1789.297554] host7: xid 6ae: > >> > exch: BLS rctl 84 - BLS accept > >> > 7477 May 27 10:45:17 poc1 kernel: [ 1789.297559] host7: fcp: 00061e: > >> > Returning DID_ERROR to scsi-ml due to FC_CMD_ABORTED > >> > 7478 May 27 10:45:17 poc1 kernel: [ 1789.297564] host7: xid 6ae: > >> > Exchange timer armed : 10000 msecs > >> > 7479 May 27 10:45:17 poc1 kernel: [ 1789.298873] host7: xid ef: BLS > >> > rctl 85 - BLS reject received > >> > 7480 May 27 10:45:17 poc1 kernel: [ 1789.298887] host7: fcp: 00061e: > >> > xid 01cf-0e8e: DDP I/O in fc_fcp_recv_data set ERROR > >> > 7481 May 27 10:45:17 poc1 kernel: [ 1789.298891] host7: xid 1cf: > >> > f_ctl 90000 seq 1 > >> > 7482 May 27 10:45:17 poc1 kernel: [ 1789.299318] host7: xid 1cf: BLS > >> > rctl 85 - BLS reject received > >> > 7483 May 27 10:45:17 poc1 kernel: [ 1789.299390] host7: fcp: 00061e: > >> > xid 00c4-048c: DDP I/O in fc_fcp_recv_data set ERROR > >> > 7484 May 27 10:45:17 poc1 kernel: [ 1789.299394] host7: xid c4: > >> > f_ctl 90000 seq 1 > >> > 7485 May 27 10:45:17 poc1 kernel: [ 1789.299716] host7: xid c4: BLS > >> > rctl 85 - BLS reject received > >> > 7486 May 27 10:45:17 poc1 kernel: [ 1789.299733] host7: xid e6: BLS > >> > rctl 85 - BLS reject received > >> > 7487 May 27 10:45:17 poc1 kernel: [ 1789.299762] host7: xid 87: BLS > >> > rctl 85 - BLS reject received > >> > 7488 May 27 10:45:17 poc1 kernel: [ 1789.299767] host7: xid 107: BLS > >> > rctl 85 - BLS reject received > >> > 7489 May 27 10:45:17 poc1 kernel: [ 1789.303432] host7: fcp: 00061e: > >> > xid 0042-050f: DDP I/O in fc_fcp_recv_data set ERROR > >> > 7490 May 27 10:45:17 poc1 kernel: [ 1789.303437] host7: xid 42: > >> > f_ctl 90000 seq 1 > >> > 7491 May 27 10:45:17 poc1 kernel: [ 1789.303813] host7: xid 42: BLS > >> > rctl 85 - BLS reject received > >> > 7492 May 27 10:45:17 poc1 kernel: [ 1789.304596] host7: xid 7: BLS > >> > rctl 85 - BLS reject received > >> > 7493 May 27 10:45:17 poc1 kernel: [ 1789.304607] host7: xid 1a9: > >> > exch: BLS rctl 84 - BLS accept > >> > 7494 May 27 10:45:17 poc1 kernel: [ 1789.304615] host7: fcp: 00061e: > >> > Returning DID_ERROR to scsi-ml due to FC_CMD_ABORTED > >> > 7495 May 27 10:45:17 poc1 kernel: [ 1789.306357] host7: xid 10d: BLS > >> > rctl 85 - BLS reject received > >> > 7496 May 27 10:45:17 poc1 kernel: [ 1789.306797] host7: xid 18e: BLS > >> > rctl 85 - BLS reject received > >> > 7497 May 27 10:45:17 poc1 kernel: [ 1789.307196] host7: fcp: 00061e: > >> > xid 01e4-04a7: DDP I/O in fc_fcp_recv_data set ERROR > >> > 7498 May 27 10:45:17 poc1 kernel: [ 1789.307201] host7: xid 1e4: > >> > f_ctl 90000 seq 1 > >> > 7499 May 27 10:45:17 poc1 kernel: [ 1789.307628] host7: xid 1e4: BLS > >> > rctl 85 - BLS reject received > >> > 7500 May 27 10:45:17 poc1 kernel: [ 1789.311225] host7: xid 8d: BLS > >> > rctl 85 - BLS reject received > >> > 7501 May 27 10:45:17 poc1 kernel: [ 1789.316454] host7: fcp: 00061e: > >> > xid 0c64-0de8: fsp=ffff88061c238c00:lso:blen=1000 lso_max=0x10000 > >> > t_blen=1000 > >> > 7502 May 27 10:45:17 poc1 kernel: [ 1789.316458] host7: xid c64: > >> > f_ctl 90000 seq 1 > >> > 7503 May 27 10:45:27 poc1 kernel: [ 1799.306713] host7: xid 6ae: > >> > Exchange timed out > >> > 7504 May 27 10:45:27 poc1 kernel: [ 1799.306725] host7: xid e65: > >> > Exchange timer armed : 2000 msecs > >> > 7505 May 27 10:45:29 poc1 kernel: [ 1801.312254] host7: xid e65: > >> > Exchange timed out > >> > 7506 May 27 10:45:29 poc1 kernel: [ 1801.312261] host7: xid e65: > >> > f_ctl 90000 seq 1 > >> > 7507 May 27 10:45:29 poc1 kernel: [ 1801.312264] host7: xid e65: > >> > Exchange timer armed : 20000 msecs > >> > 7508 May 27 10:45:31 poc1 kernel: [ 1803.197758] host7: rport 00061e: > >> > Remove port > >> > 7509 May 27 10:45:31 poc1 kernel: [ 1803.197762] host7: rport 00061e: > >> > Port sending LOGO from Ready state > >> > > >> > and finally > >> > > >> > 7724 May 27 10:45:31 poc1 kernel: [ 1803.199215] host7: fcp: 00061e: > >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > >> > 7725 May 27 10:45:31 poc1 kernel: [ 1803.199219] host7: xid c44: > >> > f_ctl 90000 seq 1 > >> > 7726 May 27 10:45:31 poc1 kernel: [ 1803.199223] host7: rport 00061e: > >> > Received a LOGO response closed > >> > 7727 May 27 10:45:31 poc1 kernel: [ 1803.199226] host7: xid e65: > >> > Exchange timer canceled > >> > 7728 May 27 10:45:31 poc1 kernel: [ 1803.199264] host7: rport 00061e: > >> > work delete > >> > 7729 May 27 10:46:31 poc1 kernel: [ 1863.357830] rport-7:0-0: > >> > blocked FC remote port time out: removing target and saving binding > >> > > >> > > >> > The messages on the target side are similar as before. > >> > This is when Exhange timer message first appears: > >> > 18732865 May 27 10:44:14 poc2 kernel: [ 1909.030408] host7: xid c48: > >> > f_ctl 800000 seq 1 > >> > 18732866 May 27 10:44:14 poc2 kernel: [ 1909.030423] host7: xid c48: > >> > f_ctl 880008 seq 2 > >> > 18732867 May 27 10:44:14 poc2 kernel: [ 1909.031509] host7: xid 766: > >> > f_ctl 800000 seq 1 > >> > 18732868 May 27 10:44:14 poc2 kernel: [ 1909.031528] host7: xid 766: > >> > f_ctl 880008 seq 2 > >> > 18732869 May 27 10:44:14 poc2 kernel: [ 1909.031787] host7: xid cce: > >> > f_ctl 800000 seq 1 > >> > 18732870 May 27 10:44:14 poc2 kernel: [ 1909.031799] host7: xid cce: > >> > f_ctl 880008 seq 2 > >> > 18732871 May 27 10:44:14 poc2 kernel: [ 1909.031944] host7: xid 2c7: > >> > f_ctl 800000 seq 1 > >> > 18732872 May 27 10:44:14 poc2 kernel: [ 1909.031971] host7: xid 2c7: > >> > f_ctl 880008 seq 2 > >> > 18732873 May 27 10:44:14 poc2 kernel: [ 1909.032145] host7: xid f09: > >> > f_ctl 800000 seq 1 > >> > 18732874 May 27 10:44:14 poc2 kernel: [ 1909.032166] host7: xid f09: > >> > f_ctl 880008 seq 2 > >> > 18732875 May 27 10:44:14 poc2 kernel: [ 1909.032374] host7: xid ce0: > >> > f_ctl 800000 seq 1 > >> > 18732876 May 27 10:44:14 poc2 kernel: [ 1909.032383] host7: xid ce0: > >> > f_ctl 880008 seq 2 > >> > 18732877 May 27 10:44:14 poc2 kernel: [ 1909.032729] host7: xid 8eb: > >> > f_ctl 800000 seq 1 > >> > 18732878 May 27 10:44:14 poc2 kernel: [ 1909.032736] host7: xid 32f: > >> > f_ctl 800000 seq 1 > >> > 18732879 May 27 10:44:14 poc2 kernel: [ 1909.032740] host7: xid 8eb: > >> > f_ctl 880008 seq 2 > >> > 18732880 May 27 10:44:14 poc2 kernel: [ 1909.032759] host7: xid 32f: > >> > f_ctl 880008 seq 2 > >> > 18732881 May 27 10:44:14 poc2 kernel: [ 1909.033034] host7: xid 98d: > >> > f_ctl 800000 seq 1 > >> > 18732882 May 27 10:44:14 poc2 kernel: [ 1909.033051] host7: xid 98d: > >> > f_ctl 880008 seq 2 > >> > 18732883 May 27 10:44:14 poc2 kernel: [ 1909.033262] host7: xid c01: > >> > f_ctl 800000 seq 1 > >> > 18732884 May 27 10:44:14 poc2 kernel: [ 1909.033290] host7: xid c01: > >> > f_ctl 880008 seq 2 > >> > 18732885 May 27 10:44:14 poc2 kernel: [ 1909.033373] host7: xid 6ea: > >> > f_ctl 800000 seq 1 > >> > 18732886 May 27 10:44:14 poc2 kernel: [ 1909.033387] host7: xid 6ea: > >> > f_ctl 880008 seq 2 > >> > 18732887 May 27 10:44:14 poc2 kernel: [ 1909.033734] host7: xid 766: > >> > f_ctl 800000 seq 1 > >> > 18732888 May 27 10:44:14 poc2 kernel: [ 1909.033747] host7: xid 766: > >> > f_ctl 880008 seq 2 > >> > 18732889 May 27 10:44:14 poc2 kernel: [ 1909.034621] host7: xid 30c: > >> > f_ctl 800000 seq 1 > >> > 18732890 May 27 10:44:14 poc2 kernel: [ 1909.034648] host7: xid 30c: > >> > f_ctl 880008 seq 2 > >> > 18732891 May 27 10:44:14 poc2 kernel: [ 1909.035309] host7: xid c48: > >> > Exchange timer armed : 0 msecs > >> > 18732892 May 27 10:44:14 poc2 kernel: [ 1909.035315] host7: xid c48: > >> > f_ctl 800000 seq 1 > >> > 18732893 May 27 10:44:14 poc2 kernel: [ 1909.035384] host7: xid c48: > >> > Exchange timed out > >> > > >> > This is when ft_queue_data_in failure first appears: > >> > 18919838 May 27 10:44:46 poc2 kernel: [ 1940.431788] host7: xid 327: > >> > f_ctl 800000 seq 1 > >> > 18919839 May 27 10:44:46 poc2 kernel: [ 1940.431801] host7: xid 327: > >> > f_ctl 880008 seq 2 > >> > 18919840 May 27 10:44:46 poc2 kernel: [ 1940.433931] host7: xid d68: > >> > f_ctl 800000 seq 1 > >> > 18919841 May 27 10:44:46 poc2 kernel: [ 1940.433951] host7: xid d68: > >> > f_ctl 880008 seq 2 > >> > 18919842 May 27 10:44:46 poc2 kernel: [ 1940.436138] host7: xid f89: > >> > f_ctl 800000 seq 1 > >> > 18919843 May 27 10:44:46 poc2 kernel: [ 1940.436174] host7: xid f89: > >> > f_ctl 880008 seq 2 > >> > 18919844 May 27 10:44:46 poc2 kernel: [ 1940.436889] host7: xid 74a: > >> > f_ctl 800000 seq 1 > >> > 18919845 May 27 10:44:46 poc2 kernel: [ 1940.436912] host7: xid 74a: > >> > f_ctl 880008 seq 2 > >> > 18919846 May 27 10:44:46 poc2 kernel: [ 1940.437281] host7: xid 40c: > >> > f_ctl 800000 seq 1 > >> > 18919847 May 27 10:44:46 poc2 kernel: [ 1940.437326] host7: xid 40c: > >> > f_ctl 880008 seq 2 > >> > 18919848 May 27 10:44:46 poc2 kernel: [ 1940.437338] host7: xid 92b: > >> > f_ctl 800000 seq 1 > >> > 18919849 May 27 10:44:46 poc2 kernel: [ 1940.437356] host7: xid 92b: > >> > f_ctl 880008 seq 2 > >> > 18919850 May 27 10:44:46 poc2 kernel: [ 1940.439508] host7: xid dce: > >> > f_ctl 800000 seq 1 > >> > 18919851 May 27 10:44:46 poc2 kernel: [ 1940.439539] host7: xid dce: > >> > f_ctl 880008 seq 2 > >> > 18919852 May 27 10:44:46 poc2 kernel: [ 1940.439551] host7: xid 42f: > >> > f_ctl 800000 seq 1 > >> > 18919853 May 27 10:44:46 poc2 kernel: [ 1940.439559] host7: xid 42f: > >> > f_ctl 880008 seq 2 > >> > 18919854 May 27 10:44:46 poc2 kernel: [ 1940.440408] host7: xid 90d: > >> > f_ctl 800000 seq 1 > >> > 18919855 May 27 10:44:46 poc2 kernel: [ 1940.440425] host7: xid 90d: > >> > f_ctl 880008 seq 2 > >> > 18919856 May 27 10:44:46 poc2 kernel: [ 1940.440930] host7: xid d60: > >> > f_ctl 800000 seq 1 > >> > 18919857 May 27 10:44:46 poc2 kernel: [ 1940.440950] host7: xid d60: > >> > f_ctl 880008 seq 2 > >> > 18919858 May 27 10:44:46 poc2 kernel: [ 1940.440971] host7: xid ac4: > >> > f_ctl 800000 seq 1 > >> > 18919859 May 27 10:44:46 poc2 kernel: [ 1940.440983] host7: xid ac4: > >> > f_ctl 880008 seq 2 > >> > 18919860 May 27 10:44:46 poc2 kernel: [ 1940.441482] host7: xid fc2: > >> > f_ctl 800000 seq 1 > >> > 18919861 May 27 10:44:46 poc2 kernel: [ 1940.441504] host7: xid fc2: > >> > f_ctl 880008 seq 2 > >> > 18919862 May 27 10:44:46 poc2 kernel: [ 1940.441759] host7: xid 885: > >> > f_ctl 800000 seq 1 > >> > 18919863 May 27 10:44:46 poc2 kernel: [ 1940.441772] host7: xid 885: > >> > f_ctl 880008 seq 2 > >> > 18919864 May 27 10:44:46 poc2 kernel: [ 1940.441883] host7: xid ce1: > >> > f_ctl 800000 seq 1 > >> > 18919865 May 27 10:44:46 poc2 kernel: [ 1940.441909] host7: xid ce1: > >> > f_ctl 880008 seq 2 > >> > 18919866 May 27 10:44:46 poc2 kernel: [ 1940.443033] host7: xid 7a3: > >> > f_ctl 800000 seq 1 > >> > 18919867 May 27 10:44:46 poc2 kernel: [ 1940.443063] host7: xid 7a3: > >> > f_ctl 880008 seq 2 > >> > 18919868 May 27 10:44:46 poc2 kernel: [ 1940.445410] host7: xid 7e6: > >> > f_ctl 800000 seq 1 > >> > 18919869 May 27 10:44:46 poc2 kernel: [ 1940.445424] host7: xid 7e6: > >> > f_ctl 880008 seq 2 > >> > 18919870 May 27 10:44:46 poc2 kernel: [ 1940.445609] host7: xid 327: > >> > f_ctl 800000 seq 1 > >> > 18919871 May 27 10:44:46 poc2 kernel: [ 1940.445621] host7: xid 327: > >> > Exchange timer armed : 0 msecs > >> > 18919872 May 27 10:44:46 poc2 kernel: [ 1940.445630] host7: xid 327: > >> > f_ctl 800008 seq 2 > >> > 18919873 May 27 10:44:46 poc2 kernel: [ 1940.445639] ft_queue_data_in: > >> > Failed to send frame ffff88062cbd3a00, xid <0x327>, remaining 131072, > >> > lso_max <0x10000> > >> > 18919874 May 27 10:44:46 poc2 kernel: [ 1940.445645] ft_queue_data_in: > >> > Failed to send frame ffff88062cbd3a00, xid <0x327>, remaining 65536, > >> > lso_max <0x10000> > >> > 18919875 May 27 10:44:46 poc2 kernel: [ 1940.445649] ft_queue_data_in: > >> > Failed to send frame ffff88062cbd3a00, xid <0x327>, remaining 0, > >> > lso_max <0x10000> > >> > > >> > After stripping out the "host7: xid xxx: f_ctl 8x0000 seq x" lines, > >> > the messages on the target look like: > >> > 15476415:May 27 10:42:37 poc2 kernel: [ 1811.571399] perf samples too > >> > long (2670 > 2500), lowering kernel.perf_event_max_sample_rate to > >> > 50000 > >> > 18732891:May 27 10:44:14 poc2 kernel: [ 1909.035309] host7: xid c48: > >> > Exchange timer armed : 0 msecs > >> > 18732893:May 27 10:44:14 poc2 kernel: [ 1909.035384] host7: xid c48: > >> > Exchange timed out > >> > 18734191:May 27 10:44:14 poc2 kernel: [ 1909.218580] host7: xid 766: > >> > Exchange timer armed : 0 msecs > >> > 18734193:May 27 10:44:14 poc2 kernel: [ 1909.218750] host7: xid 766: > >> > Exchange timed out > >> > 18734419:May 27 10:44:14 poc2 kernel: [ 1909.254362] host7: xid cce: > >> > Exchange timer armed : 0 msecs > >> > 18734422:May 27 10:44:14 poc2 kernel: [ 1909.254385] host7: xid cce: > >> > Exchange timed out > >> > 18734439:May 27 10:44:14 poc2 kernel: [ 1909.258190] host7: xid a84: > >> > Exchange timer armed : 0 msecs > >> > 18734441:May 27 10:44:14 poc2 kernel: [ 1909.258245] host7: xid a84: > >> > Exchange timed out > >> > 18734444:May 27 10:44:14 poc2 kernel: [ 1909.259212] host7: xid d00: > >> > Exchange timer armed : 0 msecs > >> > 18734446:May 27 10:44:14 poc2 kernel: [ 1909.259222] host7: xid d00: > >> > Exchange timed out > >> > 18734465:May 27 10:44:14 poc2 kernel: [ 1909.261859] host7: xid 287: > >> > Exchange timer armed : 0 msecs > >> > 18734467:May 27 10:44:14 poc2 kernel: [ 1909.261910] host7: xid 287: > >> > Exchange timed out > >> > 18734632:May 27 10:44:15 poc2 kernel: [ 1909.285827] host7: xid a64: > >> > Exchange timer armed : 0 msecs > >> > 18734634:May 27 10:44:15 poc2 kernel: [ 1909.285892] host7: xid a64: > >> > Exchange timed out > >> > 18734699:May 27 10:44:15 poc2 kernel: [ 1909.301348] host7: xid be8: > >> > Exchange timer armed : 0 msecs > >> > 18734701:May 27 10:44:15 poc2 kernel: [ 1909.301357] host7: xid be8: > >> > Exchange timed out > >> > 18734806:May 27 10:44:15 poc2 kernel: [ 1909.319404] host7: xid ea9: > >> > Exchange timer armed : 0 msecs > >> > 18734808:May 27 10:44:15 poc2 kernel: [ 1909.319446] host7: xid ea9: > >> > Exchange timed out > >> > 18734842:May 27 10:44:15 poc2 kernel: [ 1909.324226] host7: xid c01: > >> > Exchange timer armed : 0 msecs > >> > 18734845:May 27 10:44:15 poc2 kernel: [ 1909.324257] host7: xid c01: > >> > Exchange timed out > >> > 18734944:May 27 10:44:15 poc2 kernel: [ 1909.347075] host7: xid 30c: > >> > Exchange timer armed : 0 msecs > >> > 18734946:May 27 10:44:15 poc2 kernel: [ 1909.347099] host7: xid 30c: > >> > Exchange timed out > >> > 18735067:May 27 10:44:15 poc2 kernel: [ 1909.360509] host7: xid c8e: > >> > Exchange timer armed : 0 msecs > >> > 18735069:May 27 10:44:15 poc2 kernel: [ 1909.360571] host7: xid c8e: > >> > Exchange timed out > >> > 18735154:May 27 10:44:15 poc2 kernel: [ 1909.380932] host7: xid 9ad: > >> > Exchange timer armed : 0 msecs > >> > 18735156:May 27 10:44:15 poc2 kernel: [ 1909.380946] host7: xid 9ad: > >> > Exchange timed out > >> > 18735253:May 27 10:44:15 poc2 kernel: [ 1909.395037] host7: xid ce0: > >> > Exchange timer armed : 0 msecs > >> > 18735255:May 27 10:44:15 poc2 kernel: [ 1909.395113] host7: xid ce0: > >> > Exchange timed out > >> > 18735308:May 27 10:44:15 poc2 kernel: [ 1909.405266] host7: xid 746: > >> > Exchange timer armed : 0 msecs > >> > 18735312:May 27 10:44:15 poc2 kernel: [ 1909.405332] host7: xid 746: > >> > Exchange timed out > >> > 18735317:May 27 10:44:15 poc2 kernel: [ 1909.405832] host7: xid 26c: > >> > Exchange timer armed : 0 msecs > >> > 18735319:May 27 10:44:15 poc2 kernel: [ 1909.405852] host7: xid 26c: > >> > Exchange timed out > >> > 18735342:May 27 10:44:15 poc2 kernel: [ 1909.412982] host7: xid f09: > >> > Exchange timer armed : 0 msecs > >> > 18735344:May 27 10:44:15 poc2 kernel: [ 1909.413062] host7: xid f09: > >> > Exchange timed out > >> > 18735572:May 27 10:44:15 poc2 kernel: [ 1909.453616] host7: xid d2e: > >> > Exchange timer armed : 0 msecs > >> > 18735574:May 27 10:44:15 poc2 kernel: [ 1909.453631] host7: xid d2e: > >> > Exchange timed out > >> > 18735575:May 27 10:44:15 poc2 kernel: [ 1909.454206] host7: xid 32f: > >> > Exchange timer armed : 0 msecs > >> > 18735577:May 27 10:44:15 poc2 kernel: [ 1909.454218] host7: xid 32f: > >> > Exchange timed out > >> > 18735810:May 27 10:44:15 poc2 kernel: [ 1909.490061] host7: xid c68: > >> > Exchange timer armed : 0 msecs > >> > 18735812:May 27 10:44:15 poc2 kernel: [ 1909.490126] host7: xid c68: > >> > Exchange timed out > >> > 18735927:May 27 10:44:15 poc2 kernel: [ 1909.515654] host7: xid 267: > >> > Exchange timer armed : 0 msecs > >> > 18735929:May 27 10:44:15 poc2 kernel: [ 1909.515667] host7: xid 267: > >> > Exchange timed out > >> > 18735940:May 27 10:44:15 poc2 kernel: [ 1909.523278] host7: xid 3cf: > >> > Exchange timer armed : 0 msecs > >> > 18735942:May 27 10:44:15 poc2 kernel: [ 1909.523350] host7: xid 3cf: > >> > Exchange timed out > >> > 18736041:May 27 10:44:15 poc2 kernel: [ 1909.554439] host7: xid 984: > >> > Exchange timer armed : 0 msecs > >> > 18736043:May 27 10:44:15 poc2 kernel: [ 1909.554457] host7: xid 984: > >> > Exchange timed out > >> > 18736148:May 27 10:44:15 poc2 kernel: [ 1909.583305] host7: xid 7a6: > >> > Exchange timer armed : 0 msecs > >> > 18736150:May 27 10:44:15 poc2 kernel: [ 1909.583332] host7: xid 7a6: > >> > Exchange timed out > >> > 18916476:May 27 10:44:45 poc2 kernel: [ 1939.812996] host7: xid 2c7: > >> > Exchange timer armed : 0 msecs > >> > 18916478:May 27 10:44:45 poc2 kernel: [ 1939.813075] host7: xid 2c7: > >> > Exchange timed out > >> > 18917869:May 27 10:44:45 poc2 kernel: [ 1940.028716] host7: xid 307: > >> > Exchange timer armed : 0 msecs > >> > 18917871:May 27 10:44:45 poc2 kernel: [ 1940.028738] host7: xid 307: > >> > Exchange timed out > >> > 18917904:May 27 10:44:45 poc2 kernel: [ 1940.036022] host7: xid 44f: > >> > Exchange timer armed : 0 msecs > >> > 18917906:May 27 10:44:45 poc2 kernel: [ 1940.036072] host7: xid 44f: > >> > Exchange timed out > >> > 18917907:May 27 10:44:45 poc2 kernel: [ 1940.036419] host7: xid d01: > >> > Exchange timer armed : 0 msecs > >> > 18917910:May 27 10:44:45 poc2 kernel: [ 1940.036459] host7: xid d01: > >> > Exchange timed out > >> > 18918008:May 27 10:44:45 poc2 kernel: [ 1940.050544] host7: xid 94b: > >> > Exchange timer armed : 0 msecs > >> > 18918010:May 27 10:44:45 poc2 kernel: [ 1940.050553] host7: xid 94b: > >> > Exchange timed out > >> > 18918239:May 27 10:44:45 poc2 kernel: [ 1940.100469] host7: xid 743: > >> > Exchange timer armed : 0 msecs > >> > 18918241:May 27 10:44:45 poc2 kernel: [ 1940.100485] host7: xid 743: > >> > Exchange timed out > >> > 18918310:May 27 10:44:45 poc2 kernel: [ 1940.111206] host7: xid f82: > >> > Exchange timer armed : 0 msecs > >> > 18918312:May 27 10:44:45 poc2 kernel: [ 1940.111357] host7: xid f82: > >> > Exchange timed out > >> > 18918329:May 27 10:44:45 poc2 kernel: [ 1940.117376] host7: xid e00: > >> > Exchange timer armed : 0 msecs > >> > 18918331:May 27 10:44:45 poc2 kernel: [ 1940.117388] host7: xid e00: > >> > Exchange timed out > >> > 18918385:May 27 10:44:45 poc2 kernel: [ 1940.127422] host7: xid d08: > >> > Exchange timer armed : 0 msecs > >> > 18918388:May 27 10:44:45 poc2 kernel: [ 1940.127470] host7: xid d08: > >> > Exchange timed out > >> > 18918487:May 27 10:44:45 poc2 kernel: [ 1940.142935] host7: xid 8c5: > >> > Exchange timer armed : 0 msecs > >> > 18918489:May 27 10:44:45 poc2 kernel: [ 1940.142950] host7: xid 8c5: > >> > Exchange timed out > >> > 18918588:May 27 10:44:45 poc2 kernel: [ 1940.158196] host7: xid 92d: > >> > Exchange timer armed : 0 msecs > >> > 18918590:May 27 10:44:45 poc2 kernel: [ 1940.158254] host7: xid 92d: > >> > Exchange timed out > >> > 18918597:May 27 10:44:45 poc2 kernel: [ 1940.159082] host7: xid a04: > >> > Exchange timer armed : 0 msecs > >> > 18918599:May 27 10:44:45 poc2 kernel: [ 1940.159154] host7: xid a04: > >> > Exchange timed out > >> > 18918866:May 27 10:44:45 poc2 kernel: [ 1940.209430] host7: xid d28: > >> > Exchange timer armed : 0 msecs > >> > 18918868:May 27 10:44:45 poc2 kernel: [ 1940.209484] host7: xid d28: > >> > Exchange timed out > >> > 18918909:May 27 10:44:45 poc2 kernel: [ 1940.215142] host7: xid 8ed: > >> > Exchange timer armed : 0 msecs > >> > 18918911:May 27 10:44:45 poc2 kernel: [ 1940.215158] host7: xid 8ed: > >> > Exchange timed out > >> > 18918918:May 27 10:44:45 poc2 kernel: [ 1940.217253] host7: xid 3ef: > >> > Exchange timer armed : 0 msecs > >> > 18918920:May 27 10:44:45 poc2 kernel: [ 1940.217270] host7: xid 3ef: > >> > Exchange timed out > >> > 18919025:May 27 10:44:45 poc2 kernel: [ 1940.241859] host7: xid fa9: > >> > Exchange timer armed : 0 msecs > >> > 18919027:May 27 10:44:45 poc2 kernel: [ 1940.241904] host7: xid fa9: > >> > Exchange timed out > >> > 18919151:May 27 10:44:45 poc2 kernel: [ 1940.267514] host7: xid 38c: > >> > Exchange timer armed : 0 msecs > >> > 18919154:May 27 10:44:45 poc2 kernel: [ 1940.267580] host7: xid 38c: > >> > Exchange timed out > >> > 18919163:May 27 10:44:45 poc2 kernel: [ 1940.268681] host7: xid d48: > >> > Exchange timer armed : 0 msecs > >> > 18919165:May 27 10:44:45 poc2 kernel: [ 1940.268707] host7: xid d48: > >> > Exchange timed out > >> > 18919212:May 27 10:44:45 poc2 kernel: [ 1940.278834] host7: xid aa4: > >> > Exchange timer armed : 0 msecs > >> > 18919214:May 27 10:44:45 poc2 kernel: [ 1940.278881] host7: xid aa4: > >> > Exchange timed out > >> > 18919243:May 27 10:44:45 poc2 kernel: [ 1940.285614] host7: xid f22: > >> > Exchange timer armed : 0 msecs > >> > 18919245:May 27 10:44:45 poc2 kernel: [ 1940.285629] host7: xid f22: > >> > Exchange timed out > >> > 18919270:May 27 10:44:45 poc2 kernel: [ 1940.293297] host7: xid da0: > >> > Exchange timer armed : 0 msecs > >> > 18919272:May 27 10:44:45 poc2 kernel: [ 1940.293346] host7: xid da0: > >> > Exchange timed out > >> > 18919321:May 27 10:44:45 poc2 kernel: [ 1940.301098] host7: xid 3cc: > >> > Exchange timer armed : 0 msecs > >> > 18919325:May 27 10:44:45 poc2 kernel: [ 1940.301172] host7: xid 3cc: > >> > Exchange timed out > >> > 18919472:May 27 10:44:46 poc2 kernel: [ 1940.333266] host7: xid e0e: > >> > Exchange timer armed : 0 msecs > >> > 18919474:May 27 10:44:46 poc2 kernel: [ 1940.333299] host7: xid e0e: > >> > Exchange timed out > >> > 18919641:May 27 10:44:46 poc2 kernel: [ 1940.368427] host7: xid 2e7: > >> > Exchange timer armed : 0 msecs > >> > 18919643:May 27 10:44:46 poc2 kernel: [ 1940.368451] host7: xid 2e7: > >> > Exchange timed out > >> > 18919694:May 27 10:44:46 poc2 kernel: [ 1940.386699] host7: xid 826: > >> > Exchange timer armed : 0 msecs > >> > 18919696:May 27 10:44:46 poc2 kernel: [ 1940.386744] host7: xid 826: > >> > Exchange timed out > >> > 18919785:May 27 10:44:46 poc2 kernel: [ 1940.409090] host7: xid 40f: > >> > Exchange timer armed : 0 msecs > >> > 18919787:May 27 10:44:46 poc2 kernel: [ 1940.409125] host7: xid 40f: > >> > Exchange timed out > >> > 18919871:May 27 10:44:46 poc2 kernel: [ 1940.445621] host7: xid 327: > >> > Exchange timer armed : 0 msecs > >> > 18919873:May 27 10:44:46 poc2 kernel: [ 1940.445639] ft_queue_data_in: > >> > Failed to send frame ffff88062cbd3a00, xid <0x327>, remaining 131072, > >> > lso_max <0x10000> > >> > 18919874:May 27 10:44:46 poc2 kernel: [ 1940.445645] ft_queue_data_in: > >> > Failed to send frame ffff88062cbd3a00, xid <0x327>, remaining 65536, > >> > lso_max <0x10000> > >> > 18919875:May 27 10:44:46 poc2 kernel: [ 1940.445649] ft_queue_data_in: > >> > Failed to send frame ffff88062cbd3a00, xid <0x327>, remaining 0, > >> > lso_max <0x10000> > >> > 18919877:May 27 10:44:46 poc2 kernel: [ 1940.445671] host7: xid 327: > >> > Exchange timed out > >> > 18919878:May 27 10:44:46 poc2 kernel: [ 1940.445683] host7: xid ac4: > >> > Exchange timer armed : 2000 msecs > >> > 18930794:May 27 10:44:48 poc2 kernel: [ 1942.452127] host7: xid ac4: > >> > Exchange timed out > >> > 18930795:May 27 10:44:48 poc2 kernel: [ 1942.452136] host7: xid ac4: > >> > f_ctl 90000 seq 1 > >> > 18930796:May 27 10:44:48 poc2 kernel: [ 1942.452139] host7: xid ac4: > >> > Exchange timer armed : 20000 msecs > >> > > >> > Thanks > >> > > >> > > >> > On Fri, May 23, 2014 at 9:23 AM, Vasu Dev <vasu.dev@xxxxxxxxxxxxxxx> wrote: > >> >> On Wed, 2014-05-21 at 15:23 -0700, Jun Wu wrote: > >> >>> > 18630751 May 21 11:01:53 poc2 kernel: [ 1528.191740] host7: xid > >> >>> 6e4: > >> >>> > Exchange timer armed : 0 msecs > >> >>> > 18630752 May 21 11:01:53 poc2 kernel: [ 1528.191747] host7: xid > >> >>> 6e4: > >> >>> > f_ctl 800000 seq 1 > >> >>> > 18630753 May 21 11:01:53 poc2 kernel: [ 1528.191756] host7: xid > >> >>> 6e4: > >> >>> > f_ctl 800000 seq 2 > >> >>> > 18630754 May 21 11:01:53 poc2 kernel: [ 1528.191763] host7: xid > >> >>> 6e4: > >> >>> > Exchange timed out > >> >>> > 18630755 May 21 11:01:53 poc2 kernel: [ 1528.191777] > >> >>> ft_queue_data_in: > >> >>> > Failed to send frame ffff8805bf245e00, xid <0x6e4>, remaining > >> >>> 458752, > >> >>> > lso_max <0x10000> > >> >> > >> >> Exchange 0x6e4 is aborted and then target still sending frame, while > >> >> later should not occur but first setting up abort with 0 msec timeout > >> >> doesn not look correct either and it is different that 8000 ms on > >> >> initiator side. Are you using same switch between initiator and target > >> >> with DCB enabled ? > >> >> > >> >> Reducing retries could narrow down to early aborts is the cause here, > >> >> can you try with REC disabled on initiator side for that using this > >> >> change ? > >> >> > >> >> diff --git a/drivers/scsi/libfc/fc_fcp.c b/drivers/scsi/libfc/fc_fcp.c > >> >> index 1d7e76e..43fe793 100644 > >> >> --- a/drivers/scsi/libfc/fc_fcp.c > >> >> +++ b/drivers/scsi/libfc/fc_fcp.c > >> >> @@ -1160,6 +1160,7 @@ static int fc_fcp_cmd_send(struct fc_lport *lport, > >> >> struct fc_fcp_pkt *fsp, > >> >> fc_fcp_pkt_hold(fsp); /* hold for fc_fcp_pkt_destroy */ > >> >> > >> >> setup_timer(&fsp->timer, fc_fcp_timeout, (unsigned long)fsp); > >> >> + rpriv->flags &= ~FC_RP_FLAGS_REC_SUPPORTED; > >> >> if (rpriv->flags & FC_RP_FLAGS_REC_SUPPORTED) > >> >> fc_fcp_timer_set(fsp, get_fsp_rec_tov(fsp)); > >> >> > >> >> > > > > -- To unsubscribe from this list: send the line "unsubscribe target-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html