On Mon, 2014-06-02 at 16:55 -0700, Jun Wu wrote: > Hi Vasu, > > Is there anything else we can do to collect more information or adjust > the timeout value? Increased Timeouts will help in reducing seq_send failure prints and must be the case with REC disabled patch try since that skipped REC completely and with that defaulted to longer SCSI retry timeouts. Anycase seq_send failure prints are just noise here since seq_send fails only for aborted exchanges and that is very likely with tcm fc abort handling, especially in your oversubscribed target setup with 1 target taking workloads from 10 hosts. I mean seq_send fails only due to following aborted exchanges check since thereafter fcoe_xmit always returns success. if (ep->esb_stat & (ESB_ST_COMPLETE | ESB_ST_ABNORMAL)) { fc_frame_free(fp); goto out; } So I think it is okay to get rid off send failure prints same as tcm fc has it on other places of callings seq_send with this patch:- diff --git a/drivers/target/tcm_fc/tfc_io.c b/drivers/target/tcm_fc/tfc_io.c index a48b96d..0425fc0 100644 --- a/drivers/target/tcm_fc/tfc_io.c +++ b/drivers/target/tcm_fc/tfc_io.c @@ -200,15 +200,7 @@ int ft_queue_data_in(struct se_cmd *se_cmd) f_ctl |= FC_FC_END_SEQ; fc_fill_fc_hdr(fp, FC_RCTL_DD_SOL_DATA, ep->did, ep->sid, FC_TYPE_FCP, f_ctl, fh_off); - error = lport->tt.seq_send(lport, seq, fp); - if (error) { - /* XXX For now, initiator will retry */ - pr_err_ratelimited("%s: Failed to send frame %p, " - "xid <0x%x>, remaining % zu, " - "lso_max <0x%x>\n", - __func__, fp, ep->xid, - remaining, lport->lso_max); - } + lport->tt.seq_send(lport, seq, fp); } return ft_queue_status(se_cmd); } > Thanks, > > Jun > > On Tue, May 27, 2014 at 12:44 PM, Jun Wu <jwu@xxxxxxxxxxxx> wrote: > > We do not have DCB PFC so the traffic is managed by PAUSE frames by default. > > The kernel has been recompiled to include your change. I ran the same > > test after the kernel change, i.e. running 10 fio sessions from > > initiator to 10 drives on target. The target machine hung after a few > > minutes. > > Target could be either really overloaded with your workloads in 10 hosts to single target setup or you it really stuck, for later case enabling hung task detection could help under kernel hacking as that would dump stack trace of hung task. If it is the earlier case then I don't know how target back end handle the case of sustained over subscribed target bandwidth, may be Nick could shed some info on that and possibly tcm_fc has related missing feedback into target core for that. If your workloads exceed demand intermittently then adjusting timeout as you mentioned above or increasing all Tx/Rx path resources could help in absorbing such intermittent incidents and in turn not leading to host IO timeouts/failures. //Vasu > > On the initiator side the messages look like: > > 4453 May 27 10:44:14 poc1 kernel: [ 1726.288288] host7: fcp: 00061e: > > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > > 4454 May 27 10:44:14 poc1 kernel: [ 1726.292240] host7: fcp: 00061e: > > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > > 4455 May 27 10:44:14 poc1 kernel: [ 1726.295246] host7: fcp: 00061e: > > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > > 4456 May 27 10:44:14 poc1 kernel: [ 1726.296245] host7: fcp: 00061e: > > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > > 4457 May 27 10:44:14 poc1 kernel: [ 1726.298249] host7: fcp: 00061e: > > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > > 4458 May 27 10:44:14 poc1 kernel: [ 1726.300288] host7: fcp: 00061e: > > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > > 4459 May 27 10:44:14 poc1 kernel: [ 1726.303268] host7: fcp: 00061e: > > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > > 4460 May 27 10:44:14 poc1 kernel: [ 1726.303273] host7: fcp: 00061e: > > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > > 4461 May 27 10:44:14 poc1 kernel: [ 1726.304297] host7: fcp: 00061e: > > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > > 4462 May 27 10:44:14 poc1 kernel: [ 1726.306257] host7: fcp: 00061e: > > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > > 4463 May 27 10:44:14 poc1 kernel: [ 1726.307293] host7: fcp: 00061e: > > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > > 4464 May 27 10:44:14 poc1 kernel: [ 1726.307298] host7: fcp: 00061e: > > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > > 4465 May 27 10:44:14 poc1 kernel: [ 1726.307301] host7: fcp: 00061e: > > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > > 4466 May 27 10:44:14 poc1 kernel: [ 1726.310317] host7: fcp: 00061e: > > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > > 4467 May 27 10:44:14 poc1 kernel: [ 1726.313257] host7: fcp: 00061e: > > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > > 4468 May 27 10:44:14 poc1 kernel: [ 1726.315297] host7: fcp: 00061e: > > xid 01cd-0a84: DDP I/O in fc_fcp_recv_data set ERROR > > 4469 May 27 10:44:14 poc1 kernel: [ 1726.315302] host7: xid 1cd: > > f_ctl 90000 seq 1 > > 4470 May 27 10:44:14 poc1 kernel: [ 1726.315666] host7: xid 1cd: BLS > > rctl 85 - BLS reject received > > 4471 May 27 10:44:14 poc1 kernel: [ 1726.316295] host7: fcp: 00061e: > > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > > 4472 May 27 10:44:14 poc1 kernel: [ 1726.317260] host7: fcp: 00061e: > > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > > 4473 May 27 10:44:14 poc1 kernel: [ 1726.317266] host7: fcp: 00061e: > > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > > 4474 May 27 10:44:14 poc1 kernel: [ 1726.317417] host7: fcp: 00061e: > > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > > 4475 May 27 10:44:14 poc1 kernel: [ 1726.320303] host7: fcp: 00061e: > > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > > 4476 May 27 10:44:14 poc1 kernel: [ 1726.321301] host7: fcp: 00061e: > > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > > 4477 May 27 10:44:14 poc1 kernel: [ 1726.323264] host7: fcp: 00061e: > > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > > 4478 May 27 10:44:14 poc1 kernel: [ 1726.323299] host7: fcp: 00061e: > > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > > 4479 May 27 10:44:14 poc1 kernel: [ 1726.324307] host7: fcp: 00061e: > > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > > 4480 May 27 10:44:14 poc1 kernel: [ 1726.327271] host7: fcp: 00061e: > > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > > 4481 May 27 10:44:14 poc1 kernel: [ 1726.327301] host7: fcp: 00061e: > > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > > 4482 May 27 10:44:14 poc1 kernel: [ 1726.329274] host7: fcp: 00061e: > > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > > 4483 May 27 10:44:14 poc1 kernel: [ 1726.330306] host7: fcp: 00061e: > > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > > 4484 May 27 10:44:14 poc1 kernel: [ 1726.331273] host7: fcp: 00061e: > > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > > 4485 May 27 10:44:14 poc1 kernel: [ 1726.333270] host7: fcp: 00061e: > > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > > 4486 May 27 10:44:14 poc1 kernel: [ 1726.333313] host7: fcp: 00061e: > > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > > 4487 May 27 10:44:14 poc1 kernel: [ 1726.338276] host7: fcp: 00061e: > > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > > 4488 May 27 10:44:14 poc1 kernel: [ 1726.338316] host7: fcp: 00061e: > > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > > 4489 May 27 10:44:14 poc1 kernel: [ 1726.340320] host7: fcp: 00061e: > > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > > 4490 May 27 10:44:14 poc1 kernel: [ 1726.341313] host7: fcp: 00061e: > > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > > 4491 May 27 10:44:14 poc1 kernel: [ 1726.342286] host7: fcp: 00061e: > > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > > 4492 May 27 10:44:14 poc1 kernel: [ 1726.343312] host7: fcp: 00061e: > > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > > 4493 May 27 10:44:14 poc1 kernel: [ 1726.346324] host7: fcp: 00061e: > > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > > 4494 May 27 10:44:14 poc1 kernel: [ 1726.347283] host7: fcp: 00061e: > > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > > 4495 May 27 10:44:14 poc1 kernel: [ 1726.350323] host7: fcp: 00061e: > > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > > 4496 May 27 10:44:14 poc1 kernel: [ 1726.350326] host7: fcp: 00061e: > > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > > 4497 May 27 10:44:14 poc1 kernel: [ 1726.352288] host7: fcp: 00061e: > > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > > 4498 May 27 10:44:14 poc1 kernel: [ 1726.352289] host7: fcp: 00061e: > > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > > 4499 May 27 10:44:14 poc1 kernel: [ 1726.352942] host7: fcp: 00061e: > > xid 01ee-0c48: DDP I/O in fc_fcp_recv_data set ERROR > > 4500 May 27 10:44:14 poc1 kernel: [ 1726.352946] host7: xid 1ee: > > f_ctl 90000 seq 1 > > 4501 May 27 10:44:14 poc1 kernel: [ 1726.353288] host7: fcp: 00061e: > > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > > 4502 May 27 10:44:14 poc1 kernel: [ 1726.353320] host7: fcp: 00061e: > > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > > > > I see much less Exchange timer messages than last time. Here are all > > the instances of them: > > > > 6521 May 27 10:44:46 poc1 kernel: [ 1757.762301] host7: fcp: 00061e: > > xid 0181-090d: DDP I/O in fc_fcp_recv_data set ERROR > > 6522 May 27 10:44:46 poc1 kernel: [ 1757.762306] host7: xid 181: > > f_ctl 90000 seq 1 > > 6523 May 27 10:44:46 poc1 kernel: [ 1757.763257] host7: fcp: 00061e: > > xid 0005-0327: DDP I/O in fc_fcp_recv_data set ERROR > > 6524 May 27 10:44:46 poc1 kernel: [ 1757.763261] host7: xid 5: > > f_ctl 90000 seq 1 > > 6525 May 27 10:44:46 poc1 kernel: [ 1757.763347] host7: fcp: 00061e: > > xid 0060-0dce: DDP I/O in fc_fcp_recv_data set ERROR > > 6526 May 27 10:44:46 poc1 kernel: [ 1757.763351] host7: xid 60: > > f_ctl 90000 seq 1 > > 6527 May 27 10:44:46 poc1 kernel: [ 1757.763617] host7: xid 12c: > > exch: BLS rctl 84 - BLS accept > > 6528 May 27 10:44:46 poc1 kernel: [ 1757.763627] host7: fcp: 00061e: > > Returning DID_ERROR to scsi-ml due to FC_CMD_ABORTED > > 6529 May 27 10:44:46 poc1 kernel: [ 1757.763632] host7: xid 12c: > > Exchange timer armed : 10000 msecs > > 6530 May 27 10:44:46 poc1 kernel: [ 1757.763813] host7: fcp: 00061e: > > xid 00cc-090d: DDP I/O in fc_fcp_recv_data set ERROR > > 6531 May 27 10:44:46 poc1 kernel: [ 1757.763817] host7: xid cc: > > f_ctl 90000 seq 1 > > 6532 May 27 10:44:46 poc1 kernel: [ 1757.763983] host7: xid 60: BLS > > rctl 85 - BLS reject received > > 6533 May 27 10:44:46 poc1 kernel: [ 1757.765447] host7: xid 14d: BLS > > rctl 85 - BLS reject received > > 6534 May 27 10:44:46 poc1 kernel: [ 1757.766258] host7: xid 18b: BLS > > rctl 85 - BLS reject received > > 6535 May 27 10:44:46 poc1 kernel: [ 1757.767116] host7: xid ed: BLS > > rctl 85 - BLS reject received > > 6536 May 27 10:44:46 poc1 kernel: [ 1757.767144] host7: xid 21: BLS > > rctl 85 - BLS reject received > > 6537 May 27 10:44:46 poc1 kernel: [ 1757.768161] host7: xid 122: BLS > > rctl 85 - BLS reject received > > 6538 May 27 10:44:46 poc1 kernel: [ 1757.768184] host7: xid 45: BLS > > rctl 85 - BLS reject received > > 6539 May 27 10:44:46 poc1 kernel: [ 1757.768437] host7: xid 16f: BLS > > rctl 85 - BLS reject received > > 6540 May 27 10:44:46 poc1 kernel: [ 1757.768880] host7: xid 14c: BLS > > rctl 85 - BLS reject received > > 6541 May 27 10:44:46 poc1 kernel: [ 1757.772813] host7: xid 8a: BLS > > rctl 85 - BLS reject received > > 6542 May 27 10:44:46 poc1 kernel: [ 1757.778209] host7: xid 181: BLS > > rctl 85 - BLS reject received > > 6543 May 27 10:44:46 poc1 kernel: [ 1757.778226] host7: xid 16b: BLS > > rctl 85 - BLS reject received > > 6544 May 27 10:44:46 poc1 kernel: [ 1757.778905] host7: xid cc: BLS > > rctl 85 - BLS reject received > > 6545 May 27 10:44:56 poc1 kernel: [ 1767.793465] host7: xid 12c: > > Exchange timed out > > 6546 May 27 10:44:56 poc1 kernel: [ 1767.793478] host7: xid e65: > > Exchange timer armed : 2000 msecs > > 6547 May 27 10:44:56 poc1 kernel: [ 1767.793861] host7: xid e65: > > Exchange timer canceled > > 6548 May 27 10:44:56 poc1 kernel: [ 1767.793866] host7: xid 12c: > > LS_RJT for RRQ > > 6549 May 27 10:45:16 poc1 kernel: [ 1788.033856] host7: scsi: > > Resetting rport (00061e) > > 6550 May 27 10:45:16 poc1 kernel: [ 1788.034071] host7: scsi: lun > > reset to lun 0 completed > > > > and > > > > 7468 May 27 10:45:17 poc1 kernel: [ 1789.288870] host7: fcp: 00061e: > > xid 06ae-0985: fsp=ffff880c0e176700:lso:blen=1000 lso_max=0x10000 > > t_blen=1000 > > 7469 May 27 10:45:17 poc1 kernel: [ 1789.288874] host7: xid 6ae: > > f_ctl 90000 seq 1 > > 7470 May 27 10:45:17 poc1 kernel: [ 1789.291411] host7: fcp: 00061e: > > xid 068e-07a3: fsp=ffff880c0e177400:lso:blen=1000 lso_max=0x10000 > > t_blen=1000 > > 7471 May 27 10:45:17 poc1 kernel: [ 1789.291415] host7: xid 68e: > > f_ctl 90000 seq 1 > > 7472 May 27 10:45:17 poc1 kernel: [ 1789.291425] host7: fcp: 00061e: > > xid 0c2e-0b04: fsp=ffff880c1cc4c800:lso:blen=1000 lso_max=0x10000 > > t_blen=1000 > > 7473 May 27 10:45:17 poc1 kernel: [ 1789.291428] host7: xid c2e: > > f_ctl 90000 seq 1 > > 7474 May 27 10:45:17 poc1 kernel: [ 1789.296282] host7: xid 1c4: BLS > > rctl 85 - BLS reject received > > 7475 May 27 10:45:17 poc1 kernel: [ 1789.296284] host7: xid 8c: BLS > > rctl 85 - BLS reject received > > 7476 May 27 10:45:17 poc1 kernel: [ 1789.297554] host7: xid 6ae: > > exch: BLS rctl 84 - BLS accept > > 7477 May 27 10:45:17 poc1 kernel: [ 1789.297559] host7: fcp: 00061e: > > Returning DID_ERROR to scsi-ml due to FC_CMD_ABORTED > > 7478 May 27 10:45:17 poc1 kernel: [ 1789.297564] host7: xid 6ae: > > Exchange timer armed : 10000 msecs > > 7479 May 27 10:45:17 poc1 kernel: [ 1789.298873] host7: xid ef: BLS > > rctl 85 - BLS reject received > > 7480 May 27 10:45:17 poc1 kernel: [ 1789.298887] host7: fcp: 00061e: > > xid 01cf-0e8e: DDP I/O in fc_fcp_recv_data set ERROR > > 7481 May 27 10:45:17 poc1 kernel: [ 1789.298891] host7: xid 1cf: > > f_ctl 90000 seq 1 > > 7482 May 27 10:45:17 poc1 kernel: [ 1789.299318] host7: xid 1cf: BLS > > rctl 85 - BLS reject received > > 7483 May 27 10:45:17 poc1 kernel: [ 1789.299390] host7: fcp: 00061e: > > xid 00c4-048c: DDP I/O in fc_fcp_recv_data set ERROR > > 7484 May 27 10:45:17 poc1 kernel: [ 1789.299394] host7: xid c4: > > f_ctl 90000 seq 1 > > 7485 May 27 10:45:17 poc1 kernel: [ 1789.299716] host7: xid c4: BLS > > rctl 85 - BLS reject received > > 7486 May 27 10:45:17 poc1 kernel: [ 1789.299733] host7: xid e6: BLS > > rctl 85 - BLS reject received > > 7487 May 27 10:45:17 poc1 kernel: [ 1789.299762] host7: xid 87: BLS > > rctl 85 - BLS reject received > > 7488 May 27 10:45:17 poc1 kernel: [ 1789.299767] host7: xid 107: BLS > > rctl 85 - BLS reject received > > 7489 May 27 10:45:17 poc1 kernel: [ 1789.303432] host7: fcp: 00061e: > > xid 0042-050f: DDP I/O in fc_fcp_recv_data set ERROR > > 7490 May 27 10:45:17 poc1 kernel: [ 1789.303437] host7: xid 42: > > f_ctl 90000 seq 1 > > 7491 May 27 10:45:17 poc1 kernel: [ 1789.303813] host7: xid 42: BLS > > rctl 85 - BLS reject received > > 7492 May 27 10:45:17 poc1 kernel: [ 1789.304596] host7: xid 7: BLS > > rctl 85 - BLS reject received > > 7493 May 27 10:45:17 poc1 kernel: [ 1789.304607] host7: xid 1a9: > > exch: BLS rctl 84 - BLS accept > > 7494 May 27 10:45:17 poc1 kernel: [ 1789.304615] host7: fcp: 00061e: > > Returning DID_ERROR to scsi-ml due to FC_CMD_ABORTED > > 7495 May 27 10:45:17 poc1 kernel: [ 1789.306357] host7: xid 10d: BLS > > rctl 85 - BLS reject received > > 7496 May 27 10:45:17 poc1 kernel: [ 1789.306797] host7: xid 18e: BLS > > rctl 85 - BLS reject received > > 7497 May 27 10:45:17 poc1 kernel: [ 1789.307196] host7: fcp: 00061e: > > xid 01e4-04a7: DDP I/O in fc_fcp_recv_data set ERROR > > 7498 May 27 10:45:17 poc1 kernel: [ 1789.307201] host7: xid 1e4: > > f_ctl 90000 seq 1 > > 7499 May 27 10:45:17 poc1 kernel: [ 1789.307628] host7: xid 1e4: BLS > > rctl 85 - BLS reject received > > 7500 May 27 10:45:17 poc1 kernel: [ 1789.311225] host7: xid 8d: BLS > > rctl 85 - BLS reject received > > 7501 May 27 10:45:17 poc1 kernel: [ 1789.316454] host7: fcp: 00061e: > > xid 0c64-0de8: fsp=ffff88061c238c00:lso:blen=1000 lso_max=0x10000 > > t_blen=1000 > > 7502 May 27 10:45:17 poc1 kernel: [ 1789.316458] host7: xid c64: > > f_ctl 90000 seq 1 > > 7503 May 27 10:45:27 poc1 kernel: [ 1799.306713] host7: xid 6ae: > > Exchange timed out > > 7504 May 27 10:45:27 poc1 kernel: [ 1799.306725] host7: xid e65: > > Exchange timer armed : 2000 msecs > > 7505 May 27 10:45:29 poc1 kernel: [ 1801.312254] host7: xid e65: > > Exchange timed out > > 7506 May 27 10:45:29 poc1 kernel: [ 1801.312261] host7: xid e65: > > f_ctl 90000 seq 1 > > 7507 May 27 10:45:29 poc1 kernel: [ 1801.312264] host7: xid e65: > > Exchange timer armed : 20000 msecs > > 7508 May 27 10:45:31 poc1 kernel: [ 1803.197758] host7: rport 00061e: > > Remove port > > 7509 May 27 10:45:31 poc1 kernel: [ 1803.197762] host7: rport 00061e: > > Port sending LOGO from Ready state > > > > and finally > > > > 7724 May 27 10:45:31 poc1 kernel: [ 1803.199215] host7: fcp: 00061e: > > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi) > > 7725 May 27 10:45:31 poc1 kernel: [ 1803.199219] host7: xid c44: > > f_ctl 90000 seq 1 > > 7726 May 27 10:45:31 poc1 kernel: [ 1803.199223] host7: rport 00061e: > > Received a LOGO response closed > > 7727 May 27 10:45:31 poc1 kernel: [ 1803.199226] host7: xid e65: > > Exchange timer canceled > > 7728 May 27 10:45:31 poc1 kernel: [ 1803.199264] host7: rport 00061e: > > work delete > > 7729 May 27 10:46:31 poc1 kernel: [ 1863.357830] rport-7:0-0: > > blocked FC remote port time out: removing target and saving binding > > > > > > The messages on the target side are similar as before. > > This is when Exhange timer message first appears: > > 18732865 May 27 10:44:14 poc2 kernel: [ 1909.030408] host7: xid c48: > > f_ctl 800000 seq 1 > > 18732866 May 27 10:44:14 poc2 kernel: [ 1909.030423] host7: xid c48: > > f_ctl 880008 seq 2 > > 18732867 May 27 10:44:14 poc2 kernel: [ 1909.031509] host7: xid 766: > > f_ctl 800000 seq 1 > > 18732868 May 27 10:44:14 poc2 kernel: [ 1909.031528] host7: xid 766: > > f_ctl 880008 seq 2 > > 18732869 May 27 10:44:14 poc2 kernel: [ 1909.031787] host7: xid cce: > > f_ctl 800000 seq 1 > > 18732870 May 27 10:44:14 poc2 kernel: [ 1909.031799] host7: xid cce: > > f_ctl 880008 seq 2 > > 18732871 May 27 10:44:14 poc2 kernel: [ 1909.031944] host7: xid 2c7: > > f_ctl 800000 seq 1 > > 18732872 May 27 10:44:14 poc2 kernel: [ 1909.031971] host7: xid 2c7: > > f_ctl 880008 seq 2 > > 18732873 May 27 10:44:14 poc2 kernel: [ 1909.032145] host7: xid f09: > > f_ctl 800000 seq 1 > > 18732874 May 27 10:44:14 poc2 kernel: [ 1909.032166] host7: xid f09: > > f_ctl 880008 seq 2 > > 18732875 May 27 10:44:14 poc2 kernel: [ 1909.032374] host7: xid ce0: > > f_ctl 800000 seq 1 > > 18732876 May 27 10:44:14 poc2 kernel: [ 1909.032383] host7: xid ce0: > > f_ctl 880008 seq 2 > > 18732877 May 27 10:44:14 poc2 kernel: [ 1909.032729] host7: xid 8eb: > > f_ctl 800000 seq 1 > > 18732878 May 27 10:44:14 poc2 kernel: [ 1909.032736] host7: xid 32f: > > f_ctl 800000 seq 1 > > 18732879 May 27 10:44:14 poc2 kernel: [ 1909.032740] host7: xid 8eb: > > f_ctl 880008 seq 2 > > 18732880 May 27 10:44:14 poc2 kernel: [ 1909.032759] host7: xid 32f: > > f_ctl 880008 seq 2 > > 18732881 May 27 10:44:14 poc2 kernel: [ 1909.033034] host7: xid 98d: > > f_ctl 800000 seq 1 > > 18732882 May 27 10:44:14 poc2 kernel: [ 1909.033051] host7: xid 98d: > > f_ctl 880008 seq 2 > > 18732883 May 27 10:44:14 poc2 kernel: [ 1909.033262] host7: xid c01: > > f_ctl 800000 seq 1 > > 18732884 May 27 10:44:14 poc2 kernel: [ 1909.033290] host7: xid c01: > > f_ctl 880008 seq 2 > > 18732885 May 27 10:44:14 poc2 kernel: [ 1909.033373] host7: xid 6ea: > > f_ctl 800000 seq 1 > > 18732886 May 27 10:44:14 poc2 kernel: [ 1909.033387] host7: xid 6ea: > > f_ctl 880008 seq 2 > > 18732887 May 27 10:44:14 poc2 kernel: [ 1909.033734] host7: xid 766: > > f_ctl 800000 seq 1 > > 18732888 May 27 10:44:14 poc2 kernel: [ 1909.033747] host7: xid 766: > > f_ctl 880008 seq 2 > > 18732889 May 27 10:44:14 poc2 kernel: [ 1909.034621] host7: xid 30c: > > f_ctl 800000 seq 1 > > 18732890 May 27 10:44:14 poc2 kernel: [ 1909.034648] host7: xid 30c: > > f_ctl 880008 seq 2 > > 18732891 May 27 10:44:14 poc2 kernel: [ 1909.035309] host7: xid c48: > > Exchange timer armed : 0 msecs > > 18732892 May 27 10:44:14 poc2 kernel: [ 1909.035315] host7: xid c48: > > f_ctl 800000 seq 1 > > 18732893 May 27 10:44:14 poc2 kernel: [ 1909.035384] host7: xid c48: > > Exchange timed out > > > > This is when ft_queue_data_in failure first appears: > > 18919838 May 27 10:44:46 poc2 kernel: [ 1940.431788] host7: xid 327: > > f_ctl 800000 seq 1 > > 18919839 May 27 10:44:46 poc2 kernel: [ 1940.431801] host7: xid 327: > > f_ctl 880008 seq 2 > > 18919840 May 27 10:44:46 poc2 kernel: [ 1940.433931] host7: xid d68: > > f_ctl 800000 seq 1 > > 18919841 May 27 10:44:46 poc2 kernel: [ 1940.433951] host7: xid d68: > > f_ctl 880008 seq 2 > > 18919842 May 27 10:44:46 poc2 kernel: [ 1940.436138] host7: xid f89: > > f_ctl 800000 seq 1 > > 18919843 May 27 10:44:46 poc2 kernel: [ 1940.436174] host7: xid f89: > > f_ctl 880008 seq 2 > > 18919844 May 27 10:44:46 poc2 kernel: [ 1940.436889] host7: xid 74a: > > f_ctl 800000 seq 1 > > 18919845 May 27 10:44:46 poc2 kernel: [ 1940.436912] host7: xid 74a: > > f_ctl 880008 seq 2 > > 18919846 May 27 10:44:46 poc2 kernel: [ 1940.437281] host7: xid 40c: > > f_ctl 800000 seq 1 > > 18919847 May 27 10:44:46 poc2 kernel: [ 1940.437326] host7: xid 40c: > > f_ctl 880008 seq 2 > > 18919848 May 27 10:44:46 poc2 kernel: [ 1940.437338] host7: xid 92b: > > f_ctl 800000 seq 1 > > 18919849 May 27 10:44:46 poc2 kernel: [ 1940.437356] host7: xid 92b: > > f_ctl 880008 seq 2 > > 18919850 May 27 10:44:46 poc2 kernel: [ 1940.439508] host7: xid dce: > > f_ctl 800000 seq 1 > > 18919851 May 27 10:44:46 poc2 kernel: [ 1940.439539] host7: xid dce: > > f_ctl 880008 seq 2 > > 18919852 May 27 10:44:46 poc2 kernel: [ 1940.439551] host7: xid 42f: > > f_ctl 800000 seq 1 > > 18919853 May 27 10:44:46 poc2 kernel: [ 1940.439559] host7: xid 42f: > > f_ctl 880008 seq 2 > > 18919854 May 27 10:44:46 poc2 kernel: [ 1940.440408] host7: xid 90d: > > f_ctl 800000 seq 1 > > 18919855 May 27 10:44:46 poc2 kernel: [ 1940.440425] host7: xid 90d: > > f_ctl 880008 seq 2 > > 18919856 May 27 10:44:46 poc2 kernel: [ 1940.440930] host7: xid d60: > > f_ctl 800000 seq 1 > > 18919857 May 27 10:44:46 poc2 kernel: [ 1940.440950] host7: xid d60: > > f_ctl 880008 seq 2 > > 18919858 May 27 10:44:46 poc2 kernel: [ 1940.440971] host7: xid ac4: > > f_ctl 800000 seq 1 > > 18919859 May 27 10:44:46 poc2 kernel: [ 1940.440983] host7: xid ac4: > > f_ctl 880008 seq 2 > > 18919860 May 27 10:44:46 poc2 kernel: [ 1940.441482] host7: xid fc2: > > f_ctl 800000 seq 1 > > 18919861 May 27 10:44:46 poc2 kernel: [ 1940.441504] host7: xid fc2: > > f_ctl 880008 seq 2 > > 18919862 May 27 10:44:46 poc2 kernel: [ 1940.441759] host7: xid 885: > > f_ctl 800000 seq 1 > > 18919863 May 27 10:44:46 poc2 kernel: [ 1940.441772] host7: xid 885: > > f_ctl 880008 seq 2 > > 18919864 May 27 10:44:46 poc2 kernel: [ 1940.441883] host7: xid ce1: > > f_ctl 800000 seq 1 > > 18919865 May 27 10:44:46 poc2 kernel: [ 1940.441909] host7: xid ce1: > > f_ctl 880008 seq 2 > > 18919866 May 27 10:44:46 poc2 kernel: [ 1940.443033] host7: xid 7a3: > > f_ctl 800000 seq 1 > > 18919867 May 27 10:44:46 poc2 kernel: [ 1940.443063] host7: xid 7a3: > > f_ctl 880008 seq 2 > > 18919868 May 27 10:44:46 poc2 kernel: [ 1940.445410] host7: xid 7e6: > > f_ctl 800000 seq 1 > > 18919869 May 27 10:44:46 poc2 kernel: [ 1940.445424] host7: xid 7e6: > > f_ctl 880008 seq 2 > > 18919870 May 27 10:44:46 poc2 kernel: [ 1940.445609] host7: xid 327: > > f_ctl 800000 seq 1 > > 18919871 May 27 10:44:46 poc2 kernel: [ 1940.445621] host7: xid 327: > > Exchange timer armed : 0 msecs > > 18919872 May 27 10:44:46 poc2 kernel: [ 1940.445630] host7: xid 327: > > f_ctl 800008 seq 2 > > 18919873 May 27 10:44:46 poc2 kernel: [ 1940.445639] ft_queue_data_in: > > Failed to send frame ffff88062cbd3a00, xid <0x327>, remaining 131072, > > lso_max <0x10000> > > 18919874 May 27 10:44:46 poc2 kernel: [ 1940.445645] ft_queue_data_in: > > Failed to send frame ffff88062cbd3a00, xid <0x327>, remaining 65536, > > lso_max <0x10000> > > 18919875 May 27 10:44:46 poc2 kernel: [ 1940.445649] ft_queue_data_in: > > Failed to send frame ffff88062cbd3a00, xid <0x327>, remaining 0, > > lso_max <0x10000> > > > > After stripping out the "host7: xid xxx: f_ctl 8x0000 seq x" lines, > > the messages on the target look like: > > 15476415:May 27 10:42:37 poc2 kernel: [ 1811.571399] perf samples too > > long (2670 > 2500), lowering kernel.perf_event_max_sample_rate to > > 50000 > > 18732891:May 27 10:44:14 poc2 kernel: [ 1909.035309] host7: xid c48: > > Exchange timer armed : 0 msecs > > 18732893:May 27 10:44:14 poc2 kernel: [ 1909.035384] host7: xid c48: > > Exchange timed out > > 18734191:May 27 10:44:14 poc2 kernel: [ 1909.218580] host7: xid 766: > > Exchange timer armed : 0 msecs > > 18734193:May 27 10:44:14 poc2 kernel: [ 1909.218750] host7: xid 766: > > Exchange timed out > > 18734419:May 27 10:44:14 poc2 kernel: [ 1909.254362] host7: xid cce: > > Exchange timer armed : 0 msecs > > 18734422:May 27 10:44:14 poc2 kernel: [ 1909.254385] host7: xid cce: > > Exchange timed out > > 18734439:May 27 10:44:14 poc2 kernel: [ 1909.258190] host7: xid a84: > > Exchange timer armed : 0 msecs > > 18734441:May 27 10:44:14 poc2 kernel: [ 1909.258245] host7: xid a84: > > Exchange timed out > > 18734444:May 27 10:44:14 poc2 kernel: [ 1909.259212] host7: xid d00: > > Exchange timer armed : 0 msecs > > 18734446:May 27 10:44:14 poc2 kernel: [ 1909.259222] host7: xid d00: > > Exchange timed out > > 18734465:May 27 10:44:14 poc2 kernel: [ 1909.261859] host7: xid 287: > > Exchange timer armed : 0 msecs > > 18734467:May 27 10:44:14 poc2 kernel: [ 1909.261910] host7: xid 287: > > Exchange timed out > > 18734632:May 27 10:44:15 poc2 kernel: [ 1909.285827] host7: xid a64: > > Exchange timer armed : 0 msecs > > 18734634:May 27 10:44:15 poc2 kernel: [ 1909.285892] host7: xid a64: > > Exchange timed out > > 18734699:May 27 10:44:15 poc2 kernel: [ 1909.301348] host7: xid be8: > > Exchange timer armed : 0 msecs > > 18734701:May 27 10:44:15 poc2 kernel: [ 1909.301357] host7: xid be8: > > Exchange timed out > > 18734806:May 27 10:44:15 poc2 kernel: [ 1909.319404] host7: xid ea9: > > Exchange timer armed : 0 msecs > > 18734808:May 27 10:44:15 poc2 kernel: [ 1909.319446] host7: xid ea9: > > Exchange timed out > > 18734842:May 27 10:44:15 poc2 kernel: [ 1909.324226] host7: xid c01: > > Exchange timer armed : 0 msecs > > 18734845:May 27 10:44:15 poc2 kernel: [ 1909.324257] host7: xid c01: > > Exchange timed out > > 18734944:May 27 10:44:15 poc2 kernel: [ 1909.347075] host7: xid 30c: > > Exchange timer armed : 0 msecs > > 18734946:May 27 10:44:15 poc2 kernel: [ 1909.347099] host7: xid 30c: > > Exchange timed out > > 18735067:May 27 10:44:15 poc2 kernel: [ 1909.360509] host7: xid c8e: > > Exchange timer armed : 0 msecs > > 18735069:May 27 10:44:15 poc2 kernel: [ 1909.360571] host7: xid c8e: > > Exchange timed out > > 18735154:May 27 10:44:15 poc2 kernel: [ 1909.380932] host7: xid 9ad: > > Exchange timer armed : 0 msecs > > 18735156:May 27 10:44:15 poc2 kernel: [ 1909.380946] host7: xid 9ad: > > Exchange timed out > > 18735253:May 27 10:44:15 poc2 kernel: [ 1909.395037] host7: xid ce0: > > Exchange timer armed : 0 msecs > > 18735255:May 27 10:44:15 poc2 kernel: [ 1909.395113] host7: xid ce0: > > Exchange timed out > > 18735308:May 27 10:44:15 poc2 kernel: [ 1909.405266] host7: xid 746: > > Exchange timer armed : 0 msecs > > 18735312:May 27 10:44:15 poc2 kernel: [ 1909.405332] host7: xid 746: > > Exchange timed out > > 18735317:May 27 10:44:15 poc2 kernel: [ 1909.405832] host7: xid 26c: > > Exchange timer armed : 0 msecs > > 18735319:May 27 10:44:15 poc2 kernel: [ 1909.405852] host7: xid 26c: > > Exchange timed out > > 18735342:May 27 10:44:15 poc2 kernel: [ 1909.412982] host7: xid f09: > > Exchange timer armed : 0 msecs > > 18735344:May 27 10:44:15 poc2 kernel: [ 1909.413062] host7: xid f09: > > Exchange timed out > > 18735572:May 27 10:44:15 poc2 kernel: [ 1909.453616] host7: xid d2e: > > Exchange timer armed : 0 msecs > > 18735574:May 27 10:44:15 poc2 kernel: [ 1909.453631] host7: xid d2e: > > Exchange timed out > > 18735575:May 27 10:44:15 poc2 kernel: [ 1909.454206] host7: xid 32f: > > Exchange timer armed : 0 msecs > > 18735577:May 27 10:44:15 poc2 kernel: [ 1909.454218] host7: xid 32f: > > Exchange timed out > > 18735810:May 27 10:44:15 poc2 kernel: [ 1909.490061] host7: xid c68: > > Exchange timer armed : 0 msecs > > 18735812:May 27 10:44:15 poc2 kernel: [ 1909.490126] host7: xid c68: > > Exchange timed out > > 18735927:May 27 10:44:15 poc2 kernel: [ 1909.515654] host7: xid 267: > > Exchange timer armed : 0 msecs > > 18735929:May 27 10:44:15 poc2 kernel: [ 1909.515667] host7: xid 267: > > Exchange timed out > > 18735940:May 27 10:44:15 poc2 kernel: [ 1909.523278] host7: xid 3cf: > > Exchange timer armed : 0 msecs > > 18735942:May 27 10:44:15 poc2 kernel: [ 1909.523350] host7: xid 3cf: > > Exchange timed out > > 18736041:May 27 10:44:15 poc2 kernel: [ 1909.554439] host7: xid 984: > > Exchange timer armed : 0 msecs > > 18736043:May 27 10:44:15 poc2 kernel: [ 1909.554457] host7: xid 984: > > Exchange timed out > > 18736148:May 27 10:44:15 poc2 kernel: [ 1909.583305] host7: xid 7a6: > > Exchange timer armed : 0 msecs > > 18736150:May 27 10:44:15 poc2 kernel: [ 1909.583332] host7: xid 7a6: > > Exchange timed out > > 18916476:May 27 10:44:45 poc2 kernel: [ 1939.812996] host7: xid 2c7: > > Exchange timer armed : 0 msecs > > 18916478:May 27 10:44:45 poc2 kernel: [ 1939.813075] host7: xid 2c7: > > Exchange timed out > > 18917869:May 27 10:44:45 poc2 kernel: [ 1940.028716] host7: xid 307: > > Exchange timer armed : 0 msecs > > 18917871:May 27 10:44:45 poc2 kernel: [ 1940.028738] host7: xid 307: > > Exchange timed out > > 18917904:May 27 10:44:45 poc2 kernel: [ 1940.036022] host7: xid 44f: > > Exchange timer armed : 0 msecs > > 18917906:May 27 10:44:45 poc2 kernel: [ 1940.036072] host7: xid 44f: > > Exchange timed out > > 18917907:May 27 10:44:45 poc2 kernel: [ 1940.036419] host7: xid d01: > > Exchange timer armed : 0 msecs > > 18917910:May 27 10:44:45 poc2 kernel: [ 1940.036459] host7: xid d01: > > Exchange timed out > > 18918008:May 27 10:44:45 poc2 kernel: [ 1940.050544] host7: xid 94b: > > Exchange timer armed : 0 msecs > > 18918010:May 27 10:44:45 poc2 kernel: [ 1940.050553] host7: xid 94b: > > Exchange timed out > > 18918239:May 27 10:44:45 poc2 kernel: [ 1940.100469] host7: xid 743: > > Exchange timer armed : 0 msecs > > 18918241:May 27 10:44:45 poc2 kernel: [ 1940.100485] host7: xid 743: > > Exchange timed out > > 18918310:May 27 10:44:45 poc2 kernel: [ 1940.111206] host7: xid f82: > > Exchange timer armed : 0 msecs > > 18918312:May 27 10:44:45 poc2 kernel: [ 1940.111357] host7: xid f82: > > Exchange timed out > > 18918329:May 27 10:44:45 poc2 kernel: [ 1940.117376] host7: xid e00: > > Exchange timer armed : 0 msecs > > 18918331:May 27 10:44:45 poc2 kernel: [ 1940.117388] host7: xid e00: > > Exchange timed out > > 18918385:May 27 10:44:45 poc2 kernel: [ 1940.127422] host7: xid d08: > > Exchange timer armed : 0 msecs > > 18918388:May 27 10:44:45 poc2 kernel: [ 1940.127470] host7: xid d08: > > Exchange timed out > > 18918487:May 27 10:44:45 poc2 kernel: [ 1940.142935] host7: xid 8c5: > > Exchange timer armed : 0 msecs > > 18918489:May 27 10:44:45 poc2 kernel: [ 1940.142950] host7: xid 8c5: > > Exchange timed out > > 18918588:May 27 10:44:45 poc2 kernel: [ 1940.158196] host7: xid 92d: > > Exchange timer armed : 0 msecs > > 18918590:May 27 10:44:45 poc2 kernel: [ 1940.158254] host7: xid 92d: > > Exchange timed out > > 18918597:May 27 10:44:45 poc2 kernel: [ 1940.159082] host7: xid a04: > > Exchange timer armed : 0 msecs > > 18918599:May 27 10:44:45 poc2 kernel: [ 1940.159154] host7: xid a04: > > Exchange timed out > > 18918866:May 27 10:44:45 poc2 kernel: [ 1940.209430] host7: xid d28: > > Exchange timer armed : 0 msecs > > 18918868:May 27 10:44:45 poc2 kernel: [ 1940.209484] host7: xid d28: > > Exchange timed out > > 18918909:May 27 10:44:45 poc2 kernel: [ 1940.215142] host7: xid 8ed: > > Exchange timer armed : 0 msecs > > 18918911:May 27 10:44:45 poc2 kernel: [ 1940.215158] host7: xid 8ed: > > Exchange timed out > > 18918918:May 27 10:44:45 poc2 kernel: [ 1940.217253] host7: xid 3ef: > > Exchange timer armed : 0 msecs > > 18918920:May 27 10:44:45 poc2 kernel: [ 1940.217270] host7: xid 3ef: > > Exchange timed out > > 18919025:May 27 10:44:45 poc2 kernel: [ 1940.241859] host7: xid fa9: > > Exchange timer armed : 0 msecs > > 18919027:May 27 10:44:45 poc2 kernel: [ 1940.241904] host7: xid fa9: > > Exchange timed out > > 18919151:May 27 10:44:45 poc2 kernel: [ 1940.267514] host7: xid 38c: > > Exchange timer armed : 0 msecs > > 18919154:May 27 10:44:45 poc2 kernel: [ 1940.267580] host7: xid 38c: > > Exchange timed out > > 18919163:May 27 10:44:45 poc2 kernel: [ 1940.268681] host7: xid d48: > > Exchange timer armed : 0 msecs > > 18919165:May 27 10:44:45 poc2 kernel: [ 1940.268707] host7: xid d48: > > Exchange timed out > > 18919212:May 27 10:44:45 poc2 kernel: [ 1940.278834] host7: xid aa4: > > Exchange timer armed : 0 msecs > > 18919214:May 27 10:44:45 poc2 kernel: [ 1940.278881] host7: xid aa4: > > Exchange timed out > > 18919243:May 27 10:44:45 poc2 kernel: [ 1940.285614] host7: xid f22: > > Exchange timer armed : 0 msecs > > 18919245:May 27 10:44:45 poc2 kernel: [ 1940.285629] host7: xid f22: > > Exchange timed out > > 18919270:May 27 10:44:45 poc2 kernel: [ 1940.293297] host7: xid da0: > > Exchange timer armed : 0 msecs > > 18919272:May 27 10:44:45 poc2 kernel: [ 1940.293346] host7: xid da0: > > Exchange timed out > > 18919321:May 27 10:44:45 poc2 kernel: [ 1940.301098] host7: xid 3cc: > > Exchange timer armed : 0 msecs > > 18919325:May 27 10:44:45 poc2 kernel: [ 1940.301172] host7: xid 3cc: > > Exchange timed out > > 18919472:May 27 10:44:46 poc2 kernel: [ 1940.333266] host7: xid e0e: > > Exchange timer armed : 0 msecs > > 18919474:May 27 10:44:46 poc2 kernel: [ 1940.333299] host7: xid e0e: > > Exchange timed out > > 18919641:May 27 10:44:46 poc2 kernel: [ 1940.368427] host7: xid 2e7: > > Exchange timer armed : 0 msecs > > 18919643:May 27 10:44:46 poc2 kernel: [ 1940.368451] host7: xid 2e7: > > Exchange timed out > > 18919694:May 27 10:44:46 poc2 kernel: [ 1940.386699] host7: xid 826: > > Exchange timer armed : 0 msecs > > 18919696:May 27 10:44:46 poc2 kernel: [ 1940.386744] host7: xid 826: > > Exchange timed out > > 18919785:May 27 10:44:46 poc2 kernel: [ 1940.409090] host7: xid 40f: > > Exchange timer armed : 0 msecs > > 18919787:May 27 10:44:46 poc2 kernel: [ 1940.409125] host7: xid 40f: > > Exchange timed out > > 18919871:May 27 10:44:46 poc2 kernel: [ 1940.445621] host7: xid 327: > > Exchange timer armed : 0 msecs > > 18919873:May 27 10:44:46 poc2 kernel: [ 1940.445639] ft_queue_data_in: > > Failed to send frame ffff88062cbd3a00, xid <0x327>, remaining 131072, > > lso_max <0x10000> > > 18919874:May 27 10:44:46 poc2 kernel: [ 1940.445645] ft_queue_data_in: > > Failed to send frame ffff88062cbd3a00, xid <0x327>, remaining 65536, > > lso_max <0x10000> > > 18919875:May 27 10:44:46 poc2 kernel: [ 1940.445649] ft_queue_data_in: > > Failed to send frame ffff88062cbd3a00, xid <0x327>, remaining 0, > > lso_max <0x10000> > > 18919877:May 27 10:44:46 poc2 kernel: [ 1940.445671] host7: xid 327: > > Exchange timed out > > 18919878:May 27 10:44:46 poc2 kernel: [ 1940.445683] host7: xid ac4: > > Exchange timer armed : 2000 msecs > > 18930794:May 27 10:44:48 poc2 kernel: [ 1942.452127] host7: xid ac4: > > Exchange timed out > > 18930795:May 27 10:44:48 poc2 kernel: [ 1942.452136] host7: xid ac4: > > f_ctl 90000 seq 1 > > 18930796:May 27 10:44:48 poc2 kernel: [ 1942.452139] host7: xid ac4: > > Exchange timer armed : 20000 msecs > > > > Thanks > > > > > > On Fri, May 23, 2014 at 9:23 AM, Vasu Dev <vasu.dev@xxxxxxxxxxxxxxx> wrote: > >> On Wed, 2014-05-21 at 15:23 -0700, Jun Wu wrote: > >>> > 18630751 May 21 11:01:53 poc2 kernel: [ 1528.191740] host7: xid > >>> 6e4: > >>> > Exchange timer armed : 0 msecs > >>> > 18630752 May 21 11:01:53 poc2 kernel: [ 1528.191747] host7: xid > >>> 6e4: > >>> > f_ctl 800000 seq 1 > >>> > 18630753 May 21 11:01:53 poc2 kernel: [ 1528.191756] host7: xid > >>> 6e4: > >>> > f_ctl 800000 seq 2 > >>> > 18630754 May 21 11:01:53 poc2 kernel: [ 1528.191763] host7: xid > >>> 6e4: > >>> > Exchange timed out > >>> > 18630755 May 21 11:01:53 poc2 kernel: [ 1528.191777] > >>> ft_queue_data_in: > >>> > Failed to send frame ffff8805bf245e00, xid <0x6e4>, remaining > >>> 458752, > >>> > lso_max <0x10000> > >> > >> Exchange 0x6e4 is aborted and then target still sending frame, while > >> later should not occur but first setting up abort with 0 msec timeout > >> doesn not look correct either and it is different that 8000 ms on > >> initiator side. Are you using same switch between initiator and target > >> with DCB enabled ? > >> > >> Reducing retries could narrow down to early aborts is the cause here, > >> can you try with REC disabled on initiator side for that using this > >> change ? > >> > >> diff --git a/drivers/scsi/libfc/fc_fcp.c b/drivers/scsi/libfc/fc_fcp.c > >> index 1d7e76e..43fe793 100644 > >> --- a/drivers/scsi/libfc/fc_fcp.c > >> +++ b/drivers/scsi/libfc/fc_fcp.c > >> @@ -1160,6 +1160,7 @@ static int fc_fcp_cmd_send(struct fc_lport *lport, > >> struct fc_fcp_pkt *fsp, > >> fc_fcp_pkt_hold(fsp); /* hold for fc_fcp_pkt_destroy */ > >> > >> setup_timer(&fsp->timer, fc_fcp_timeout, (unsigned long)fsp); > >> + rpriv->flags &= ~FC_RP_FLAGS_REC_SUPPORTED; > >> if (rpriv->flags & FC_RP_FLAGS_REC_SUPPORTED) > >> fc_fcp_timer_set(fsp, get_fsp_rec_tov(fsp)); > >> > >> -- To unsubscribe from this list: send the line "unsubscribe target-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html