Re: [Open-FCoE] System crashes with increased drive count

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



The test setup includes one host and one target. The target exposes 10
hard drives (or 10 LUNs) on one fcoe port. The single initiator runs
10 fio processes simultaneously to the 10 target drives through fcoe
vn2vn. This is a simple configuration that other people may also want
to try.

>Exchange 0x6e4 is aborted and then target still sending frame, while
>later should not occur but first setting up abort with 0 msec timeout
>doesn not look correct either and it is different that 8000 ms on
>initiator side.

Should target stop sending frame after abort? I still see a lot of 0
msec messages on target side. Is this something that should be
addressed?

>Reducing retries could narrow down to early aborts is the cause here,
>can you try with REC disabled on initiator side for that using this
>change ?

By disabling REC have you confirmed that the early aborts is the
cause? Is the abort caused by 0 msec timeout?

Thanks,

Jun

On Mon, Jun 2, 2014 at 5:21 PM, Vasu Dev <vasu.dev@xxxxxxxxxxxxxxx> wrote:
> On Mon, 2014-06-02 at 16:55 -0700, Jun Wu wrote:
>> Hi Vasu,
>>
>> Is there anything else we can do to collect more information or adjust
>> the timeout value?
>
> Increased Timeouts will help in reducing seq_send failure prints and
> must be the case with REC disabled patch try since that skipped REC
> completely and with that defaulted to longer SCSI retry timeouts.
> Anycase seq_send failure prints are just noise here since seq_send fails
> only for aborted exchanges and that is very likely with tcm fc abort
> handling, especially in your oversubscribed target setup with 1 target
> taking workloads from 10 hosts. I mean seq_send fails only due to
> following aborted exchanges check since thereafter fcoe_xmit always
> returns success.
>
>         if (ep->esb_stat & (ESB_ST_COMPLETE | ESB_ST_ABNORMAL)) {
>                 fc_frame_free(fp);
>                 goto out;
>         }
>
> So I think it is okay to get rid off send failure prints same as tcm fc
> has it on other places of callings seq_send with this patch:-
>
> diff --git a/drivers/target/tcm_fc/tfc_io.c
> b/drivers/target/tcm_fc/tfc_io.c
> index a48b96d..0425fc0 100644
> --- a/drivers/target/tcm_fc/tfc_io.c
> +++ b/drivers/target/tcm_fc/tfc_io.c
> @@ -200,15 +200,7 @@ int ft_queue_data_in(struct se_cmd *se_cmd)
>                         f_ctl |= FC_FC_END_SEQ;
>                 fc_fill_fc_hdr(fp, FC_RCTL_DD_SOL_DATA, ep->did,
> ep->sid,
>                                FC_TYPE_FCP, f_ctl, fh_off);
> -               error = lport->tt.seq_send(lport, seq, fp);
> -               if (error) {
> -                       /* XXX For now, initiator will retry */
> -                       pr_err_ratelimited("%s: Failed to send frame %p,
> "
> -                                               "xid <0x%x>, remaining %
> zu, "
> -                                               "lso_max <0x%x>\n",
> -                                               __func__, fp, ep->xid,
> -                                               remaining,
> lport->lso_max);
> -               }
> +               lport->tt.seq_send(lport, seq, fp);
>         }
>         return ft_queue_status(se_cmd);
>  }
>
>> Thanks,
>>
>> Jun
>>
>> On Tue, May 27, 2014 at 12:44 PM, Jun Wu <jwu@xxxxxxxxxxxx> wrote:
>> > We do not have DCB PFC so the traffic is managed by PAUSE frames by default.
>> > The kernel has been recompiled to include your change. I ran the same
>> > test after the kernel change, i.e. running 10 fio sessions from
>> > initiator to 10 drives on target. The target machine hung after a few
>> > minutes.
>> >
>
> Target could be either really overloaded with your workloads in 10 hosts
> to single target setup or you it really stuck, for later case enabling
> hung task detection could help under kernel hacking as that would dump
> stack trace of hung task. If it is the earlier case then I don't know
> how target back end handle the case of sustained over subscribed target
> bandwidth, may be Nick could shed some info on that and possibly tcm_fc
> has related missing feedback into target core for that.
>
> If your workloads exceed demand intermittently then adjusting timeout as
> you mentioned above or increasing all Tx/Rx path resources could help in
> absorbing such intermittent incidents and in turn not leading to host IO
> timeouts/failures.
>
> //Vasu
>
>> > On the initiator side the messages look like:
>> >  4453 May 27 10:44:14 poc1 kernel: [ 1726.288288] host7: fcp: 00061e:
>> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
>> >  4454 May 27 10:44:14 poc1 kernel: [ 1726.292240] host7: fcp: 00061e:
>> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
>> >  4455 May 27 10:44:14 poc1 kernel: [ 1726.295246] host7: fcp: 00061e:
>> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
>> >  4456 May 27 10:44:14 poc1 kernel: [ 1726.296245] host7: fcp: 00061e:
>> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
>> >  4457 May 27 10:44:14 poc1 kernel: [ 1726.298249] host7: fcp: 00061e:
>> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
>> >  4458 May 27 10:44:14 poc1 kernel: [ 1726.300288] host7: fcp: 00061e:
>> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
>> >  4459 May 27 10:44:14 poc1 kernel: [ 1726.303268] host7: fcp: 00061e:
>> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
>> >  4460 May 27 10:44:14 poc1 kernel: [ 1726.303273] host7: fcp: 00061e:
>> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
>> >  4461 May 27 10:44:14 poc1 kernel: [ 1726.304297] host7: fcp: 00061e:
>> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
>> >  4462 May 27 10:44:14 poc1 kernel: [ 1726.306257] host7: fcp: 00061e:
>> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
>> >  4463 May 27 10:44:14 poc1 kernel: [ 1726.307293] host7: fcp: 00061e:
>> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
>> >  4464 May 27 10:44:14 poc1 kernel: [ 1726.307298] host7: fcp: 00061e:
>> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
>> >  4465 May 27 10:44:14 poc1 kernel: [ 1726.307301] host7: fcp: 00061e:
>> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
>> >  4466 May 27 10:44:14 poc1 kernel: [ 1726.310317] host7: fcp: 00061e:
>> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
>> >  4467 May 27 10:44:14 poc1 kernel: [ 1726.313257] host7: fcp: 00061e:
>> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
>> >  4468 May 27 10:44:14 poc1 kernel: [ 1726.315297] host7: fcp: 00061e:
>> > xid 01cd-0a84: DDP I/O in fc_fcp_recv_data set ERROR
>> >  4469 May 27 10:44:14 poc1 kernel: [ 1726.315302] host7: xid  1cd:
>> > f_ctl  90000 seq  1
>> >  4470 May 27 10:44:14 poc1 kernel: [ 1726.315666] host7: xid  1cd: BLS
>> > rctl 85 - BLS reject received
>> >  4471 May 27 10:44:14 poc1 kernel: [ 1726.316295] host7: fcp: 00061e:
>> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
>> >  4472 May 27 10:44:14 poc1 kernel: [ 1726.317260] host7: fcp: 00061e:
>> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
>> >  4473 May 27 10:44:14 poc1 kernel: [ 1726.317266] host7: fcp: 00061e:
>> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
>> >  4474 May 27 10:44:14 poc1 kernel: [ 1726.317417] host7: fcp: 00061e:
>> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
>> >  4475 May 27 10:44:14 poc1 kernel: [ 1726.320303] host7: fcp: 00061e:
>> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
>> >  4476 May 27 10:44:14 poc1 kernel: [ 1726.321301] host7: fcp: 00061e:
>> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
>> >  4477 May 27 10:44:14 poc1 kernel: [ 1726.323264] host7: fcp: 00061e:
>> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
>> >  4478 May 27 10:44:14 poc1 kernel: [ 1726.323299] host7: fcp: 00061e:
>> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
>> >  4479 May 27 10:44:14 poc1 kernel: [ 1726.324307] host7: fcp: 00061e:
>> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
>> >  4480 May 27 10:44:14 poc1 kernel: [ 1726.327271] host7: fcp: 00061e:
>> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
>> >  4481 May 27 10:44:14 poc1 kernel: [ 1726.327301] host7: fcp: 00061e:
>> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
>> >  4482 May 27 10:44:14 poc1 kernel: [ 1726.329274] host7: fcp: 00061e:
>> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
>> >  4483 May 27 10:44:14 poc1 kernel: [ 1726.330306] host7: fcp: 00061e:
>> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
>> >  4484 May 27 10:44:14 poc1 kernel: [ 1726.331273] host7: fcp: 00061e:
>> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
>> >  4485 May 27 10:44:14 poc1 kernel: [ 1726.333270] host7: fcp: 00061e:
>> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
>> >  4486 May 27 10:44:14 poc1 kernel: [ 1726.333313] host7: fcp: 00061e:
>> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
>> >  4487 May 27 10:44:14 poc1 kernel: [ 1726.338276] host7: fcp: 00061e:
>> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
>> >  4488 May 27 10:44:14 poc1 kernel: [ 1726.338316] host7: fcp: 00061e:
>> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
>> >  4489 May 27 10:44:14 poc1 kernel: [ 1726.340320] host7: fcp: 00061e:
>> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
>> >  4490 May 27 10:44:14 poc1 kernel: [ 1726.341313] host7: fcp: 00061e:
>> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
>> >  4491 May 27 10:44:14 poc1 kernel: [ 1726.342286] host7: fcp: 00061e:
>> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
>> >  4492 May 27 10:44:14 poc1 kernel: [ 1726.343312] host7: fcp: 00061e:
>> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
>> >  4493 May 27 10:44:14 poc1 kernel: [ 1726.346324] host7: fcp: 00061e:
>> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
>> >  4494 May 27 10:44:14 poc1 kernel: [ 1726.347283] host7: fcp: 00061e:
>> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
>> >  4495 May 27 10:44:14 poc1 kernel: [ 1726.350323] host7: fcp: 00061e:
>> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
>> >  4496 May 27 10:44:14 poc1 kernel: [ 1726.350326] host7: fcp: 00061e:
>> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
>> >  4497 May 27 10:44:14 poc1 kernel: [ 1726.352288] host7: fcp: 00061e:
>> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
>> >  4498 May 27 10:44:14 poc1 kernel: [ 1726.352289] host7: fcp: 00061e:
>> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
>> >  4499 May 27 10:44:14 poc1 kernel: [ 1726.352942] host7: fcp: 00061e:
>> > xid 01ee-0c48: DDP I/O in fc_fcp_recv_data set ERROR
>> >  4500 May 27 10:44:14 poc1 kernel: [ 1726.352946] host7: xid  1ee:
>> > f_ctl  90000 seq  1
>> >  4501 May 27 10:44:14 poc1 kernel: [ 1726.353288] host7: fcp: 00061e:
>> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
>> >  4502 May 27 10:44:14 poc1 kernel: [ 1726.353320] host7: fcp: 00061e:
>> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
>> >
>> > I see much less Exchange timer messages than last time. Here are all
>> > the instances of them:
>> >
>> >  6521 May 27 10:44:46 poc1 kernel: [ 1757.762301] host7: fcp: 00061e:
>> > xid 0181-090d: DDP I/O in fc_fcp_recv_data set ERROR
>> >  6522 May 27 10:44:46 poc1 kernel: [ 1757.762306] host7: xid  181:
>> > f_ctl  90000 seq  1
>> >  6523 May 27 10:44:46 poc1 kernel: [ 1757.763257] host7: fcp: 00061e:
>> > xid 0005-0327: DDP I/O in fc_fcp_recv_data set ERROR
>> >  6524 May 27 10:44:46 poc1 kernel: [ 1757.763261] host7: xid    5:
>> > f_ctl  90000 seq  1
>> >  6525 May 27 10:44:46 poc1 kernel: [ 1757.763347] host7: fcp: 00061e:
>> > xid 0060-0dce: DDP I/O in fc_fcp_recv_data set ERROR
>> >  6526 May 27 10:44:46 poc1 kernel: [ 1757.763351] host7: xid   60:
>> > f_ctl  90000 seq  1
>> >  6527 May 27 10:44:46 poc1 kernel: [ 1757.763617] host7: xid  12c:
>> > exch: BLS rctl 84 - BLS accept
>> >  6528 May 27 10:44:46 poc1 kernel: [ 1757.763627] host7: fcp: 00061e:
>> > Returning DID_ERROR to scsi-ml due to FC_CMD_ABORTED
>> >  6529 May 27 10:44:46 poc1 kernel: [ 1757.763632] host7: xid  12c:
>> > Exchange timer armed : 10000 msecs
>> >  6530 May 27 10:44:46 poc1 kernel: [ 1757.763813] host7: fcp: 00061e:
>> > xid 00cc-090d: DDP I/O in fc_fcp_recv_data set ERROR
>> >  6531 May 27 10:44:46 poc1 kernel: [ 1757.763817] host7: xid   cc:
>> > f_ctl  90000 seq  1
>> >  6532 May 27 10:44:46 poc1 kernel: [ 1757.763983] host7: xid   60: BLS
>> > rctl 85 - BLS reject received
>> >  6533 May 27 10:44:46 poc1 kernel: [ 1757.765447] host7: xid  14d: BLS
>> > rctl 85 - BLS reject received
>> >  6534 May 27 10:44:46 poc1 kernel: [ 1757.766258] host7: xid  18b: BLS
>> > rctl 85 - BLS reject received
>> >  6535 May 27 10:44:46 poc1 kernel: [ 1757.767116] host7: xid   ed: BLS
>> > rctl 85 - BLS reject received
>> >  6536 May 27 10:44:46 poc1 kernel: [ 1757.767144] host7: xid   21: BLS
>> > rctl 85 - BLS reject received
>> >  6537 May 27 10:44:46 poc1 kernel: [ 1757.768161] host7: xid  122: BLS
>> > rctl 85 - BLS reject received
>> >  6538 May 27 10:44:46 poc1 kernel: [ 1757.768184] host7: xid   45: BLS
>> > rctl 85 - BLS reject received
>> >  6539 May 27 10:44:46 poc1 kernel: [ 1757.768437] host7: xid  16f: BLS
>> > rctl 85 - BLS reject received
>> >  6540 May 27 10:44:46 poc1 kernel: [ 1757.768880] host7: xid  14c: BLS
>> > rctl 85 - BLS reject received
>> >  6541 May 27 10:44:46 poc1 kernel: [ 1757.772813] host7: xid   8a: BLS
>> > rctl 85 - BLS reject received
>> >  6542 May 27 10:44:46 poc1 kernel: [ 1757.778209] host7: xid  181: BLS
>> > rctl 85 - BLS reject received
>> >  6543 May 27 10:44:46 poc1 kernel: [ 1757.778226] host7: xid  16b: BLS
>> > rctl 85 - BLS reject received
>> >  6544 May 27 10:44:46 poc1 kernel: [ 1757.778905] host7: xid   cc: BLS
>> > rctl 85 - BLS reject received
>> >  6545 May 27 10:44:56 poc1 kernel: [ 1767.793465] host7: xid  12c:
>> > Exchange timed out
>> >  6546 May 27 10:44:56 poc1 kernel: [ 1767.793478] host7: xid  e65:
>> > Exchange timer armed : 2000 msecs
>> >  6547 May 27 10:44:56 poc1 kernel: [ 1767.793861] host7: xid  e65:
>> > Exchange timer canceled
>> >  6548 May 27 10:44:56 poc1 kernel: [ 1767.793866] host7: xid  12c:
>> > LS_RJT for RRQ
>> >  6549 May 27 10:45:16 poc1 kernel: [ 1788.033856] host7: scsi:
>> > Resetting rport (00061e)
>> >  6550 May 27 10:45:16 poc1 kernel: [ 1788.034071] host7: scsi: lun
>> > reset to lun 0 completed
>> >
>> > and
>> >
>> >  7468 May 27 10:45:17 poc1 kernel: [ 1789.288870] host7: fcp: 00061e:
>> > xid 06ae-0985: fsp=ffff880c0e176700:lso:blen=1000 lso_max=0x10000
>> > t_blen=1000
>> >  7469 May 27 10:45:17 poc1 kernel: [ 1789.288874] host7: xid  6ae:
>> > f_ctl  90000 seq  1
>> >  7470 May 27 10:45:17 poc1 kernel: [ 1789.291411] host7: fcp: 00061e:
>> > xid 068e-07a3: fsp=ffff880c0e177400:lso:blen=1000 lso_max=0x10000
>> > t_blen=1000
>> >  7471 May 27 10:45:17 poc1 kernel: [ 1789.291415] host7: xid  68e:
>> > f_ctl  90000 seq  1
>> >  7472 May 27 10:45:17 poc1 kernel: [ 1789.291425] host7: fcp: 00061e:
>> > xid 0c2e-0b04: fsp=ffff880c1cc4c800:lso:blen=1000 lso_max=0x10000
>> > t_blen=1000
>> >  7473 May 27 10:45:17 poc1 kernel: [ 1789.291428] host7: xid  c2e:
>> > f_ctl  90000 seq  1
>> >  7474 May 27 10:45:17 poc1 kernel: [ 1789.296282] host7: xid  1c4: BLS
>> > rctl 85 - BLS reject received
>> >  7475 May 27 10:45:17 poc1 kernel: [ 1789.296284] host7: xid   8c: BLS
>> > rctl 85 - BLS reject received
>> >  7476 May 27 10:45:17 poc1 kernel: [ 1789.297554] host7: xid  6ae:
>> > exch: BLS rctl 84 - BLS accept
>> >  7477 May 27 10:45:17 poc1 kernel: [ 1789.297559] host7: fcp: 00061e:
>> > Returning DID_ERROR to scsi-ml due to FC_CMD_ABORTED
>> >  7478 May 27 10:45:17 poc1 kernel: [ 1789.297564] host7: xid  6ae:
>> > Exchange timer armed : 10000 msecs
>> >  7479 May 27 10:45:17 poc1 kernel: [ 1789.298873] host7: xid   ef: BLS
>> > rctl 85 - BLS reject received
>> >  7480 May 27 10:45:17 poc1 kernel: [ 1789.298887] host7: fcp: 00061e:
>> > xid 01cf-0e8e: DDP I/O in fc_fcp_recv_data set ERROR
>> >  7481 May 27 10:45:17 poc1 kernel: [ 1789.298891] host7: xid  1cf:
>> > f_ctl  90000 seq  1
>> >  7482 May 27 10:45:17 poc1 kernel: [ 1789.299318] host7: xid  1cf: BLS
>> > rctl 85 - BLS reject received
>> >  7483 May 27 10:45:17 poc1 kernel: [ 1789.299390] host7: fcp: 00061e:
>> > xid 00c4-048c: DDP I/O in fc_fcp_recv_data set ERROR
>> >  7484 May 27 10:45:17 poc1 kernel: [ 1789.299394] host7: xid   c4:
>> > f_ctl  90000 seq  1
>> >  7485 May 27 10:45:17 poc1 kernel: [ 1789.299716] host7: xid   c4: BLS
>> > rctl 85 - BLS reject received
>> >  7486 May 27 10:45:17 poc1 kernel: [ 1789.299733] host7: xid   e6: BLS
>> > rctl 85 - BLS reject received
>> >  7487 May 27 10:45:17 poc1 kernel: [ 1789.299762] host7: xid   87: BLS
>> > rctl 85 - BLS reject received
>> >  7488 May 27 10:45:17 poc1 kernel: [ 1789.299767] host7: xid  107: BLS
>> > rctl 85 - BLS reject received
>> >  7489 May 27 10:45:17 poc1 kernel: [ 1789.303432] host7: fcp: 00061e:
>> > xid 0042-050f: DDP I/O in fc_fcp_recv_data set ERROR
>> >  7490 May 27 10:45:17 poc1 kernel: [ 1789.303437] host7: xid   42:
>> > f_ctl  90000 seq  1
>> >  7491 May 27 10:45:17 poc1 kernel: [ 1789.303813] host7: xid   42: BLS
>> > rctl 85 - BLS reject received
>> >  7492 May 27 10:45:17 poc1 kernel: [ 1789.304596] host7: xid    7: BLS
>> > rctl 85 - BLS reject received
>> >  7493 May 27 10:45:17 poc1 kernel: [ 1789.304607] host7: xid  1a9:
>> > exch: BLS rctl 84 - BLS accept
>> >  7494 May 27 10:45:17 poc1 kernel: [ 1789.304615] host7: fcp: 00061e:
>> > Returning DID_ERROR to scsi-ml due to FC_CMD_ABORTED
>> >  7495 May 27 10:45:17 poc1 kernel: [ 1789.306357] host7: xid  10d: BLS
>> > rctl 85 - BLS reject received
>> >  7496 May 27 10:45:17 poc1 kernel: [ 1789.306797] host7: xid  18e: BLS
>> > rctl 85 - BLS reject received
>> >  7497 May 27 10:45:17 poc1 kernel: [ 1789.307196] host7: fcp: 00061e:
>> > xid 01e4-04a7: DDP I/O in fc_fcp_recv_data set ERROR
>> >  7498 May 27 10:45:17 poc1 kernel: [ 1789.307201] host7: xid  1e4:
>> > f_ctl  90000 seq  1
>> >  7499 May 27 10:45:17 poc1 kernel: [ 1789.307628] host7: xid  1e4: BLS
>> > rctl 85 - BLS reject received
>> >  7500 May 27 10:45:17 poc1 kernel: [ 1789.311225] host7: xid   8d: BLS
>> > rctl 85 - BLS reject received
>> >  7501 May 27 10:45:17 poc1 kernel: [ 1789.316454] host7: fcp: 00061e:
>> > xid 0c64-0de8: fsp=ffff88061c238c00:lso:blen=1000 lso_max=0x10000
>> > t_blen=1000
>> >  7502 May 27 10:45:17 poc1 kernel: [ 1789.316458] host7: xid  c64:
>> > f_ctl  90000 seq  1
>> >  7503 May 27 10:45:27 poc1 kernel: [ 1799.306713] host7: xid  6ae:
>> > Exchange timed out
>> >  7504 May 27 10:45:27 poc1 kernel: [ 1799.306725] host7: xid  e65:
>> > Exchange timer armed : 2000 msecs
>> >  7505 May 27 10:45:29 poc1 kernel: [ 1801.312254] host7: xid  e65:
>> > Exchange timed out
>> >  7506 May 27 10:45:29 poc1 kernel: [ 1801.312261] host7: xid  e65:
>> > f_ctl  90000 seq  1
>> >  7507 May 27 10:45:29 poc1 kernel: [ 1801.312264] host7: xid  e65:
>> > Exchange timer armed : 20000 msecs
>> >  7508 May 27 10:45:31 poc1 kernel: [ 1803.197758] host7: rport 00061e:
>> > Remove port
>> >  7509 May 27 10:45:31 poc1 kernel: [ 1803.197762] host7: rport 00061e:
>> > Port sending LOGO from Ready state
>> >
>> > and finally
>> >
>> >  7724 May 27 10:45:31 poc1 kernel: [ 1803.199215] host7: fcp: 00061e:
>> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
>> >  7725 May 27 10:45:31 poc1 kernel: [ 1803.199219] host7: xid  c44:
>> > f_ctl  90000 seq  1
>> >  7726 May 27 10:45:31 poc1 kernel: [ 1803.199223] host7: rport 00061e:
>> > Received a LOGO response closed
>> >  7727 May 27 10:45:31 poc1 kernel: [ 1803.199226] host7: xid  e65:
>> > Exchange timer canceled
>> >  7728 May 27 10:45:31 poc1 kernel: [ 1803.199264] host7: rport 00061e:
>> > work delete
>> >  7729 May 27 10:46:31 poc1 kernel: [ 1863.357830]  rport-7:0-0:
>> > blocked FC remote port time out: removing target and saving binding
>> >
>> >
>> > The messages on the target side are similar as before.
>> > This is when Exhange timer message first appears:
>> > 18732865 May 27 10:44:14 poc2 kernel: [ 1909.030408] host7: xid  c48:
>> > f_ctl 800000 seq  1
>> > 18732866 May 27 10:44:14 poc2 kernel: [ 1909.030423] host7: xid  c48:
>> > f_ctl 880008 seq  2
>> > 18732867 May 27 10:44:14 poc2 kernel: [ 1909.031509] host7: xid  766:
>> > f_ctl 800000 seq  1
>> > 18732868 May 27 10:44:14 poc2 kernel: [ 1909.031528] host7: xid  766:
>> > f_ctl 880008 seq  2
>> > 18732869 May 27 10:44:14 poc2 kernel: [ 1909.031787] host7: xid  cce:
>> > f_ctl 800000 seq  1
>> > 18732870 May 27 10:44:14 poc2 kernel: [ 1909.031799] host7: xid  cce:
>> > f_ctl 880008 seq  2
>> > 18732871 May 27 10:44:14 poc2 kernel: [ 1909.031944] host7: xid  2c7:
>> > f_ctl 800000 seq  1
>> > 18732872 May 27 10:44:14 poc2 kernel: [ 1909.031971] host7: xid  2c7:
>> > f_ctl 880008 seq  2
>> > 18732873 May 27 10:44:14 poc2 kernel: [ 1909.032145] host7: xid  f09:
>> > f_ctl 800000 seq  1
>> > 18732874 May 27 10:44:14 poc2 kernel: [ 1909.032166] host7: xid  f09:
>> > f_ctl 880008 seq  2
>> > 18732875 May 27 10:44:14 poc2 kernel: [ 1909.032374] host7: xid  ce0:
>> > f_ctl 800000 seq  1
>> > 18732876 May 27 10:44:14 poc2 kernel: [ 1909.032383] host7: xid  ce0:
>> > f_ctl 880008 seq  2
>> > 18732877 May 27 10:44:14 poc2 kernel: [ 1909.032729] host7: xid  8eb:
>> > f_ctl 800000 seq  1
>> > 18732878 May 27 10:44:14 poc2 kernel: [ 1909.032736] host7: xid  32f:
>> > f_ctl 800000 seq  1
>> > 18732879 May 27 10:44:14 poc2 kernel: [ 1909.032740] host7: xid  8eb:
>> > f_ctl 880008 seq  2
>> > 18732880 May 27 10:44:14 poc2 kernel: [ 1909.032759] host7: xid  32f:
>> > f_ctl 880008 seq  2
>> > 18732881 May 27 10:44:14 poc2 kernel: [ 1909.033034] host7: xid  98d:
>> > f_ctl 800000 seq  1
>> > 18732882 May 27 10:44:14 poc2 kernel: [ 1909.033051] host7: xid  98d:
>> > f_ctl 880008 seq  2
>> > 18732883 May 27 10:44:14 poc2 kernel: [ 1909.033262] host7: xid  c01:
>> > f_ctl 800000 seq  1
>> > 18732884 May 27 10:44:14 poc2 kernel: [ 1909.033290] host7: xid  c01:
>> > f_ctl 880008 seq  2
>> > 18732885 May 27 10:44:14 poc2 kernel: [ 1909.033373] host7: xid  6ea:
>> > f_ctl 800000 seq  1
>> > 18732886 May 27 10:44:14 poc2 kernel: [ 1909.033387] host7: xid  6ea:
>> > f_ctl 880008 seq  2
>> > 18732887 May 27 10:44:14 poc2 kernel: [ 1909.033734] host7: xid  766:
>> > f_ctl 800000 seq  1
>> > 18732888 May 27 10:44:14 poc2 kernel: [ 1909.033747] host7: xid  766:
>> > f_ctl 880008 seq  2
>> > 18732889 May 27 10:44:14 poc2 kernel: [ 1909.034621] host7: xid  30c:
>> > f_ctl 800000 seq  1
>> > 18732890 May 27 10:44:14 poc2 kernel: [ 1909.034648] host7: xid  30c:
>> > f_ctl 880008 seq  2
>> > 18732891 May 27 10:44:14 poc2 kernel: [ 1909.035309] host7: xid  c48:
>> > Exchange timer armed : 0 msecs
>> > 18732892 May 27 10:44:14 poc2 kernel: [ 1909.035315] host7: xid  c48:
>> > f_ctl 800000 seq  1
>> > 18732893 May 27 10:44:14 poc2 kernel: [ 1909.035384] host7: xid  c48:
>> > Exchange timed out
>> >
>> > This is when ft_queue_data_in failure first appears:
>> > 18919838 May 27 10:44:46 poc2 kernel: [ 1940.431788] host7: xid  327:
>> > f_ctl 800000 seq  1
>> > 18919839 May 27 10:44:46 poc2 kernel: [ 1940.431801] host7: xid  327:
>> > f_ctl 880008 seq  2
>> > 18919840 May 27 10:44:46 poc2 kernel: [ 1940.433931] host7: xid  d68:
>> > f_ctl 800000 seq  1
>> > 18919841 May 27 10:44:46 poc2 kernel: [ 1940.433951] host7: xid  d68:
>> > f_ctl 880008 seq  2
>> > 18919842 May 27 10:44:46 poc2 kernel: [ 1940.436138] host7: xid  f89:
>> > f_ctl 800000 seq  1
>> > 18919843 May 27 10:44:46 poc2 kernel: [ 1940.436174] host7: xid  f89:
>> > f_ctl 880008 seq  2
>> > 18919844 May 27 10:44:46 poc2 kernel: [ 1940.436889] host7: xid  74a:
>> > f_ctl 800000 seq  1
>> > 18919845 May 27 10:44:46 poc2 kernel: [ 1940.436912] host7: xid  74a:
>> > f_ctl 880008 seq  2
>> > 18919846 May 27 10:44:46 poc2 kernel: [ 1940.437281] host7: xid  40c:
>> > f_ctl 800000 seq  1
>> > 18919847 May 27 10:44:46 poc2 kernel: [ 1940.437326] host7: xid  40c:
>> > f_ctl 880008 seq  2
>> > 18919848 May 27 10:44:46 poc2 kernel: [ 1940.437338] host7: xid  92b:
>> > f_ctl 800000 seq  1
>> > 18919849 May 27 10:44:46 poc2 kernel: [ 1940.437356] host7: xid  92b:
>> > f_ctl 880008 seq  2
>> > 18919850 May 27 10:44:46 poc2 kernel: [ 1940.439508] host7: xid  dce:
>> > f_ctl 800000 seq  1
>> > 18919851 May 27 10:44:46 poc2 kernel: [ 1940.439539] host7: xid  dce:
>> > f_ctl 880008 seq  2
>> > 18919852 May 27 10:44:46 poc2 kernel: [ 1940.439551] host7: xid  42f:
>> > f_ctl 800000 seq  1
>> > 18919853 May 27 10:44:46 poc2 kernel: [ 1940.439559] host7: xid  42f:
>> > f_ctl 880008 seq  2
>> > 18919854 May 27 10:44:46 poc2 kernel: [ 1940.440408] host7: xid  90d:
>> > f_ctl 800000 seq  1
>> > 18919855 May 27 10:44:46 poc2 kernel: [ 1940.440425] host7: xid  90d:
>> > f_ctl 880008 seq  2
>> > 18919856 May 27 10:44:46 poc2 kernel: [ 1940.440930] host7: xid  d60:
>> > f_ctl 800000 seq  1
>> > 18919857 May 27 10:44:46 poc2 kernel: [ 1940.440950] host7: xid  d60:
>> > f_ctl 880008 seq  2
>> > 18919858 May 27 10:44:46 poc2 kernel: [ 1940.440971] host7: xid  ac4:
>> > f_ctl 800000 seq  1
>> > 18919859 May 27 10:44:46 poc2 kernel: [ 1940.440983] host7: xid  ac4:
>> > f_ctl 880008 seq  2
>> > 18919860 May 27 10:44:46 poc2 kernel: [ 1940.441482] host7: xid  fc2:
>> > f_ctl 800000 seq  1
>> > 18919861 May 27 10:44:46 poc2 kernel: [ 1940.441504] host7: xid  fc2:
>> > f_ctl 880008 seq  2
>> > 18919862 May 27 10:44:46 poc2 kernel: [ 1940.441759] host7: xid  885:
>> > f_ctl 800000 seq  1
>> > 18919863 May 27 10:44:46 poc2 kernel: [ 1940.441772] host7: xid  885:
>> > f_ctl 880008 seq  2
>> > 18919864 May 27 10:44:46 poc2 kernel: [ 1940.441883] host7: xid  ce1:
>> > f_ctl 800000 seq  1
>> > 18919865 May 27 10:44:46 poc2 kernel: [ 1940.441909] host7: xid  ce1:
>> > f_ctl 880008 seq  2
>> > 18919866 May 27 10:44:46 poc2 kernel: [ 1940.443033] host7: xid  7a3:
>> > f_ctl 800000 seq  1
>> > 18919867 May 27 10:44:46 poc2 kernel: [ 1940.443063] host7: xid  7a3:
>> > f_ctl 880008 seq  2
>> > 18919868 May 27 10:44:46 poc2 kernel: [ 1940.445410] host7: xid  7e6:
>> > f_ctl 800000 seq  1
>> > 18919869 May 27 10:44:46 poc2 kernel: [ 1940.445424] host7: xid  7e6:
>> > f_ctl 880008 seq  2
>> > 18919870 May 27 10:44:46 poc2 kernel: [ 1940.445609] host7: xid  327:
>> > f_ctl 800000 seq  1
>> > 18919871 May 27 10:44:46 poc2 kernel: [ 1940.445621] host7: xid  327:
>> > Exchange timer armed : 0 msecs
>> > 18919872 May 27 10:44:46 poc2 kernel: [ 1940.445630] host7: xid  327:
>> > f_ctl 800008 seq  2
>> > 18919873 May 27 10:44:46 poc2 kernel: [ 1940.445639] ft_queue_data_in:
>> > Failed to send frame ffff88062cbd3a00, xid <0x327>, remaining 131072,
>> > lso_max <0x10000>
>> > 18919874 May 27 10:44:46 poc2 kernel: [ 1940.445645] ft_queue_data_in:
>> > Failed to send frame ffff88062cbd3a00, xid <0x327>, remaining 65536,
>> > lso_max <0x10000>
>> > 18919875 May 27 10:44:46 poc2 kernel: [ 1940.445649] ft_queue_data_in:
>> > Failed to send frame ffff88062cbd3a00, xid <0x327>, remaining 0,
>> > lso_max <0x10000>
>> >
>> > After stripping out the "host7: xid  xxx: f_ctl 8x0000 seq  x" lines,
>> > the messages on the target look like:
>> > 15476415:May 27 10:42:37 poc2 kernel: [ 1811.571399] perf samples too
>> > long (2670 > 2500), lowering kernel.perf_event_max_sample_rate to
>> > 50000
>> > 18732891:May 27 10:44:14 poc2 kernel: [ 1909.035309] host7: xid  c48:
>> > Exchange timer armed : 0 msecs
>> > 18732893:May 27 10:44:14 poc2 kernel: [ 1909.035384] host7: xid  c48:
>> > Exchange timed out
>> > 18734191:May 27 10:44:14 poc2 kernel: [ 1909.218580] host7: xid  766:
>> > Exchange timer armed : 0 msecs
>> > 18734193:May 27 10:44:14 poc2 kernel: [ 1909.218750] host7: xid  766:
>> > Exchange timed out
>> > 18734419:May 27 10:44:14 poc2 kernel: [ 1909.254362] host7: xid  cce:
>> > Exchange timer armed : 0 msecs
>> > 18734422:May 27 10:44:14 poc2 kernel: [ 1909.254385] host7: xid  cce:
>> > Exchange timed out
>> > 18734439:May 27 10:44:14 poc2 kernel: [ 1909.258190] host7: xid  a84:
>> > Exchange timer armed : 0 msecs
>> > 18734441:May 27 10:44:14 poc2 kernel: [ 1909.258245] host7: xid  a84:
>> > Exchange timed out
>> > 18734444:May 27 10:44:14 poc2 kernel: [ 1909.259212] host7: xid  d00:
>> > Exchange timer armed : 0 msecs
>> > 18734446:May 27 10:44:14 poc2 kernel: [ 1909.259222] host7: xid  d00:
>> > Exchange timed out
>> > 18734465:May 27 10:44:14 poc2 kernel: [ 1909.261859] host7: xid  287:
>> > Exchange timer armed : 0 msecs
>> > 18734467:May 27 10:44:14 poc2 kernel: [ 1909.261910] host7: xid  287:
>> > Exchange timed out
>> > 18734632:May 27 10:44:15 poc2 kernel: [ 1909.285827] host7: xid  a64:
>> > Exchange timer armed : 0 msecs
>> > 18734634:May 27 10:44:15 poc2 kernel: [ 1909.285892] host7: xid  a64:
>> > Exchange timed out
>> > 18734699:May 27 10:44:15 poc2 kernel: [ 1909.301348] host7: xid  be8:
>> > Exchange timer armed : 0 msecs
>> > 18734701:May 27 10:44:15 poc2 kernel: [ 1909.301357] host7: xid  be8:
>> > Exchange timed out
>> > 18734806:May 27 10:44:15 poc2 kernel: [ 1909.319404] host7: xid  ea9:
>> > Exchange timer armed : 0 msecs
>> > 18734808:May 27 10:44:15 poc2 kernel: [ 1909.319446] host7: xid  ea9:
>> > Exchange timed out
>> > 18734842:May 27 10:44:15 poc2 kernel: [ 1909.324226] host7: xid  c01:
>> > Exchange timer armed : 0 msecs
>> > 18734845:May 27 10:44:15 poc2 kernel: [ 1909.324257] host7: xid  c01:
>> > Exchange timed out
>> > 18734944:May 27 10:44:15 poc2 kernel: [ 1909.347075] host7: xid  30c:
>> > Exchange timer armed : 0 msecs
>> > 18734946:May 27 10:44:15 poc2 kernel: [ 1909.347099] host7: xid  30c:
>> > Exchange timed out
>> > 18735067:May 27 10:44:15 poc2 kernel: [ 1909.360509] host7: xid  c8e:
>> > Exchange timer armed : 0 msecs
>> > 18735069:May 27 10:44:15 poc2 kernel: [ 1909.360571] host7: xid  c8e:
>> > Exchange timed out
>> > 18735154:May 27 10:44:15 poc2 kernel: [ 1909.380932] host7: xid  9ad:
>> > Exchange timer armed : 0 msecs
>> > 18735156:May 27 10:44:15 poc2 kernel: [ 1909.380946] host7: xid  9ad:
>> > Exchange timed out
>> > 18735253:May 27 10:44:15 poc2 kernel: [ 1909.395037] host7: xid  ce0:
>> > Exchange timer armed : 0 msecs
>> > 18735255:May 27 10:44:15 poc2 kernel: [ 1909.395113] host7: xid  ce0:
>> > Exchange timed out
>> > 18735308:May 27 10:44:15 poc2 kernel: [ 1909.405266] host7: xid  746:
>> > Exchange timer armed : 0 msecs
>> > 18735312:May 27 10:44:15 poc2 kernel: [ 1909.405332] host7: xid  746:
>> > Exchange timed out
>> > 18735317:May 27 10:44:15 poc2 kernel: [ 1909.405832] host7: xid  26c:
>> > Exchange timer armed : 0 msecs
>> > 18735319:May 27 10:44:15 poc2 kernel: [ 1909.405852] host7: xid  26c:
>> > Exchange timed out
>> > 18735342:May 27 10:44:15 poc2 kernel: [ 1909.412982] host7: xid  f09:
>> > Exchange timer armed : 0 msecs
>> > 18735344:May 27 10:44:15 poc2 kernel: [ 1909.413062] host7: xid  f09:
>> > Exchange timed out
>> > 18735572:May 27 10:44:15 poc2 kernel: [ 1909.453616] host7: xid  d2e:
>> > Exchange timer armed : 0 msecs
>> > 18735574:May 27 10:44:15 poc2 kernel: [ 1909.453631] host7: xid  d2e:
>> > Exchange timed out
>> > 18735575:May 27 10:44:15 poc2 kernel: [ 1909.454206] host7: xid  32f:
>> > Exchange timer armed : 0 msecs
>> > 18735577:May 27 10:44:15 poc2 kernel: [ 1909.454218] host7: xid  32f:
>> > Exchange timed out
>> > 18735810:May 27 10:44:15 poc2 kernel: [ 1909.490061] host7: xid  c68:
>> > Exchange timer armed : 0 msecs
>> > 18735812:May 27 10:44:15 poc2 kernel: [ 1909.490126] host7: xid  c68:
>> > Exchange timed out
>> > 18735927:May 27 10:44:15 poc2 kernel: [ 1909.515654] host7: xid  267:
>> > Exchange timer armed : 0 msecs
>> > 18735929:May 27 10:44:15 poc2 kernel: [ 1909.515667] host7: xid  267:
>> > Exchange timed out
>> > 18735940:May 27 10:44:15 poc2 kernel: [ 1909.523278] host7: xid  3cf:
>> > Exchange timer armed : 0 msecs
>> > 18735942:May 27 10:44:15 poc2 kernel: [ 1909.523350] host7: xid  3cf:
>> > Exchange timed out
>> > 18736041:May 27 10:44:15 poc2 kernel: [ 1909.554439] host7: xid  984:
>> > Exchange timer armed : 0 msecs
>> > 18736043:May 27 10:44:15 poc2 kernel: [ 1909.554457] host7: xid  984:
>> > Exchange timed out
>> > 18736148:May 27 10:44:15 poc2 kernel: [ 1909.583305] host7: xid  7a6:
>> > Exchange timer armed : 0 msecs
>> > 18736150:May 27 10:44:15 poc2 kernel: [ 1909.583332] host7: xid  7a6:
>> > Exchange timed out
>> > 18916476:May 27 10:44:45 poc2 kernel: [ 1939.812996] host7: xid  2c7:
>> > Exchange timer armed : 0 msecs
>> > 18916478:May 27 10:44:45 poc2 kernel: [ 1939.813075] host7: xid  2c7:
>> > Exchange timed out
>> > 18917869:May 27 10:44:45 poc2 kernel: [ 1940.028716] host7: xid  307:
>> > Exchange timer armed : 0 msecs
>> > 18917871:May 27 10:44:45 poc2 kernel: [ 1940.028738] host7: xid  307:
>> > Exchange timed out
>> > 18917904:May 27 10:44:45 poc2 kernel: [ 1940.036022] host7: xid  44f:
>> > Exchange timer armed : 0 msecs
>> > 18917906:May 27 10:44:45 poc2 kernel: [ 1940.036072] host7: xid  44f:
>> > Exchange timed out
>> > 18917907:May 27 10:44:45 poc2 kernel: [ 1940.036419] host7: xid  d01:
>> > Exchange timer armed : 0 msecs
>> > 18917910:May 27 10:44:45 poc2 kernel: [ 1940.036459] host7: xid  d01:
>> > Exchange timed out
>> > 18918008:May 27 10:44:45 poc2 kernel: [ 1940.050544] host7: xid  94b:
>> > Exchange timer armed : 0 msecs
>> > 18918010:May 27 10:44:45 poc2 kernel: [ 1940.050553] host7: xid  94b:
>> > Exchange timed out
>> > 18918239:May 27 10:44:45 poc2 kernel: [ 1940.100469] host7: xid  743:
>> > Exchange timer armed : 0 msecs
>> > 18918241:May 27 10:44:45 poc2 kernel: [ 1940.100485] host7: xid  743:
>> > Exchange timed out
>> > 18918310:May 27 10:44:45 poc2 kernel: [ 1940.111206] host7: xid  f82:
>> > Exchange timer armed : 0 msecs
>> > 18918312:May 27 10:44:45 poc2 kernel: [ 1940.111357] host7: xid  f82:
>> > Exchange timed out
>> > 18918329:May 27 10:44:45 poc2 kernel: [ 1940.117376] host7: xid  e00:
>> > Exchange timer armed : 0 msecs
>> > 18918331:May 27 10:44:45 poc2 kernel: [ 1940.117388] host7: xid  e00:
>> > Exchange timed out
>> > 18918385:May 27 10:44:45 poc2 kernel: [ 1940.127422] host7: xid  d08:
>> > Exchange timer armed : 0 msecs
>> > 18918388:May 27 10:44:45 poc2 kernel: [ 1940.127470] host7: xid  d08:
>> > Exchange timed out
>> > 18918487:May 27 10:44:45 poc2 kernel: [ 1940.142935] host7: xid  8c5:
>> > Exchange timer armed : 0 msecs
>> > 18918489:May 27 10:44:45 poc2 kernel: [ 1940.142950] host7: xid  8c5:
>> > Exchange timed out
>> > 18918588:May 27 10:44:45 poc2 kernel: [ 1940.158196] host7: xid  92d:
>> > Exchange timer armed : 0 msecs
>> > 18918590:May 27 10:44:45 poc2 kernel: [ 1940.158254] host7: xid  92d:
>> > Exchange timed out
>> > 18918597:May 27 10:44:45 poc2 kernel: [ 1940.159082] host7: xid  a04:
>> > Exchange timer armed : 0 msecs
>> > 18918599:May 27 10:44:45 poc2 kernel: [ 1940.159154] host7: xid  a04:
>> > Exchange timed out
>> > 18918866:May 27 10:44:45 poc2 kernel: [ 1940.209430] host7: xid  d28:
>> > Exchange timer armed : 0 msecs
>> > 18918868:May 27 10:44:45 poc2 kernel: [ 1940.209484] host7: xid  d28:
>> > Exchange timed out
>> > 18918909:May 27 10:44:45 poc2 kernel: [ 1940.215142] host7: xid  8ed:
>> > Exchange timer armed : 0 msecs
>> > 18918911:May 27 10:44:45 poc2 kernel: [ 1940.215158] host7: xid  8ed:
>> > Exchange timed out
>> > 18918918:May 27 10:44:45 poc2 kernel: [ 1940.217253] host7: xid  3ef:
>> > Exchange timer armed : 0 msecs
>> > 18918920:May 27 10:44:45 poc2 kernel: [ 1940.217270] host7: xid  3ef:
>> > Exchange timed out
>> > 18919025:May 27 10:44:45 poc2 kernel: [ 1940.241859] host7: xid  fa9:
>> > Exchange timer armed : 0 msecs
>> > 18919027:May 27 10:44:45 poc2 kernel: [ 1940.241904] host7: xid  fa9:
>> > Exchange timed out
>> > 18919151:May 27 10:44:45 poc2 kernel: [ 1940.267514] host7: xid  38c:
>> > Exchange timer armed : 0 msecs
>> > 18919154:May 27 10:44:45 poc2 kernel: [ 1940.267580] host7: xid  38c:
>> > Exchange timed out
>> > 18919163:May 27 10:44:45 poc2 kernel: [ 1940.268681] host7: xid  d48:
>> > Exchange timer armed : 0 msecs
>> > 18919165:May 27 10:44:45 poc2 kernel: [ 1940.268707] host7: xid  d48:
>> > Exchange timed out
>> > 18919212:May 27 10:44:45 poc2 kernel: [ 1940.278834] host7: xid  aa4:
>> > Exchange timer armed : 0 msecs
>> > 18919214:May 27 10:44:45 poc2 kernel: [ 1940.278881] host7: xid  aa4:
>> > Exchange timed out
>> > 18919243:May 27 10:44:45 poc2 kernel: [ 1940.285614] host7: xid  f22:
>> > Exchange timer armed : 0 msecs
>> > 18919245:May 27 10:44:45 poc2 kernel: [ 1940.285629] host7: xid  f22:
>> > Exchange timed out
>> > 18919270:May 27 10:44:45 poc2 kernel: [ 1940.293297] host7: xid  da0:
>> > Exchange timer armed : 0 msecs
>> > 18919272:May 27 10:44:45 poc2 kernel: [ 1940.293346] host7: xid  da0:
>> > Exchange timed out
>> > 18919321:May 27 10:44:45 poc2 kernel: [ 1940.301098] host7: xid  3cc:
>> > Exchange timer armed : 0 msecs
>> > 18919325:May 27 10:44:45 poc2 kernel: [ 1940.301172] host7: xid  3cc:
>> > Exchange timed out
>> > 18919472:May 27 10:44:46 poc2 kernel: [ 1940.333266] host7: xid  e0e:
>> > Exchange timer armed : 0 msecs
>> > 18919474:May 27 10:44:46 poc2 kernel: [ 1940.333299] host7: xid  e0e:
>> > Exchange timed out
>> > 18919641:May 27 10:44:46 poc2 kernel: [ 1940.368427] host7: xid  2e7:
>> > Exchange timer armed : 0 msecs
>> > 18919643:May 27 10:44:46 poc2 kernel: [ 1940.368451] host7: xid  2e7:
>> > Exchange timed out
>> > 18919694:May 27 10:44:46 poc2 kernel: [ 1940.386699] host7: xid  826:
>> > Exchange timer armed : 0 msecs
>> > 18919696:May 27 10:44:46 poc2 kernel: [ 1940.386744] host7: xid  826:
>> > Exchange timed out
>> > 18919785:May 27 10:44:46 poc2 kernel: [ 1940.409090] host7: xid  40f:
>> > Exchange timer armed : 0 msecs
>> > 18919787:May 27 10:44:46 poc2 kernel: [ 1940.409125] host7: xid  40f:
>> > Exchange timed out
>> > 18919871:May 27 10:44:46 poc2 kernel: [ 1940.445621] host7: xid  327:
>> > Exchange timer armed : 0 msecs
>> > 18919873:May 27 10:44:46 poc2 kernel: [ 1940.445639] ft_queue_data_in:
>> > Failed to send frame ffff88062cbd3a00, xid <0x327>, remaining 131072,
>> > lso_max <0x10000>
>> > 18919874:May 27 10:44:46 poc2 kernel: [ 1940.445645] ft_queue_data_in:
>> > Failed to send frame ffff88062cbd3a00, xid <0x327>, remaining 65536,
>> > lso_max <0x10000>
>> > 18919875:May 27 10:44:46 poc2 kernel: [ 1940.445649] ft_queue_data_in:
>> > Failed to send frame ffff88062cbd3a00, xid <0x327>, remaining 0,
>> > lso_max <0x10000>
>> > 18919877:May 27 10:44:46 poc2 kernel: [ 1940.445671] host7: xid  327:
>> > Exchange timed out
>> > 18919878:May 27 10:44:46 poc2 kernel: [ 1940.445683] host7: xid  ac4:
>> > Exchange timer armed : 2000 msecs
>> > 18930794:May 27 10:44:48 poc2 kernel: [ 1942.452127] host7: xid  ac4:
>> > Exchange timed out
>> > 18930795:May 27 10:44:48 poc2 kernel: [ 1942.452136] host7: xid  ac4:
>> > f_ctl  90000 seq  1
>> > 18930796:May 27 10:44:48 poc2 kernel: [ 1942.452139] host7: xid  ac4:
>> > Exchange timer armed : 20000 msecs
>> >
>> > Thanks
>> >
>> >
>> > On Fri, May 23, 2014 at 9:23 AM, Vasu Dev <vasu.dev@xxxxxxxxxxxxxxx> wrote:
>> >> On Wed, 2014-05-21 at 15:23 -0700, Jun Wu wrote:
>> >>> > 18630751 May 21 11:01:53 poc2 kernel: [ 1528.191740] host7: xid
>> >>> 6e4:
>> >>> > Exchange timer armed : 0 msecs
>> >>> > 18630752 May 21 11:01:53 poc2 kernel: [ 1528.191747] host7: xid
>> >>> 6e4:
>> >>> > f_ctl 800000 seq  1
>> >>> > 18630753 May 21 11:01:53 poc2 kernel: [ 1528.191756] host7: xid
>> >>> 6e4:
>> >>> > f_ctl 800000 seq  2
>> >>> > 18630754 May 21 11:01:53 poc2 kernel: [ 1528.191763] host7: xid
>> >>> 6e4:
>> >>> > Exchange timed out
>> >>> > 18630755 May 21 11:01:53 poc2 kernel: [ 1528.191777]
>> >>> ft_queue_data_in:
>> >>> > Failed to send frame ffff8805bf245e00, xid <0x6e4>, remaining
>> >>> 458752,
>> >>> > lso_max <0x10000>
>> >>
>> >> Exchange 0x6e4 is aborted and then target still sending frame, while
>> >> later should not occur but first setting up abort with 0 msec timeout
>> >> doesn not look correct either and it is different that 8000 ms on
>> >> initiator side. Are you using same switch between initiator and target
>> >> with DCB enabled ?
>> >>
>> >> Reducing retries could narrow down to early aborts is the cause here,
>> >> can you try with REC disabled on initiator side for that using this
>> >> change ?
>> >>
>> >> diff --git a/drivers/scsi/libfc/fc_fcp.c b/drivers/scsi/libfc/fc_fcp.c
>> >> index 1d7e76e..43fe793 100644
>> >> --- a/drivers/scsi/libfc/fc_fcp.c
>> >> +++ b/drivers/scsi/libfc/fc_fcp.c
>> >> @@ -1160,6 +1160,7 @@ static int fc_fcp_cmd_send(struct fc_lport *lport,
>> >> struct fc_fcp_pkt *fsp,
>> >>         fc_fcp_pkt_hold(fsp);   /* hold for fc_fcp_pkt_destroy */
>> >>
>> >>         setup_timer(&fsp->timer, fc_fcp_timeout, (unsigned long)fsp);
>> >> +       rpriv->flags &= ~FC_RP_FLAGS_REC_SUPPORTED;
>> >>         if (rpriv->flags & FC_RP_FLAGS_REC_SUPPORTED)
>> >>                 fc_fcp_timer_set(fsp, get_fsp_rec_tov(fsp));
>> >>
>> >>
>
>
--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux SCSI]     [Kernel Newbies]     [Linux SCSI Target Infrastructure]     [Share Photos]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Device Mapper]

  Powered by Linux