Re: [Open-FCoE] System crashes with increased drive count

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, 2014-06-04 at 11:45 -0700, Jun Wu wrote:
> The test setup includes one host and one target. The target exposes 10
> hard drives (or 10 LUNs) on one fcoe port. The single initiator runs
> 10 fio processes simultaneously to the 10 target drives through fcoe
> vn2vn. This is a simple configuration that other people may also want
> to try.
> 

Still seq_send could fail on aborted exchange as I explained below with
stress from 10 fio sessions from same host, thanks for clarifying.

> >Exchange 0x6e4 is aborted and then target still sending frame, while
> >later should not occur but first setting up abort with 0 msec timeout
> >doesn not look correct either and it is different that 8000 ms on
> >initiator side.
> 
> Should target stop sending frame after abort? I still see a lot of 0
> msec messages on target side. 

First target doesn't sends aborts and 0 msec prints are on receipt of
abort.

> Is this something that should be
> addressed?
> 

May be but will not help for your case since once exchange aborted then
subsequent seq_send will fail on aborted exchange.

> >Reducing retries could narrow down to early aborts is the cause here,
> >can you try with REC disabled on initiator side for that using this
> >change ?
> 
> By disabling REC have you confirmed that the early aborts is the
> cause? 

Can you count instance before and after through full log for *same
duration run* ? I guess should be less as we talked before in context of
increasing timeouts below.

> Is the abort caused by 0 msec timeout?

No target doesn't initiate abort and these prints are on receipt of
abort handling and once exchange aborted then seq_send will fail with
noisy prints.

//Vasu

> Thanks,
> 
> Jun
> 
> On Mon, Jun 2, 2014 at 5:21 PM, Vasu Dev <vasu.dev@xxxxxxxxxxxxxxx> wrote:
> > On Mon, 2014-06-02 at 16:55 -0700, Jun Wu wrote:
> >> Hi Vasu,
> >>
> >> Is there anything else we can do to collect more information or adjust
> >> the timeout value?
> >
> > Increased Timeouts will help in reducing seq_send failure prints and
> > must be the case with REC disabled patch try since that skipped REC
> > completely and with that defaulted to longer SCSI retry timeouts.
> > Anycase seq_send failure prints are just noise here since seq_send fails
> > only for aborted exchanges and that is very likely with tcm fc abort
> > handling, especially in your oversubscribed target setup with 1 target
> > taking workloads from 10 hosts. I mean seq_send fails only due to
> > following aborted exchanges check since thereafter fcoe_xmit always
> > returns success.
> >
> >         if (ep->esb_stat & (ESB_ST_COMPLETE | ESB_ST_ABNORMAL)) {
> >                 fc_frame_free(fp);
> >                 goto out;
> >         }
> >
> > So I think it is okay to get rid off send failure prints same as tcm fc
> > has it on other places of callings seq_send with this patch:-
> >
> > diff --git a/drivers/target/tcm_fc/tfc_io.c
> > b/drivers/target/tcm_fc/tfc_io.c
> > index a48b96d..0425fc0 100644
> > --- a/drivers/target/tcm_fc/tfc_io.c
> > +++ b/drivers/target/tcm_fc/tfc_io.c
> > @@ -200,15 +200,7 @@ int ft_queue_data_in(struct se_cmd *se_cmd)
> >                         f_ctl |= FC_FC_END_SEQ;
> >                 fc_fill_fc_hdr(fp, FC_RCTL_DD_SOL_DATA, ep->did,
> > ep->sid,
> >                                FC_TYPE_FCP, f_ctl, fh_off);
> > -               error = lport->tt.seq_send(lport, seq, fp);
> > -               if (error) {
> > -                       /* XXX For now, initiator will retry */
> > -                       pr_err_ratelimited("%s: Failed to send frame %p,
> > "
> > -                                               "xid <0x%x>, remaining %
> > zu, "
> > -                                               "lso_max <0x%x>\n",
> > -                                               __func__, fp, ep->xid,
> > -                                               remaining,
> > lport->lso_max);
> > -               }
> > +               lport->tt.seq_send(lport, seq, fp);
> >         }
> >         return ft_queue_status(se_cmd);
> >  }
> >
> >> Thanks,
> >>
> >> Jun
> >>
> >> On Tue, May 27, 2014 at 12:44 PM, Jun Wu <jwu@xxxxxxxxxxxx> wrote:
> >> > We do not have DCB PFC so the traffic is managed by PAUSE frames by default.
> >> > The kernel has been recompiled to include your change. I ran the same
> >> > test after the kernel change, i.e. running 10 fio sessions from
> >> > initiator to 10 drives on target. The target machine hung after a few
> >> > minutes.
> >> >
> >
> > Target could be either really overloaded with your workloads in 10 hosts
> > to single target setup or you it really stuck, for later case enabling
> > hung task detection could help under kernel hacking as that would dump
> > stack trace of hung task. If it is the earlier case then I don't know
> > how target back end handle the case of sustained over subscribed target
> > bandwidth, may be Nick could shed some info on that and possibly tcm_fc
> > has related missing feedback into target core for that.
> >
> > If your workloads exceed demand intermittently then adjusting timeout as
> > you mentioned above or increasing all Tx/Rx path resources could help in
> > absorbing such intermittent incidents and in turn not leading to host IO
> > timeouts/failures.
> >
> > //Vasu
> >
> >> > On the initiator side the messages look like:
> >> >  4453 May 27 10:44:14 poc1 kernel: [ 1726.288288] host7: fcp: 00061e:
> >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
> >> >  4454 May 27 10:44:14 poc1 kernel: [ 1726.292240] host7: fcp: 00061e:
> >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
> >> >  4455 May 27 10:44:14 poc1 kernel: [ 1726.295246] host7: fcp: 00061e:
> >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
> >> >  4456 May 27 10:44:14 poc1 kernel: [ 1726.296245] host7: fcp: 00061e:
> >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
> >> >  4457 May 27 10:44:14 poc1 kernel: [ 1726.298249] host7: fcp: 00061e:
> >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
> >> >  4458 May 27 10:44:14 poc1 kernel: [ 1726.300288] host7: fcp: 00061e:
> >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
> >> >  4459 May 27 10:44:14 poc1 kernel: [ 1726.303268] host7: fcp: 00061e:
> >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
> >> >  4460 May 27 10:44:14 poc1 kernel: [ 1726.303273] host7: fcp: 00061e:
> >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
> >> >  4461 May 27 10:44:14 poc1 kernel: [ 1726.304297] host7: fcp: 00061e:
> >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
> >> >  4462 May 27 10:44:14 poc1 kernel: [ 1726.306257] host7: fcp: 00061e:
> >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
> >> >  4463 May 27 10:44:14 poc1 kernel: [ 1726.307293] host7: fcp: 00061e:
> >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
> >> >  4464 May 27 10:44:14 poc1 kernel: [ 1726.307298] host7: fcp: 00061e:
> >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
> >> >  4465 May 27 10:44:14 poc1 kernel: [ 1726.307301] host7: fcp: 00061e:
> >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
> >> >  4466 May 27 10:44:14 poc1 kernel: [ 1726.310317] host7: fcp: 00061e:
> >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
> >> >  4467 May 27 10:44:14 poc1 kernel: [ 1726.313257] host7: fcp: 00061e:
> >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
> >> >  4468 May 27 10:44:14 poc1 kernel: [ 1726.315297] host7: fcp: 00061e:
> >> > xid 01cd-0a84: DDP I/O in fc_fcp_recv_data set ERROR
> >> >  4469 May 27 10:44:14 poc1 kernel: [ 1726.315302] host7: xid  1cd:
> >> > f_ctl  90000 seq  1
> >> >  4470 May 27 10:44:14 poc1 kernel: [ 1726.315666] host7: xid  1cd: BLS
> >> > rctl 85 - BLS reject received
> >> >  4471 May 27 10:44:14 poc1 kernel: [ 1726.316295] host7: fcp: 00061e:
> >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
> >> >  4472 May 27 10:44:14 poc1 kernel: [ 1726.317260] host7: fcp: 00061e:
> >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
> >> >  4473 May 27 10:44:14 poc1 kernel: [ 1726.317266] host7: fcp: 00061e:
> >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
> >> >  4474 May 27 10:44:14 poc1 kernel: [ 1726.317417] host7: fcp: 00061e:
> >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
> >> >  4475 May 27 10:44:14 poc1 kernel: [ 1726.320303] host7: fcp: 00061e:
> >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
> >> >  4476 May 27 10:44:14 poc1 kernel: [ 1726.321301] host7: fcp: 00061e:
> >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
> >> >  4477 May 27 10:44:14 poc1 kernel: [ 1726.323264] host7: fcp: 00061e:
> >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
> >> >  4478 May 27 10:44:14 poc1 kernel: [ 1726.323299] host7: fcp: 00061e:
> >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
> >> >  4479 May 27 10:44:14 poc1 kernel: [ 1726.324307] host7: fcp: 00061e:
> >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
> >> >  4480 May 27 10:44:14 poc1 kernel: [ 1726.327271] host7: fcp: 00061e:
> >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
> >> >  4481 May 27 10:44:14 poc1 kernel: [ 1726.327301] host7: fcp: 00061e:
> >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
> >> >  4482 May 27 10:44:14 poc1 kernel: [ 1726.329274] host7: fcp: 00061e:
> >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
> >> >  4483 May 27 10:44:14 poc1 kernel: [ 1726.330306] host7: fcp: 00061e:
> >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
> >> >  4484 May 27 10:44:14 poc1 kernel: [ 1726.331273] host7: fcp: 00061e:
> >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
> >> >  4485 May 27 10:44:14 poc1 kernel: [ 1726.333270] host7: fcp: 00061e:
> >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
> >> >  4486 May 27 10:44:14 poc1 kernel: [ 1726.333313] host7: fcp: 00061e:
> >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
> >> >  4487 May 27 10:44:14 poc1 kernel: [ 1726.338276] host7: fcp: 00061e:
> >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
> >> >  4488 May 27 10:44:14 poc1 kernel: [ 1726.338316] host7: fcp: 00061e:
> >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
> >> >  4489 May 27 10:44:14 poc1 kernel: [ 1726.340320] host7: fcp: 00061e:
> >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
> >> >  4490 May 27 10:44:14 poc1 kernel: [ 1726.341313] host7: fcp: 00061e:
> >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
> >> >  4491 May 27 10:44:14 poc1 kernel: [ 1726.342286] host7: fcp: 00061e:
> >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
> >> >  4492 May 27 10:44:14 poc1 kernel: [ 1726.343312] host7: fcp: 00061e:
> >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
> >> >  4493 May 27 10:44:14 poc1 kernel: [ 1726.346324] host7: fcp: 00061e:
> >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
> >> >  4494 May 27 10:44:14 poc1 kernel: [ 1726.347283] host7: fcp: 00061e:
> >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
> >> >  4495 May 27 10:44:14 poc1 kernel: [ 1726.350323] host7: fcp: 00061e:
> >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
> >> >  4496 May 27 10:44:14 poc1 kernel: [ 1726.350326] host7: fcp: 00061e:
> >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
> >> >  4497 May 27 10:44:14 poc1 kernel: [ 1726.352288] host7: fcp: 00061e:
> >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
> >> >  4498 May 27 10:44:14 poc1 kernel: [ 1726.352289] host7: fcp: 00061e:
> >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
> >> >  4499 May 27 10:44:14 poc1 kernel: [ 1726.352942] host7: fcp: 00061e:
> >> > xid 01ee-0c48: DDP I/O in fc_fcp_recv_data set ERROR
> >> >  4500 May 27 10:44:14 poc1 kernel: [ 1726.352946] host7: xid  1ee:
> >> > f_ctl  90000 seq  1
> >> >  4501 May 27 10:44:14 poc1 kernel: [ 1726.353288] host7: fcp: 00061e:
> >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
> >> >  4502 May 27 10:44:14 poc1 kernel: [ 1726.353320] host7: fcp: 00061e:
> >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
> >> >
> >> > I see much less Exchange timer messages than last time. Here are all
> >> > the instances of them:
> >> >
> >> >  6521 May 27 10:44:46 poc1 kernel: [ 1757.762301] host7: fcp: 00061e:
> >> > xid 0181-090d: DDP I/O in fc_fcp_recv_data set ERROR
> >> >  6522 May 27 10:44:46 poc1 kernel: [ 1757.762306] host7: xid  181:
> >> > f_ctl  90000 seq  1
> >> >  6523 May 27 10:44:46 poc1 kernel: [ 1757.763257] host7: fcp: 00061e:
> >> > xid 0005-0327: DDP I/O in fc_fcp_recv_data set ERROR
> >> >  6524 May 27 10:44:46 poc1 kernel: [ 1757.763261] host7: xid    5:
> >> > f_ctl  90000 seq  1
> >> >  6525 May 27 10:44:46 poc1 kernel: [ 1757.763347] host7: fcp: 00061e:
> >> > xid 0060-0dce: DDP I/O in fc_fcp_recv_data set ERROR
> >> >  6526 May 27 10:44:46 poc1 kernel: [ 1757.763351] host7: xid   60:
> >> > f_ctl  90000 seq  1
> >> >  6527 May 27 10:44:46 poc1 kernel: [ 1757.763617] host7: xid  12c:
> >> > exch: BLS rctl 84 - BLS accept
> >> >  6528 May 27 10:44:46 poc1 kernel: [ 1757.763627] host7: fcp: 00061e:
> >> > Returning DID_ERROR to scsi-ml due to FC_CMD_ABORTED
> >> >  6529 May 27 10:44:46 poc1 kernel: [ 1757.763632] host7: xid  12c:
> >> > Exchange timer armed : 10000 msecs
> >> >  6530 May 27 10:44:46 poc1 kernel: [ 1757.763813] host7: fcp: 00061e:
> >> > xid 00cc-090d: DDP I/O in fc_fcp_recv_data set ERROR
> >> >  6531 May 27 10:44:46 poc1 kernel: [ 1757.763817] host7: xid   cc:
> >> > f_ctl  90000 seq  1
> >> >  6532 May 27 10:44:46 poc1 kernel: [ 1757.763983] host7: xid   60: BLS
> >> > rctl 85 - BLS reject received
> >> >  6533 May 27 10:44:46 poc1 kernel: [ 1757.765447] host7: xid  14d: BLS
> >> > rctl 85 - BLS reject received
> >> >  6534 May 27 10:44:46 poc1 kernel: [ 1757.766258] host7: xid  18b: BLS
> >> > rctl 85 - BLS reject received
> >> >  6535 May 27 10:44:46 poc1 kernel: [ 1757.767116] host7: xid   ed: BLS
> >> > rctl 85 - BLS reject received
> >> >  6536 May 27 10:44:46 poc1 kernel: [ 1757.767144] host7: xid   21: BLS
> >> > rctl 85 - BLS reject received
> >> >  6537 May 27 10:44:46 poc1 kernel: [ 1757.768161] host7: xid  122: BLS
> >> > rctl 85 - BLS reject received
> >> >  6538 May 27 10:44:46 poc1 kernel: [ 1757.768184] host7: xid   45: BLS
> >> > rctl 85 - BLS reject received
> >> >  6539 May 27 10:44:46 poc1 kernel: [ 1757.768437] host7: xid  16f: BLS
> >> > rctl 85 - BLS reject received
> >> >  6540 May 27 10:44:46 poc1 kernel: [ 1757.768880] host7: xid  14c: BLS
> >> > rctl 85 - BLS reject received
> >> >  6541 May 27 10:44:46 poc1 kernel: [ 1757.772813] host7: xid   8a: BLS
> >> > rctl 85 - BLS reject received
> >> >  6542 May 27 10:44:46 poc1 kernel: [ 1757.778209] host7: xid  181: BLS
> >> > rctl 85 - BLS reject received
> >> >  6543 May 27 10:44:46 poc1 kernel: [ 1757.778226] host7: xid  16b: BLS
> >> > rctl 85 - BLS reject received
> >> >  6544 May 27 10:44:46 poc1 kernel: [ 1757.778905] host7: xid   cc: BLS
> >> > rctl 85 - BLS reject received
> >> >  6545 May 27 10:44:56 poc1 kernel: [ 1767.793465] host7: xid  12c:
> >> > Exchange timed out
> >> >  6546 May 27 10:44:56 poc1 kernel: [ 1767.793478] host7: xid  e65:
> >> > Exchange timer armed : 2000 msecs
> >> >  6547 May 27 10:44:56 poc1 kernel: [ 1767.793861] host7: xid  e65:
> >> > Exchange timer canceled
> >> >  6548 May 27 10:44:56 poc1 kernel: [ 1767.793866] host7: xid  12c:
> >> > LS_RJT for RRQ
> >> >  6549 May 27 10:45:16 poc1 kernel: [ 1788.033856] host7: scsi:
> >> > Resetting rport (00061e)
> >> >  6550 May 27 10:45:16 poc1 kernel: [ 1788.034071] host7: scsi: lun
> >> > reset to lun 0 completed
> >> >
> >> > and
> >> >
> >> >  7468 May 27 10:45:17 poc1 kernel: [ 1789.288870] host7: fcp: 00061e:
> >> > xid 06ae-0985: fsp=ffff880c0e176700:lso:blen=1000 lso_max=0x10000
> >> > t_blen=1000
> >> >  7469 May 27 10:45:17 poc1 kernel: [ 1789.288874] host7: xid  6ae:
> >> > f_ctl  90000 seq  1
> >> >  7470 May 27 10:45:17 poc1 kernel: [ 1789.291411] host7: fcp: 00061e:
> >> > xid 068e-07a3: fsp=ffff880c0e177400:lso:blen=1000 lso_max=0x10000
> >> > t_blen=1000
> >> >  7471 May 27 10:45:17 poc1 kernel: [ 1789.291415] host7: xid  68e:
> >> > f_ctl  90000 seq  1
> >> >  7472 May 27 10:45:17 poc1 kernel: [ 1789.291425] host7: fcp: 00061e:
> >> > xid 0c2e-0b04: fsp=ffff880c1cc4c800:lso:blen=1000 lso_max=0x10000
> >> > t_blen=1000
> >> >  7473 May 27 10:45:17 poc1 kernel: [ 1789.291428] host7: xid  c2e:
> >> > f_ctl  90000 seq  1
> >> >  7474 May 27 10:45:17 poc1 kernel: [ 1789.296282] host7: xid  1c4: BLS
> >> > rctl 85 - BLS reject received
> >> >  7475 May 27 10:45:17 poc1 kernel: [ 1789.296284] host7: xid   8c: BLS
> >> > rctl 85 - BLS reject received
> >> >  7476 May 27 10:45:17 poc1 kernel: [ 1789.297554] host7: xid  6ae:
> >> > exch: BLS rctl 84 - BLS accept
> >> >  7477 May 27 10:45:17 poc1 kernel: [ 1789.297559] host7: fcp: 00061e:
> >> > Returning DID_ERROR to scsi-ml due to FC_CMD_ABORTED
> >> >  7478 May 27 10:45:17 poc1 kernel: [ 1789.297564] host7: xid  6ae:
> >> > Exchange timer armed : 10000 msecs
> >> >  7479 May 27 10:45:17 poc1 kernel: [ 1789.298873] host7: xid   ef: BLS
> >> > rctl 85 - BLS reject received
> >> >  7480 May 27 10:45:17 poc1 kernel: [ 1789.298887] host7: fcp: 00061e:
> >> > xid 01cf-0e8e: DDP I/O in fc_fcp_recv_data set ERROR
> >> >  7481 May 27 10:45:17 poc1 kernel: [ 1789.298891] host7: xid  1cf:
> >> > f_ctl  90000 seq  1
> >> >  7482 May 27 10:45:17 poc1 kernel: [ 1789.299318] host7: xid  1cf: BLS
> >> > rctl 85 - BLS reject received
> >> >  7483 May 27 10:45:17 poc1 kernel: [ 1789.299390] host7: fcp: 00061e:
> >> > xid 00c4-048c: DDP I/O in fc_fcp_recv_data set ERROR
> >> >  7484 May 27 10:45:17 poc1 kernel: [ 1789.299394] host7: xid   c4:
> >> > f_ctl  90000 seq  1
> >> >  7485 May 27 10:45:17 poc1 kernel: [ 1789.299716] host7: xid   c4: BLS
> >> > rctl 85 - BLS reject received
> >> >  7486 May 27 10:45:17 poc1 kernel: [ 1789.299733] host7: xid   e6: BLS
> >> > rctl 85 - BLS reject received
> >> >  7487 May 27 10:45:17 poc1 kernel: [ 1789.299762] host7: xid   87: BLS
> >> > rctl 85 - BLS reject received
> >> >  7488 May 27 10:45:17 poc1 kernel: [ 1789.299767] host7: xid  107: BLS
> >> > rctl 85 - BLS reject received
> >> >  7489 May 27 10:45:17 poc1 kernel: [ 1789.303432] host7: fcp: 00061e:
> >> > xid 0042-050f: DDP I/O in fc_fcp_recv_data set ERROR
> >> >  7490 May 27 10:45:17 poc1 kernel: [ 1789.303437] host7: xid   42:
> >> > f_ctl  90000 seq  1
> >> >  7491 May 27 10:45:17 poc1 kernel: [ 1789.303813] host7: xid   42: BLS
> >> > rctl 85 - BLS reject received
> >> >  7492 May 27 10:45:17 poc1 kernel: [ 1789.304596] host7: xid    7: BLS
> >> > rctl 85 - BLS reject received
> >> >  7493 May 27 10:45:17 poc1 kernel: [ 1789.304607] host7: xid  1a9:
> >> > exch: BLS rctl 84 - BLS accept
> >> >  7494 May 27 10:45:17 poc1 kernel: [ 1789.304615] host7: fcp: 00061e:
> >> > Returning DID_ERROR to scsi-ml due to FC_CMD_ABORTED
> >> >  7495 May 27 10:45:17 poc1 kernel: [ 1789.306357] host7: xid  10d: BLS
> >> > rctl 85 - BLS reject received
> >> >  7496 May 27 10:45:17 poc1 kernel: [ 1789.306797] host7: xid  18e: BLS
> >> > rctl 85 - BLS reject received
> >> >  7497 May 27 10:45:17 poc1 kernel: [ 1789.307196] host7: fcp: 00061e:
> >> > xid 01e4-04a7: DDP I/O in fc_fcp_recv_data set ERROR
> >> >  7498 May 27 10:45:17 poc1 kernel: [ 1789.307201] host7: xid  1e4:
> >> > f_ctl  90000 seq  1
> >> >  7499 May 27 10:45:17 poc1 kernel: [ 1789.307628] host7: xid  1e4: BLS
> >> > rctl 85 - BLS reject received
> >> >  7500 May 27 10:45:17 poc1 kernel: [ 1789.311225] host7: xid   8d: BLS
> >> > rctl 85 - BLS reject received
> >> >  7501 May 27 10:45:17 poc1 kernel: [ 1789.316454] host7: fcp: 00061e:
> >> > xid 0c64-0de8: fsp=ffff88061c238c00:lso:blen=1000 lso_max=0x10000
> >> > t_blen=1000
> >> >  7502 May 27 10:45:17 poc1 kernel: [ 1789.316458] host7: xid  c64:
> >> > f_ctl  90000 seq  1
> >> >  7503 May 27 10:45:27 poc1 kernel: [ 1799.306713] host7: xid  6ae:
> >> > Exchange timed out
> >> >  7504 May 27 10:45:27 poc1 kernel: [ 1799.306725] host7: xid  e65:
> >> > Exchange timer armed : 2000 msecs
> >> >  7505 May 27 10:45:29 poc1 kernel: [ 1801.312254] host7: xid  e65:
> >> > Exchange timed out
> >> >  7506 May 27 10:45:29 poc1 kernel: [ 1801.312261] host7: xid  e65:
> >> > f_ctl  90000 seq  1
> >> >  7507 May 27 10:45:29 poc1 kernel: [ 1801.312264] host7: xid  e65:
> >> > Exchange timer armed : 20000 msecs
> >> >  7508 May 27 10:45:31 poc1 kernel: [ 1803.197758] host7: rport 00061e:
> >> > Remove port
> >> >  7509 May 27 10:45:31 poc1 kernel: [ 1803.197762] host7: rport 00061e:
> >> > Port sending LOGO from Ready state
> >> >
> >> > and finally
> >> >
> >> >  7724 May 27 10:45:31 poc1 kernel: [ 1803.199215] host7: fcp: 00061e:
> >> > Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
> >> >  7725 May 27 10:45:31 poc1 kernel: [ 1803.199219] host7: xid  c44:
> >> > f_ctl  90000 seq  1
> >> >  7726 May 27 10:45:31 poc1 kernel: [ 1803.199223] host7: rport 00061e:
> >> > Received a LOGO response closed
> >> >  7727 May 27 10:45:31 poc1 kernel: [ 1803.199226] host7: xid  e65:
> >> > Exchange timer canceled
> >> >  7728 May 27 10:45:31 poc1 kernel: [ 1803.199264] host7: rport 00061e:
> >> > work delete
> >> >  7729 May 27 10:46:31 poc1 kernel: [ 1863.357830]  rport-7:0-0:
> >> > blocked FC remote port time out: removing target and saving binding
> >> >
> >> >
> >> > The messages on the target side are similar as before.
> >> > This is when Exhange timer message first appears:
> >> > 18732865 May 27 10:44:14 poc2 kernel: [ 1909.030408] host7: xid  c48:
> >> > f_ctl 800000 seq  1
> >> > 18732866 May 27 10:44:14 poc2 kernel: [ 1909.030423] host7: xid  c48:
> >> > f_ctl 880008 seq  2
> >> > 18732867 May 27 10:44:14 poc2 kernel: [ 1909.031509] host7: xid  766:
> >> > f_ctl 800000 seq  1
> >> > 18732868 May 27 10:44:14 poc2 kernel: [ 1909.031528] host7: xid  766:
> >> > f_ctl 880008 seq  2
> >> > 18732869 May 27 10:44:14 poc2 kernel: [ 1909.031787] host7: xid  cce:
> >> > f_ctl 800000 seq  1
> >> > 18732870 May 27 10:44:14 poc2 kernel: [ 1909.031799] host7: xid  cce:
> >> > f_ctl 880008 seq  2
> >> > 18732871 May 27 10:44:14 poc2 kernel: [ 1909.031944] host7: xid  2c7:
> >> > f_ctl 800000 seq  1
> >> > 18732872 May 27 10:44:14 poc2 kernel: [ 1909.031971] host7: xid  2c7:
> >> > f_ctl 880008 seq  2
> >> > 18732873 May 27 10:44:14 poc2 kernel: [ 1909.032145] host7: xid  f09:
> >> > f_ctl 800000 seq  1
> >> > 18732874 May 27 10:44:14 poc2 kernel: [ 1909.032166] host7: xid  f09:
> >> > f_ctl 880008 seq  2
> >> > 18732875 May 27 10:44:14 poc2 kernel: [ 1909.032374] host7: xid  ce0:
> >> > f_ctl 800000 seq  1
> >> > 18732876 May 27 10:44:14 poc2 kernel: [ 1909.032383] host7: xid  ce0:
> >> > f_ctl 880008 seq  2
> >> > 18732877 May 27 10:44:14 poc2 kernel: [ 1909.032729] host7: xid  8eb:
> >> > f_ctl 800000 seq  1
> >> > 18732878 May 27 10:44:14 poc2 kernel: [ 1909.032736] host7: xid  32f:
> >> > f_ctl 800000 seq  1
> >> > 18732879 May 27 10:44:14 poc2 kernel: [ 1909.032740] host7: xid  8eb:
> >> > f_ctl 880008 seq  2
> >> > 18732880 May 27 10:44:14 poc2 kernel: [ 1909.032759] host7: xid  32f:
> >> > f_ctl 880008 seq  2
> >> > 18732881 May 27 10:44:14 poc2 kernel: [ 1909.033034] host7: xid  98d:
> >> > f_ctl 800000 seq  1
> >> > 18732882 May 27 10:44:14 poc2 kernel: [ 1909.033051] host7: xid  98d:
> >> > f_ctl 880008 seq  2
> >> > 18732883 May 27 10:44:14 poc2 kernel: [ 1909.033262] host7: xid  c01:
> >> > f_ctl 800000 seq  1
> >> > 18732884 May 27 10:44:14 poc2 kernel: [ 1909.033290] host7: xid  c01:
> >> > f_ctl 880008 seq  2
> >> > 18732885 May 27 10:44:14 poc2 kernel: [ 1909.033373] host7: xid  6ea:
> >> > f_ctl 800000 seq  1
> >> > 18732886 May 27 10:44:14 poc2 kernel: [ 1909.033387] host7: xid  6ea:
> >> > f_ctl 880008 seq  2
> >> > 18732887 May 27 10:44:14 poc2 kernel: [ 1909.033734] host7: xid  766:
> >> > f_ctl 800000 seq  1
> >> > 18732888 May 27 10:44:14 poc2 kernel: [ 1909.033747] host7: xid  766:
> >> > f_ctl 880008 seq  2
> >> > 18732889 May 27 10:44:14 poc2 kernel: [ 1909.034621] host7: xid  30c:
> >> > f_ctl 800000 seq  1
> >> > 18732890 May 27 10:44:14 poc2 kernel: [ 1909.034648] host7: xid  30c:
> >> > f_ctl 880008 seq  2
> >> > 18732891 May 27 10:44:14 poc2 kernel: [ 1909.035309] host7: xid  c48:
> >> > Exchange timer armed : 0 msecs
> >> > 18732892 May 27 10:44:14 poc2 kernel: [ 1909.035315] host7: xid  c48:
> >> > f_ctl 800000 seq  1
> >> > 18732893 May 27 10:44:14 poc2 kernel: [ 1909.035384] host7: xid  c48:
> >> > Exchange timed out
> >> >
> >> > This is when ft_queue_data_in failure first appears:
> >> > 18919838 May 27 10:44:46 poc2 kernel: [ 1940.431788] host7: xid  327:
> >> > f_ctl 800000 seq  1
> >> > 18919839 May 27 10:44:46 poc2 kernel: [ 1940.431801] host7: xid  327:
> >> > f_ctl 880008 seq  2
> >> > 18919840 May 27 10:44:46 poc2 kernel: [ 1940.433931] host7: xid  d68:
> >> > f_ctl 800000 seq  1
> >> > 18919841 May 27 10:44:46 poc2 kernel: [ 1940.433951] host7: xid  d68:
> >> > f_ctl 880008 seq  2
> >> > 18919842 May 27 10:44:46 poc2 kernel: [ 1940.436138] host7: xid  f89:
> >> > f_ctl 800000 seq  1
> >> > 18919843 May 27 10:44:46 poc2 kernel: [ 1940.436174] host7: xid  f89:
> >> > f_ctl 880008 seq  2
> >> > 18919844 May 27 10:44:46 poc2 kernel: [ 1940.436889] host7: xid  74a:
> >> > f_ctl 800000 seq  1
> >> > 18919845 May 27 10:44:46 poc2 kernel: [ 1940.436912] host7: xid  74a:
> >> > f_ctl 880008 seq  2
> >> > 18919846 May 27 10:44:46 poc2 kernel: [ 1940.437281] host7: xid  40c:
> >> > f_ctl 800000 seq  1
> >> > 18919847 May 27 10:44:46 poc2 kernel: [ 1940.437326] host7: xid  40c:
> >> > f_ctl 880008 seq  2
> >> > 18919848 May 27 10:44:46 poc2 kernel: [ 1940.437338] host7: xid  92b:
> >> > f_ctl 800000 seq  1
> >> > 18919849 May 27 10:44:46 poc2 kernel: [ 1940.437356] host7: xid  92b:
> >> > f_ctl 880008 seq  2
> >> > 18919850 May 27 10:44:46 poc2 kernel: [ 1940.439508] host7: xid  dce:
> >> > f_ctl 800000 seq  1
> >> > 18919851 May 27 10:44:46 poc2 kernel: [ 1940.439539] host7: xid  dce:
> >> > f_ctl 880008 seq  2
> >> > 18919852 May 27 10:44:46 poc2 kernel: [ 1940.439551] host7: xid  42f:
> >> > f_ctl 800000 seq  1
> >> > 18919853 May 27 10:44:46 poc2 kernel: [ 1940.439559] host7: xid  42f:
> >> > f_ctl 880008 seq  2
> >> > 18919854 May 27 10:44:46 poc2 kernel: [ 1940.440408] host7: xid  90d:
> >> > f_ctl 800000 seq  1
> >> > 18919855 May 27 10:44:46 poc2 kernel: [ 1940.440425] host7: xid  90d:
> >> > f_ctl 880008 seq  2
> >> > 18919856 May 27 10:44:46 poc2 kernel: [ 1940.440930] host7: xid  d60:
> >> > f_ctl 800000 seq  1
> >> > 18919857 May 27 10:44:46 poc2 kernel: [ 1940.440950] host7: xid  d60:
> >> > f_ctl 880008 seq  2
> >> > 18919858 May 27 10:44:46 poc2 kernel: [ 1940.440971] host7: xid  ac4:
> >> > f_ctl 800000 seq  1
> >> > 18919859 May 27 10:44:46 poc2 kernel: [ 1940.440983] host7: xid  ac4:
> >> > f_ctl 880008 seq  2
> >> > 18919860 May 27 10:44:46 poc2 kernel: [ 1940.441482] host7: xid  fc2:
> >> > f_ctl 800000 seq  1
> >> > 18919861 May 27 10:44:46 poc2 kernel: [ 1940.441504] host7: xid  fc2:
> >> > f_ctl 880008 seq  2
> >> > 18919862 May 27 10:44:46 poc2 kernel: [ 1940.441759] host7: xid  885:
> >> > f_ctl 800000 seq  1
> >> > 18919863 May 27 10:44:46 poc2 kernel: [ 1940.441772] host7: xid  885:
> >> > f_ctl 880008 seq  2
> >> > 18919864 May 27 10:44:46 poc2 kernel: [ 1940.441883] host7: xid  ce1:
> >> > f_ctl 800000 seq  1
> >> > 18919865 May 27 10:44:46 poc2 kernel: [ 1940.441909] host7: xid  ce1:
> >> > f_ctl 880008 seq  2
> >> > 18919866 May 27 10:44:46 poc2 kernel: [ 1940.443033] host7: xid  7a3:
> >> > f_ctl 800000 seq  1
> >> > 18919867 May 27 10:44:46 poc2 kernel: [ 1940.443063] host7: xid  7a3:
> >> > f_ctl 880008 seq  2
> >> > 18919868 May 27 10:44:46 poc2 kernel: [ 1940.445410] host7: xid  7e6:
> >> > f_ctl 800000 seq  1
> >> > 18919869 May 27 10:44:46 poc2 kernel: [ 1940.445424] host7: xid  7e6:
> >> > f_ctl 880008 seq  2
> >> > 18919870 May 27 10:44:46 poc2 kernel: [ 1940.445609] host7: xid  327:
> >> > f_ctl 800000 seq  1
> >> > 18919871 May 27 10:44:46 poc2 kernel: [ 1940.445621] host7: xid  327:
> >> > Exchange timer armed : 0 msecs
> >> > 18919872 May 27 10:44:46 poc2 kernel: [ 1940.445630] host7: xid  327:
> >> > f_ctl 800008 seq  2
> >> > 18919873 May 27 10:44:46 poc2 kernel: [ 1940.445639] ft_queue_data_in:
> >> > Failed to send frame ffff88062cbd3a00, xid <0x327>, remaining 131072,
> >> > lso_max <0x10000>
> >> > 18919874 May 27 10:44:46 poc2 kernel: [ 1940.445645] ft_queue_data_in:
> >> > Failed to send frame ffff88062cbd3a00, xid <0x327>, remaining 65536,
> >> > lso_max <0x10000>
> >> > 18919875 May 27 10:44:46 poc2 kernel: [ 1940.445649] ft_queue_data_in:
> >> > Failed to send frame ffff88062cbd3a00, xid <0x327>, remaining 0,
> >> > lso_max <0x10000>
> >> >
> >> > After stripping out the "host7: xid  xxx: f_ctl 8x0000 seq  x" lines,
> >> > the messages on the target look like:
> >> > 15476415:May 27 10:42:37 poc2 kernel: [ 1811.571399] perf samples too
> >> > long (2670 > 2500), lowering kernel.perf_event_max_sample_rate to
> >> > 50000
> >> > 18732891:May 27 10:44:14 poc2 kernel: [ 1909.035309] host7: xid  c48:
> >> > Exchange timer armed : 0 msecs
> >> > 18732893:May 27 10:44:14 poc2 kernel: [ 1909.035384] host7: xid  c48:
> >> > Exchange timed out
> >> > 18734191:May 27 10:44:14 poc2 kernel: [ 1909.218580] host7: xid  766:
> >> > Exchange timer armed : 0 msecs
> >> > 18734193:May 27 10:44:14 poc2 kernel: [ 1909.218750] host7: xid  766:
> >> > Exchange timed out
> >> > 18734419:May 27 10:44:14 poc2 kernel: [ 1909.254362] host7: xid  cce:
> >> > Exchange timer armed : 0 msecs
> >> > 18734422:May 27 10:44:14 poc2 kernel: [ 1909.254385] host7: xid  cce:
> >> > Exchange timed out
> >> > 18734439:May 27 10:44:14 poc2 kernel: [ 1909.258190] host7: xid  a84:
> >> > Exchange timer armed : 0 msecs
> >> > 18734441:May 27 10:44:14 poc2 kernel: [ 1909.258245] host7: xid  a84:
> >> > Exchange timed out
> >> > 18734444:May 27 10:44:14 poc2 kernel: [ 1909.259212] host7: xid  d00:
> >> > Exchange timer armed : 0 msecs
> >> > 18734446:May 27 10:44:14 poc2 kernel: [ 1909.259222] host7: xid  d00:
> >> > Exchange timed out
> >> > 18734465:May 27 10:44:14 poc2 kernel: [ 1909.261859] host7: xid  287:
> >> > Exchange timer armed : 0 msecs
> >> > 18734467:May 27 10:44:14 poc2 kernel: [ 1909.261910] host7: xid  287:
> >> > Exchange timed out
> >> > 18734632:May 27 10:44:15 poc2 kernel: [ 1909.285827] host7: xid  a64:
> >> > Exchange timer armed : 0 msecs
> >> > 18734634:May 27 10:44:15 poc2 kernel: [ 1909.285892] host7: xid  a64:
> >> > Exchange timed out
> >> > 18734699:May 27 10:44:15 poc2 kernel: [ 1909.301348] host7: xid  be8:
> >> > Exchange timer armed : 0 msecs
> >> > 18734701:May 27 10:44:15 poc2 kernel: [ 1909.301357] host7: xid  be8:
> >> > Exchange timed out
> >> > 18734806:May 27 10:44:15 poc2 kernel: [ 1909.319404] host7: xid  ea9:
> >> > Exchange timer armed : 0 msecs
> >> > 18734808:May 27 10:44:15 poc2 kernel: [ 1909.319446] host7: xid  ea9:
> >> > Exchange timed out
> >> > 18734842:May 27 10:44:15 poc2 kernel: [ 1909.324226] host7: xid  c01:
> >> > Exchange timer armed : 0 msecs
> >> > 18734845:May 27 10:44:15 poc2 kernel: [ 1909.324257] host7: xid  c01:
> >> > Exchange timed out
> >> > 18734944:May 27 10:44:15 poc2 kernel: [ 1909.347075] host7: xid  30c:
> >> > Exchange timer armed : 0 msecs
> >> > 18734946:May 27 10:44:15 poc2 kernel: [ 1909.347099] host7: xid  30c:
> >> > Exchange timed out
> >> > 18735067:May 27 10:44:15 poc2 kernel: [ 1909.360509] host7: xid  c8e:
> >> > Exchange timer armed : 0 msecs
> >> > 18735069:May 27 10:44:15 poc2 kernel: [ 1909.360571] host7: xid  c8e:
> >> > Exchange timed out
> >> > 18735154:May 27 10:44:15 poc2 kernel: [ 1909.380932] host7: xid  9ad:
> >> > Exchange timer armed : 0 msecs
> >> > 18735156:May 27 10:44:15 poc2 kernel: [ 1909.380946] host7: xid  9ad:
> >> > Exchange timed out
> >> > 18735253:May 27 10:44:15 poc2 kernel: [ 1909.395037] host7: xid  ce0:
> >> > Exchange timer armed : 0 msecs
> >> > 18735255:May 27 10:44:15 poc2 kernel: [ 1909.395113] host7: xid  ce0:
> >> > Exchange timed out
> >> > 18735308:May 27 10:44:15 poc2 kernel: [ 1909.405266] host7: xid  746:
> >> > Exchange timer armed : 0 msecs
> >> > 18735312:May 27 10:44:15 poc2 kernel: [ 1909.405332] host7: xid  746:
> >> > Exchange timed out
> >> > 18735317:May 27 10:44:15 poc2 kernel: [ 1909.405832] host7: xid  26c:
> >> > Exchange timer armed : 0 msecs
> >> > 18735319:May 27 10:44:15 poc2 kernel: [ 1909.405852] host7: xid  26c:
> >> > Exchange timed out
> >> > 18735342:May 27 10:44:15 poc2 kernel: [ 1909.412982] host7: xid  f09:
> >> > Exchange timer armed : 0 msecs
> >> > 18735344:May 27 10:44:15 poc2 kernel: [ 1909.413062] host7: xid  f09:
> >> > Exchange timed out
> >> > 18735572:May 27 10:44:15 poc2 kernel: [ 1909.453616] host7: xid  d2e:
> >> > Exchange timer armed : 0 msecs
> >> > 18735574:May 27 10:44:15 poc2 kernel: [ 1909.453631] host7: xid  d2e:
> >> > Exchange timed out
> >> > 18735575:May 27 10:44:15 poc2 kernel: [ 1909.454206] host7: xid  32f:
> >> > Exchange timer armed : 0 msecs
> >> > 18735577:May 27 10:44:15 poc2 kernel: [ 1909.454218] host7: xid  32f:
> >> > Exchange timed out
> >> > 18735810:May 27 10:44:15 poc2 kernel: [ 1909.490061] host7: xid  c68:
> >> > Exchange timer armed : 0 msecs
> >> > 18735812:May 27 10:44:15 poc2 kernel: [ 1909.490126] host7: xid  c68:
> >> > Exchange timed out
> >> > 18735927:May 27 10:44:15 poc2 kernel: [ 1909.515654] host7: xid  267:
> >> > Exchange timer armed : 0 msecs
> >> > 18735929:May 27 10:44:15 poc2 kernel: [ 1909.515667] host7: xid  267:
> >> > Exchange timed out
> >> > 18735940:May 27 10:44:15 poc2 kernel: [ 1909.523278] host7: xid  3cf:
> >> > Exchange timer armed : 0 msecs
> >> > 18735942:May 27 10:44:15 poc2 kernel: [ 1909.523350] host7: xid  3cf:
> >> > Exchange timed out
> >> > 18736041:May 27 10:44:15 poc2 kernel: [ 1909.554439] host7: xid  984:
> >> > Exchange timer armed : 0 msecs
> >> > 18736043:May 27 10:44:15 poc2 kernel: [ 1909.554457] host7: xid  984:
> >> > Exchange timed out
> >> > 18736148:May 27 10:44:15 poc2 kernel: [ 1909.583305] host7: xid  7a6:
> >> > Exchange timer armed : 0 msecs
> >> > 18736150:May 27 10:44:15 poc2 kernel: [ 1909.583332] host7: xid  7a6:
> >> > Exchange timed out
> >> > 18916476:May 27 10:44:45 poc2 kernel: [ 1939.812996] host7: xid  2c7:
> >> > Exchange timer armed : 0 msecs
> >> > 18916478:May 27 10:44:45 poc2 kernel: [ 1939.813075] host7: xid  2c7:
> >> > Exchange timed out
> >> > 18917869:May 27 10:44:45 poc2 kernel: [ 1940.028716] host7: xid  307:
> >> > Exchange timer armed : 0 msecs
> >> > 18917871:May 27 10:44:45 poc2 kernel: [ 1940.028738] host7: xid  307:
> >> > Exchange timed out
> >> > 18917904:May 27 10:44:45 poc2 kernel: [ 1940.036022] host7: xid  44f:
> >> > Exchange timer armed : 0 msecs
> >> > 18917906:May 27 10:44:45 poc2 kernel: [ 1940.036072] host7: xid  44f:
> >> > Exchange timed out
> >> > 18917907:May 27 10:44:45 poc2 kernel: [ 1940.036419] host7: xid  d01:
> >> > Exchange timer armed : 0 msecs
> >> > 18917910:May 27 10:44:45 poc2 kernel: [ 1940.036459] host7: xid  d01:
> >> > Exchange timed out
> >> > 18918008:May 27 10:44:45 poc2 kernel: [ 1940.050544] host7: xid  94b:
> >> > Exchange timer armed : 0 msecs
> >> > 18918010:May 27 10:44:45 poc2 kernel: [ 1940.050553] host7: xid  94b:
> >> > Exchange timed out
> >> > 18918239:May 27 10:44:45 poc2 kernel: [ 1940.100469] host7: xid  743:
> >> > Exchange timer armed : 0 msecs
> >> > 18918241:May 27 10:44:45 poc2 kernel: [ 1940.100485] host7: xid  743:
> >> > Exchange timed out
> >> > 18918310:May 27 10:44:45 poc2 kernel: [ 1940.111206] host7: xid  f82:
> >> > Exchange timer armed : 0 msecs
> >> > 18918312:May 27 10:44:45 poc2 kernel: [ 1940.111357] host7: xid  f82:
> >> > Exchange timed out
> >> > 18918329:May 27 10:44:45 poc2 kernel: [ 1940.117376] host7: xid  e00:
> >> > Exchange timer armed : 0 msecs
> >> > 18918331:May 27 10:44:45 poc2 kernel: [ 1940.117388] host7: xid  e00:
> >> > Exchange timed out
> >> > 18918385:May 27 10:44:45 poc2 kernel: [ 1940.127422] host7: xid  d08:
> >> > Exchange timer armed : 0 msecs
> >> > 18918388:May 27 10:44:45 poc2 kernel: [ 1940.127470] host7: xid  d08:
> >> > Exchange timed out
> >> > 18918487:May 27 10:44:45 poc2 kernel: [ 1940.142935] host7: xid  8c5:
> >> > Exchange timer armed : 0 msecs
> >> > 18918489:May 27 10:44:45 poc2 kernel: [ 1940.142950] host7: xid  8c5:
> >> > Exchange timed out
> >> > 18918588:May 27 10:44:45 poc2 kernel: [ 1940.158196] host7: xid  92d:
> >> > Exchange timer armed : 0 msecs
> >> > 18918590:May 27 10:44:45 poc2 kernel: [ 1940.158254] host7: xid  92d:
> >> > Exchange timed out
> >> > 18918597:May 27 10:44:45 poc2 kernel: [ 1940.159082] host7: xid  a04:
> >> > Exchange timer armed : 0 msecs
> >> > 18918599:May 27 10:44:45 poc2 kernel: [ 1940.159154] host7: xid  a04:
> >> > Exchange timed out
> >> > 18918866:May 27 10:44:45 poc2 kernel: [ 1940.209430] host7: xid  d28:
> >> > Exchange timer armed : 0 msecs
> >> > 18918868:May 27 10:44:45 poc2 kernel: [ 1940.209484] host7: xid  d28:
> >> > Exchange timed out
> >> > 18918909:May 27 10:44:45 poc2 kernel: [ 1940.215142] host7: xid  8ed:
> >> > Exchange timer armed : 0 msecs
> >> > 18918911:May 27 10:44:45 poc2 kernel: [ 1940.215158] host7: xid  8ed:
> >> > Exchange timed out
> >> > 18918918:May 27 10:44:45 poc2 kernel: [ 1940.217253] host7: xid  3ef:
> >> > Exchange timer armed : 0 msecs
> >> > 18918920:May 27 10:44:45 poc2 kernel: [ 1940.217270] host7: xid  3ef:
> >> > Exchange timed out
> >> > 18919025:May 27 10:44:45 poc2 kernel: [ 1940.241859] host7: xid  fa9:
> >> > Exchange timer armed : 0 msecs
> >> > 18919027:May 27 10:44:45 poc2 kernel: [ 1940.241904] host7: xid  fa9:
> >> > Exchange timed out
> >> > 18919151:May 27 10:44:45 poc2 kernel: [ 1940.267514] host7: xid  38c:
> >> > Exchange timer armed : 0 msecs
> >> > 18919154:May 27 10:44:45 poc2 kernel: [ 1940.267580] host7: xid  38c:
> >> > Exchange timed out
> >> > 18919163:May 27 10:44:45 poc2 kernel: [ 1940.268681] host7: xid  d48:
> >> > Exchange timer armed : 0 msecs
> >> > 18919165:May 27 10:44:45 poc2 kernel: [ 1940.268707] host7: xid  d48:
> >> > Exchange timed out
> >> > 18919212:May 27 10:44:45 poc2 kernel: [ 1940.278834] host7: xid  aa4:
> >> > Exchange timer armed : 0 msecs
> >> > 18919214:May 27 10:44:45 poc2 kernel: [ 1940.278881] host7: xid  aa4:
> >> > Exchange timed out
> >> > 18919243:May 27 10:44:45 poc2 kernel: [ 1940.285614] host7: xid  f22:
> >> > Exchange timer armed : 0 msecs
> >> > 18919245:May 27 10:44:45 poc2 kernel: [ 1940.285629] host7: xid  f22:
> >> > Exchange timed out
> >> > 18919270:May 27 10:44:45 poc2 kernel: [ 1940.293297] host7: xid  da0:
> >> > Exchange timer armed : 0 msecs
> >> > 18919272:May 27 10:44:45 poc2 kernel: [ 1940.293346] host7: xid  da0:
> >> > Exchange timed out
> >> > 18919321:May 27 10:44:45 poc2 kernel: [ 1940.301098] host7: xid  3cc:
> >> > Exchange timer armed : 0 msecs
> >> > 18919325:May 27 10:44:45 poc2 kernel: [ 1940.301172] host7: xid  3cc:
> >> > Exchange timed out
> >> > 18919472:May 27 10:44:46 poc2 kernel: [ 1940.333266] host7: xid  e0e:
> >> > Exchange timer armed : 0 msecs
> >> > 18919474:May 27 10:44:46 poc2 kernel: [ 1940.333299] host7: xid  e0e:
> >> > Exchange timed out
> >> > 18919641:May 27 10:44:46 poc2 kernel: [ 1940.368427] host7: xid  2e7:
> >> > Exchange timer armed : 0 msecs
> >> > 18919643:May 27 10:44:46 poc2 kernel: [ 1940.368451] host7: xid  2e7:
> >> > Exchange timed out
> >> > 18919694:May 27 10:44:46 poc2 kernel: [ 1940.386699] host7: xid  826:
> >> > Exchange timer armed : 0 msecs
> >> > 18919696:May 27 10:44:46 poc2 kernel: [ 1940.386744] host7: xid  826:
> >> > Exchange timed out
> >> > 18919785:May 27 10:44:46 poc2 kernel: [ 1940.409090] host7: xid  40f:
> >> > Exchange timer armed : 0 msecs
> >> > 18919787:May 27 10:44:46 poc2 kernel: [ 1940.409125] host7: xid  40f:
> >> > Exchange timed out
> >> > 18919871:May 27 10:44:46 poc2 kernel: [ 1940.445621] host7: xid  327:
> >> > Exchange timer armed : 0 msecs
> >> > 18919873:May 27 10:44:46 poc2 kernel: [ 1940.445639] ft_queue_data_in:
> >> > Failed to send frame ffff88062cbd3a00, xid <0x327>, remaining 131072,
> >> > lso_max <0x10000>
> >> > 18919874:May 27 10:44:46 poc2 kernel: [ 1940.445645] ft_queue_data_in:
> >> > Failed to send frame ffff88062cbd3a00, xid <0x327>, remaining 65536,
> >> > lso_max <0x10000>
> >> > 18919875:May 27 10:44:46 poc2 kernel: [ 1940.445649] ft_queue_data_in:
> >> > Failed to send frame ffff88062cbd3a00, xid <0x327>, remaining 0,
> >> > lso_max <0x10000>
> >> > 18919877:May 27 10:44:46 poc2 kernel: [ 1940.445671] host7: xid  327:
> >> > Exchange timed out
> >> > 18919878:May 27 10:44:46 poc2 kernel: [ 1940.445683] host7: xid  ac4:
> >> > Exchange timer armed : 2000 msecs
> >> > 18930794:May 27 10:44:48 poc2 kernel: [ 1942.452127] host7: xid  ac4:
> >> > Exchange timed out
> >> > 18930795:May 27 10:44:48 poc2 kernel: [ 1942.452136] host7: xid  ac4:
> >> > f_ctl  90000 seq  1
> >> > 18930796:May 27 10:44:48 poc2 kernel: [ 1942.452139] host7: xid  ac4:
> >> > Exchange timer armed : 20000 msecs
> >> >
> >> > Thanks
> >> >
> >> >
> >> > On Fri, May 23, 2014 at 9:23 AM, Vasu Dev <vasu.dev@xxxxxxxxxxxxxxx> wrote:
> >> >> On Wed, 2014-05-21 at 15:23 -0700, Jun Wu wrote:
> >> >>> > 18630751 May 21 11:01:53 poc2 kernel: [ 1528.191740] host7: xid
> >> >>> 6e4:
> >> >>> > Exchange timer armed : 0 msecs
> >> >>> > 18630752 May 21 11:01:53 poc2 kernel: [ 1528.191747] host7: xid
> >> >>> 6e4:
> >> >>> > f_ctl 800000 seq  1
> >> >>> > 18630753 May 21 11:01:53 poc2 kernel: [ 1528.191756] host7: xid
> >> >>> 6e4:
> >> >>> > f_ctl 800000 seq  2
> >> >>> > 18630754 May 21 11:01:53 poc2 kernel: [ 1528.191763] host7: xid
> >> >>> 6e4:
> >> >>> > Exchange timed out
> >> >>> > 18630755 May 21 11:01:53 poc2 kernel: [ 1528.191777]
> >> >>> ft_queue_data_in:
> >> >>> > Failed to send frame ffff8805bf245e00, xid <0x6e4>, remaining
> >> >>> 458752,
> >> >>> > lso_max <0x10000>
> >> >>
> >> >> Exchange 0x6e4 is aborted and then target still sending frame, while
> >> >> later should not occur but first setting up abort with 0 msec timeout
> >> >> doesn not look correct either and it is different that 8000 ms on
> >> >> initiator side. Are you using same switch between initiator and target
> >> >> with DCB enabled ?
> >> >>
> >> >> Reducing retries could narrow down to early aborts is the cause here,
> >> >> can you try with REC disabled on initiator side for that using this
> >> >> change ?
> >> >>
> >> >> diff --git a/drivers/scsi/libfc/fc_fcp.c b/drivers/scsi/libfc/fc_fcp.c
> >> >> index 1d7e76e..43fe793 100644
> >> >> --- a/drivers/scsi/libfc/fc_fcp.c
> >> >> +++ b/drivers/scsi/libfc/fc_fcp.c
> >> >> @@ -1160,6 +1160,7 @@ static int fc_fcp_cmd_send(struct fc_lport *lport,
> >> >> struct fc_fcp_pkt *fsp,
> >> >>         fc_fcp_pkt_hold(fsp);   /* hold for fc_fcp_pkt_destroy */
> >> >>
> >> >>         setup_timer(&fsp->timer, fc_fcp_timeout, (unsigned long)fsp);
> >> >> +       rpriv->flags &= ~FC_RP_FLAGS_REC_SUPPORTED;
> >> >>         if (rpriv->flags & FC_RP_FLAGS_REC_SUPPORTED)
> >> >>                 fc_fcp_timer_set(fsp, get_fsp_rec_tov(fsp));
> >> >>
> >> >>
> >
> >


--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux SCSI]     [Kernel Newbies]     [Linux SCSI Target Infrastructure]     [Share Photos]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Device Mapper]

  Powered by Linux