Re: [Open-FCoE] System crashes with increased drive count

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



We do not have DCB PFC so the traffic is managed by PAUSE frames by default.
The kernel has been recompiled to include your change. I ran the same
test after the kernel change, i.e. running 10 fio sessions from
initiator to 10 drives on target. The target machine hung after a few
minutes.

On the initiator side the messages look like:
 4453 May 27 10:44:14 poc1 kernel: [ 1726.288288] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
 4454 May 27 10:44:14 poc1 kernel: [ 1726.292240] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
 4455 May 27 10:44:14 poc1 kernel: [ 1726.295246] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
 4456 May 27 10:44:14 poc1 kernel: [ 1726.296245] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
 4457 May 27 10:44:14 poc1 kernel: [ 1726.298249] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
 4458 May 27 10:44:14 poc1 kernel: [ 1726.300288] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
 4459 May 27 10:44:14 poc1 kernel: [ 1726.303268] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
 4460 May 27 10:44:14 poc1 kernel: [ 1726.303273] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
 4461 May 27 10:44:14 poc1 kernel: [ 1726.304297] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
 4462 May 27 10:44:14 poc1 kernel: [ 1726.306257] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
 4463 May 27 10:44:14 poc1 kernel: [ 1726.307293] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
 4464 May 27 10:44:14 poc1 kernel: [ 1726.307298] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
 4465 May 27 10:44:14 poc1 kernel: [ 1726.307301] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
 4466 May 27 10:44:14 poc1 kernel: [ 1726.310317] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
 4467 May 27 10:44:14 poc1 kernel: [ 1726.313257] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
 4468 May 27 10:44:14 poc1 kernel: [ 1726.315297] host7: fcp: 00061e:
xid 01cd-0a84: DDP I/O in fc_fcp_recv_data set ERROR
 4469 May 27 10:44:14 poc1 kernel: [ 1726.315302] host7: xid  1cd:
f_ctl  90000 seq  1
 4470 May 27 10:44:14 poc1 kernel: [ 1726.315666] host7: xid  1cd: BLS
rctl 85 - BLS reject received
 4471 May 27 10:44:14 poc1 kernel: [ 1726.316295] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
 4472 May 27 10:44:14 poc1 kernel: [ 1726.317260] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
 4473 May 27 10:44:14 poc1 kernel: [ 1726.317266] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
 4474 May 27 10:44:14 poc1 kernel: [ 1726.317417] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
 4475 May 27 10:44:14 poc1 kernel: [ 1726.320303] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
 4476 May 27 10:44:14 poc1 kernel: [ 1726.321301] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
 4477 May 27 10:44:14 poc1 kernel: [ 1726.323264] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
 4478 May 27 10:44:14 poc1 kernel: [ 1726.323299] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
 4479 May 27 10:44:14 poc1 kernel: [ 1726.324307] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
 4480 May 27 10:44:14 poc1 kernel: [ 1726.327271] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
 4481 May 27 10:44:14 poc1 kernel: [ 1726.327301] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
 4482 May 27 10:44:14 poc1 kernel: [ 1726.329274] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
 4483 May 27 10:44:14 poc1 kernel: [ 1726.330306] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
 4484 May 27 10:44:14 poc1 kernel: [ 1726.331273] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
 4485 May 27 10:44:14 poc1 kernel: [ 1726.333270] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
 4486 May 27 10:44:14 poc1 kernel: [ 1726.333313] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
 4487 May 27 10:44:14 poc1 kernel: [ 1726.338276] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
 4488 May 27 10:44:14 poc1 kernel: [ 1726.338316] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
 4489 May 27 10:44:14 poc1 kernel: [ 1726.340320] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
 4490 May 27 10:44:14 poc1 kernel: [ 1726.341313] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
 4491 May 27 10:44:14 poc1 kernel: [ 1726.342286] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
 4492 May 27 10:44:14 poc1 kernel: [ 1726.343312] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
 4493 May 27 10:44:14 poc1 kernel: [ 1726.346324] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
 4494 May 27 10:44:14 poc1 kernel: [ 1726.347283] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
 4495 May 27 10:44:14 poc1 kernel: [ 1726.350323] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
 4496 May 27 10:44:14 poc1 kernel: [ 1726.350326] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
 4497 May 27 10:44:14 poc1 kernel: [ 1726.352288] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
 4498 May 27 10:44:14 poc1 kernel: [ 1726.352289] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
 4499 May 27 10:44:14 poc1 kernel: [ 1726.352942] host7: fcp: 00061e:
xid 01ee-0c48: DDP I/O in fc_fcp_recv_data set ERROR
 4500 May 27 10:44:14 poc1 kernel: [ 1726.352946] host7: xid  1ee:
f_ctl  90000 seq  1
 4501 May 27 10:44:14 poc1 kernel: [ 1726.353288] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
 4502 May 27 10:44:14 poc1 kernel: [ 1726.353320] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)

I see much less Exchange timer messages than last time. Here are all
the instances of them:

 6521 May 27 10:44:46 poc1 kernel: [ 1757.762301] host7: fcp: 00061e:
xid 0181-090d: DDP I/O in fc_fcp_recv_data set ERROR
 6522 May 27 10:44:46 poc1 kernel: [ 1757.762306] host7: xid  181:
f_ctl  90000 seq  1
 6523 May 27 10:44:46 poc1 kernel: [ 1757.763257] host7: fcp: 00061e:
xid 0005-0327: DDP I/O in fc_fcp_recv_data set ERROR
 6524 May 27 10:44:46 poc1 kernel: [ 1757.763261] host7: xid    5:
f_ctl  90000 seq  1
 6525 May 27 10:44:46 poc1 kernel: [ 1757.763347] host7: fcp: 00061e:
xid 0060-0dce: DDP I/O in fc_fcp_recv_data set ERROR
 6526 May 27 10:44:46 poc1 kernel: [ 1757.763351] host7: xid   60:
f_ctl  90000 seq  1
 6527 May 27 10:44:46 poc1 kernel: [ 1757.763617] host7: xid  12c:
exch: BLS rctl 84 - BLS accept
 6528 May 27 10:44:46 poc1 kernel: [ 1757.763627] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_CMD_ABORTED
 6529 May 27 10:44:46 poc1 kernel: [ 1757.763632] host7: xid  12c:
Exchange timer armed : 10000 msecs
 6530 May 27 10:44:46 poc1 kernel: [ 1757.763813] host7: fcp: 00061e:
xid 00cc-090d: DDP I/O in fc_fcp_recv_data set ERROR
 6531 May 27 10:44:46 poc1 kernel: [ 1757.763817] host7: xid   cc:
f_ctl  90000 seq  1
 6532 May 27 10:44:46 poc1 kernel: [ 1757.763983] host7: xid   60: BLS
rctl 85 - BLS reject received
 6533 May 27 10:44:46 poc1 kernel: [ 1757.765447] host7: xid  14d: BLS
rctl 85 - BLS reject received
 6534 May 27 10:44:46 poc1 kernel: [ 1757.766258] host7: xid  18b: BLS
rctl 85 - BLS reject received
 6535 May 27 10:44:46 poc1 kernel: [ 1757.767116] host7: xid   ed: BLS
rctl 85 - BLS reject received
 6536 May 27 10:44:46 poc1 kernel: [ 1757.767144] host7: xid   21: BLS
rctl 85 - BLS reject received
 6537 May 27 10:44:46 poc1 kernel: [ 1757.768161] host7: xid  122: BLS
rctl 85 - BLS reject received
 6538 May 27 10:44:46 poc1 kernel: [ 1757.768184] host7: xid   45: BLS
rctl 85 - BLS reject received
 6539 May 27 10:44:46 poc1 kernel: [ 1757.768437] host7: xid  16f: BLS
rctl 85 - BLS reject received
 6540 May 27 10:44:46 poc1 kernel: [ 1757.768880] host7: xid  14c: BLS
rctl 85 - BLS reject received
 6541 May 27 10:44:46 poc1 kernel: [ 1757.772813] host7: xid   8a: BLS
rctl 85 - BLS reject received
 6542 May 27 10:44:46 poc1 kernel: [ 1757.778209] host7: xid  181: BLS
rctl 85 - BLS reject received
 6543 May 27 10:44:46 poc1 kernel: [ 1757.778226] host7: xid  16b: BLS
rctl 85 - BLS reject received
 6544 May 27 10:44:46 poc1 kernel: [ 1757.778905] host7: xid   cc: BLS
rctl 85 - BLS reject received
 6545 May 27 10:44:56 poc1 kernel: [ 1767.793465] host7: xid  12c:
Exchange timed out
 6546 May 27 10:44:56 poc1 kernel: [ 1767.793478] host7: xid  e65:
Exchange timer armed : 2000 msecs
 6547 May 27 10:44:56 poc1 kernel: [ 1767.793861] host7: xid  e65:
Exchange timer canceled
 6548 May 27 10:44:56 poc1 kernel: [ 1767.793866] host7: xid  12c:
LS_RJT for RRQ
 6549 May 27 10:45:16 poc1 kernel: [ 1788.033856] host7: scsi:
Resetting rport (00061e)
 6550 May 27 10:45:16 poc1 kernel: [ 1788.034071] host7: scsi: lun
reset to lun 0 completed

and

 7468 May 27 10:45:17 poc1 kernel: [ 1789.288870] host7: fcp: 00061e:
xid 06ae-0985: fsp=ffff880c0e176700:lso:blen=1000 lso_max=0x10000
t_blen=1000
 7469 May 27 10:45:17 poc1 kernel: [ 1789.288874] host7: xid  6ae:
f_ctl  90000 seq  1
 7470 May 27 10:45:17 poc1 kernel: [ 1789.291411] host7: fcp: 00061e:
xid 068e-07a3: fsp=ffff880c0e177400:lso:blen=1000 lso_max=0x10000
t_blen=1000
 7471 May 27 10:45:17 poc1 kernel: [ 1789.291415] host7: xid  68e:
f_ctl  90000 seq  1
 7472 May 27 10:45:17 poc1 kernel: [ 1789.291425] host7: fcp: 00061e:
xid 0c2e-0b04: fsp=ffff880c1cc4c800:lso:blen=1000 lso_max=0x10000
t_blen=1000
 7473 May 27 10:45:17 poc1 kernel: [ 1789.291428] host7: xid  c2e:
f_ctl  90000 seq  1
 7474 May 27 10:45:17 poc1 kernel: [ 1789.296282] host7: xid  1c4: BLS
rctl 85 - BLS reject received
 7475 May 27 10:45:17 poc1 kernel: [ 1789.296284] host7: xid   8c: BLS
rctl 85 - BLS reject received
 7476 May 27 10:45:17 poc1 kernel: [ 1789.297554] host7: xid  6ae:
exch: BLS rctl 84 - BLS accept
 7477 May 27 10:45:17 poc1 kernel: [ 1789.297559] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_CMD_ABORTED
 7478 May 27 10:45:17 poc1 kernel: [ 1789.297564] host7: xid  6ae:
Exchange timer armed : 10000 msecs
 7479 May 27 10:45:17 poc1 kernel: [ 1789.298873] host7: xid   ef: BLS
rctl 85 - BLS reject received
 7480 May 27 10:45:17 poc1 kernel: [ 1789.298887] host7: fcp: 00061e:
xid 01cf-0e8e: DDP I/O in fc_fcp_recv_data set ERROR
 7481 May 27 10:45:17 poc1 kernel: [ 1789.298891] host7: xid  1cf:
f_ctl  90000 seq  1
 7482 May 27 10:45:17 poc1 kernel: [ 1789.299318] host7: xid  1cf: BLS
rctl 85 - BLS reject received
 7483 May 27 10:45:17 poc1 kernel: [ 1789.299390] host7: fcp: 00061e:
xid 00c4-048c: DDP I/O in fc_fcp_recv_data set ERROR
 7484 May 27 10:45:17 poc1 kernel: [ 1789.299394] host7: xid   c4:
f_ctl  90000 seq  1
 7485 May 27 10:45:17 poc1 kernel: [ 1789.299716] host7: xid   c4: BLS
rctl 85 - BLS reject received
 7486 May 27 10:45:17 poc1 kernel: [ 1789.299733] host7: xid   e6: BLS
rctl 85 - BLS reject received
 7487 May 27 10:45:17 poc1 kernel: [ 1789.299762] host7: xid   87: BLS
rctl 85 - BLS reject received
 7488 May 27 10:45:17 poc1 kernel: [ 1789.299767] host7: xid  107: BLS
rctl 85 - BLS reject received
 7489 May 27 10:45:17 poc1 kernel: [ 1789.303432] host7: fcp: 00061e:
xid 0042-050f: DDP I/O in fc_fcp_recv_data set ERROR
 7490 May 27 10:45:17 poc1 kernel: [ 1789.303437] host7: xid   42:
f_ctl  90000 seq  1
 7491 May 27 10:45:17 poc1 kernel: [ 1789.303813] host7: xid   42: BLS
rctl 85 - BLS reject received
 7492 May 27 10:45:17 poc1 kernel: [ 1789.304596] host7: xid    7: BLS
rctl 85 - BLS reject received
 7493 May 27 10:45:17 poc1 kernel: [ 1789.304607] host7: xid  1a9:
exch: BLS rctl 84 - BLS accept
 7494 May 27 10:45:17 poc1 kernel: [ 1789.304615] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_CMD_ABORTED
 7495 May 27 10:45:17 poc1 kernel: [ 1789.306357] host7: xid  10d: BLS
rctl 85 - BLS reject received
 7496 May 27 10:45:17 poc1 kernel: [ 1789.306797] host7: xid  18e: BLS
rctl 85 - BLS reject received
 7497 May 27 10:45:17 poc1 kernel: [ 1789.307196] host7: fcp: 00061e:
xid 01e4-04a7: DDP I/O in fc_fcp_recv_data set ERROR
 7498 May 27 10:45:17 poc1 kernel: [ 1789.307201] host7: xid  1e4:
f_ctl  90000 seq  1
 7499 May 27 10:45:17 poc1 kernel: [ 1789.307628] host7: xid  1e4: BLS
rctl 85 - BLS reject received
 7500 May 27 10:45:17 poc1 kernel: [ 1789.311225] host7: xid   8d: BLS
rctl 85 - BLS reject received
 7501 May 27 10:45:17 poc1 kernel: [ 1789.316454] host7: fcp: 00061e:
xid 0c64-0de8: fsp=ffff88061c238c00:lso:blen=1000 lso_max=0x10000
t_blen=1000
 7502 May 27 10:45:17 poc1 kernel: [ 1789.316458] host7: xid  c64:
f_ctl  90000 seq  1
 7503 May 27 10:45:27 poc1 kernel: [ 1799.306713] host7: xid  6ae:
Exchange timed out
 7504 May 27 10:45:27 poc1 kernel: [ 1799.306725] host7: xid  e65:
Exchange timer armed : 2000 msecs
 7505 May 27 10:45:29 poc1 kernel: [ 1801.312254] host7: xid  e65:
Exchange timed out
 7506 May 27 10:45:29 poc1 kernel: [ 1801.312261] host7: xid  e65:
f_ctl  90000 seq  1
 7507 May 27 10:45:29 poc1 kernel: [ 1801.312264] host7: xid  e65:
Exchange timer armed : 20000 msecs
 7508 May 27 10:45:31 poc1 kernel: [ 1803.197758] host7: rport 00061e:
Remove port
 7509 May 27 10:45:31 poc1 kernel: [ 1803.197762] host7: rport 00061e:
Port sending LOGO from Ready state

and finally

 7724 May 27 10:45:31 poc1 kernel: [ 1803.199215] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
 7725 May 27 10:45:31 poc1 kernel: [ 1803.199219] host7: xid  c44:
f_ctl  90000 seq  1
 7726 May 27 10:45:31 poc1 kernel: [ 1803.199223] host7: rport 00061e:
Received a LOGO response closed
 7727 May 27 10:45:31 poc1 kernel: [ 1803.199226] host7: xid  e65:
Exchange timer canceled
 7728 May 27 10:45:31 poc1 kernel: [ 1803.199264] host7: rport 00061e:
work delete
 7729 May 27 10:46:31 poc1 kernel: [ 1863.357830]  rport-7:0-0:
blocked FC remote port time out: removing target and saving binding


The messages on the target side are similar as before.
This is when Exhange timer message first appears:
18732865 May 27 10:44:14 poc2 kernel: [ 1909.030408] host7: xid  c48:
f_ctl 800000 seq  1
18732866 May 27 10:44:14 poc2 kernel: [ 1909.030423] host7: xid  c48:
f_ctl 880008 seq  2
18732867 May 27 10:44:14 poc2 kernel: [ 1909.031509] host7: xid  766:
f_ctl 800000 seq  1
18732868 May 27 10:44:14 poc2 kernel: [ 1909.031528] host7: xid  766:
f_ctl 880008 seq  2
18732869 May 27 10:44:14 poc2 kernel: [ 1909.031787] host7: xid  cce:
f_ctl 800000 seq  1
18732870 May 27 10:44:14 poc2 kernel: [ 1909.031799] host7: xid  cce:
f_ctl 880008 seq  2
18732871 May 27 10:44:14 poc2 kernel: [ 1909.031944] host7: xid  2c7:
f_ctl 800000 seq  1
18732872 May 27 10:44:14 poc2 kernel: [ 1909.031971] host7: xid  2c7:
f_ctl 880008 seq  2
18732873 May 27 10:44:14 poc2 kernel: [ 1909.032145] host7: xid  f09:
f_ctl 800000 seq  1
18732874 May 27 10:44:14 poc2 kernel: [ 1909.032166] host7: xid  f09:
f_ctl 880008 seq  2
18732875 May 27 10:44:14 poc2 kernel: [ 1909.032374] host7: xid  ce0:
f_ctl 800000 seq  1
18732876 May 27 10:44:14 poc2 kernel: [ 1909.032383] host7: xid  ce0:
f_ctl 880008 seq  2
18732877 May 27 10:44:14 poc2 kernel: [ 1909.032729] host7: xid  8eb:
f_ctl 800000 seq  1
18732878 May 27 10:44:14 poc2 kernel: [ 1909.032736] host7: xid  32f:
f_ctl 800000 seq  1
18732879 May 27 10:44:14 poc2 kernel: [ 1909.032740] host7: xid  8eb:
f_ctl 880008 seq  2
18732880 May 27 10:44:14 poc2 kernel: [ 1909.032759] host7: xid  32f:
f_ctl 880008 seq  2
18732881 May 27 10:44:14 poc2 kernel: [ 1909.033034] host7: xid  98d:
f_ctl 800000 seq  1
18732882 May 27 10:44:14 poc2 kernel: [ 1909.033051] host7: xid  98d:
f_ctl 880008 seq  2
18732883 May 27 10:44:14 poc2 kernel: [ 1909.033262] host7: xid  c01:
f_ctl 800000 seq  1
18732884 May 27 10:44:14 poc2 kernel: [ 1909.033290] host7: xid  c01:
f_ctl 880008 seq  2
18732885 May 27 10:44:14 poc2 kernel: [ 1909.033373] host7: xid  6ea:
f_ctl 800000 seq  1
18732886 May 27 10:44:14 poc2 kernel: [ 1909.033387] host7: xid  6ea:
f_ctl 880008 seq  2
18732887 May 27 10:44:14 poc2 kernel: [ 1909.033734] host7: xid  766:
f_ctl 800000 seq  1
18732888 May 27 10:44:14 poc2 kernel: [ 1909.033747] host7: xid  766:
f_ctl 880008 seq  2
18732889 May 27 10:44:14 poc2 kernel: [ 1909.034621] host7: xid  30c:
f_ctl 800000 seq  1
18732890 May 27 10:44:14 poc2 kernel: [ 1909.034648] host7: xid  30c:
f_ctl 880008 seq  2
18732891 May 27 10:44:14 poc2 kernel: [ 1909.035309] host7: xid  c48:
Exchange timer armed : 0 msecs
18732892 May 27 10:44:14 poc2 kernel: [ 1909.035315] host7: xid  c48:
f_ctl 800000 seq  1
18732893 May 27 10:44:14 poc2 kernel: [ 1909.035384] host7: xid  c48:
Exchange timed out

This is when ft_queue_data_in failure first appears:
18919838 May 27 10:44:46 poc2 kernel: [ 1940.431788] host7: xid  327:
f_ctl 800000 seq  1
18919839 May 27 10:44:46 poc2 kernel: [ 1940.431801] host7: xid  327:
f_ctl 880008 seq  2
18919840 May 27 10:44:46 poc2 kernel: [ 1940.433931] host7: xid  d68:
f_ctl 800000 seq  1
18919841 May 27 10:44:46 poc2 kernel: [ 1940.433951] host7: xid  d68:
f_ctl 880008 seq  2
18919842 May 27 10:44:46 poc2 kernel: [ 1940.436138] host7: xid  f89:
f_ctl 800000 seq  1
18919843 May 27 10:44:46 poc2 kernel: [ 1940.436174] host7: xid  f89:
f_ctl 880008 seq  2
18919844 May 27 10:44:46 poc2 kernel: [ 1940.436889] host7: xid  74a:
f_ctl 800000 seq  1
18919845 May 27 10:44:46 poc2 kernel: [ 1940.436912] host7: xid  74a:
f_ctl 880008 seq  2
18919846 May 27 10:44:46 poc2 kernel: [ 1940.437281] host7: xid  40c:
f_ctl 800000 seq  1
18919847 May 27 10:44:46 poc2 kernel: [ 1940.437326] host7: xid  40c:
f_ctl 880008 seq  2
18919848 May 27 10:44:46 poc2 kernel: [ 1940.437338] host7: xid  92b:
f_ctl 800000 seq  1
18919849 May 27 10:44:46 poc2 kernel: [ 1940.437356] host7: xid  92b:
f_ctl 880008 seq  2
18919850 May 27 10:44:46 poc2 kernel: [ 1940.439508] host7: xid  dce:
f_ctl 800000 seq  1
18919851 May 27 10:44:46 poc2 kernel: [ 1940.439539] host7: xid  dce:
f_ctl 880008 seq  2
18919852 May 27 10:44:46 poc2 kernel: [ 1940.439551] host7: xid  42f:
f_ctl 800000 seq  1
18919853 May 27 10:44:46 poc2 kernel: [ 1940.439559] host7: xid  42f:
f_ctl 880008 seq  2
18919854 May 27 10:44:46 poc2 kernel: [ 1940.440408] host7: xid  90d:
f_ctl 800000 seq  1
18919855 May 27 10:44:46 poc2 kernel: [ 1940.440425] host7: xid  90d:
f_ctl 880008 seq  2
18919856 May 27 10:44:46 poc2 kernel: [ 1940.440930] host7: xid  d60:
f_ctl 800000 seq  1
18919857 May 27 10:44:46 poc2 kernel: [ 1940.440950] host7: xid  d60:
f_ctl 880008 seq  2
18919858 May 27 10:44:46 poc2 kernel: [ 1940.440971] host7: xid  ac4:
f_ctl 800000 seq  1
18919859 May 27 10:44:46 poc2 kernel: [ 1940.440983] host7: xid  ac4:
f_ctl 880008 seq  2
18919860 May 27 10:44:46 poc2 kernel: [ 1940.441482] host7: xid  fc2:
f_ctl 800000 seq  1
18919861 May 27 10:44:46 poc2 kernel: [ 1940.441504] host7: xid  fc2:
f_ctl 880008 seq  2
18919862 May 27 10:44:46 poc2 kernel: [ 1940.441759] host7: xid  885:
f_ctl 800000 seq  1
18919863 May 27 10:44:46 poc2 kernel: [ 1940.441772] host7: xid  885:
f_ctl 880008 seq  2
18919864 May 27 10:44:46 poc2 kernel: [ 1940.441883] host7: xid  ce1:
f_ctl 800000 seq  1
18919865 May 27 10:44:46 poc2 kernel: [ 1940.441909] host7: xid  ce1:
f_ctl 880008 seq  2
18919866 May 27 10:44:46 poc2 kernel: [ 1940.443033] host7: xid  7a3:
f_ctl 800000 seq  1
18919867 May 27 10:44:46 poc2 kernel: [ 1940.443063] host7: xid  7a3:
f_ctl 880008 seq  2
18919868 May 27 10:44:46 poc2 kernel: [ 1940.445410] host7: xid  7e6:
f_ctl 800000 seq  1
18919869 May 27 10:44:46 poc2 kernel: [ 1940.445424] host7: xid  7e6:
f_ctl 880008 seq  2
18919870 May 27 10:44:46 poc2 kernel: [ 1940.445609] host7: xid  327:
f_ctl 800000 seq  1
18919871 May 27 10:44:46 poc2 kernel: [ 1940.445621] host7: xid  327:
Exchange timer armed : 0 msecs
18919872 May 27 10:44:46 poc2 kernel: [ 1940.445630] host7: xid  327:
f_ctl 800008 seq  2
18919873 May 27 10:44:46 poc2 kernel: [ 1940.445639] ft_queue_data_in:
Failed to send frame ffff88062cbd3a00, xid <0x327>, remaining 131072,
lso_max <0x10000>
18919874 May 27 10:44:46 poc2 kernel: [ 1940.445645] ft_queue_data_in:
Failed to send frame ffff88062cbd3a00, xid <0x327>, remaining 65536,
lso_max <0x10000>
18919875 May 27 10:44:46 poc2 kernel: [ 1940.445649] ft_queue_data_in:
Failed to send frame ffff88062cbd3a00, xid <0x327>, remaining 0,
lso_max <0x10000>

After stripping out the "host7: xid  xxx: f_ctl 8x0000 seq  x" lines,
the messages on the target look like:
15476415:May 27 10:42:37 poc2 kernel: [ 1811.571399] perf samples too
long (2670 > 2500), lowering kernel.perf_event_max_sample_rate to
50000
18732891:May 27 10:44:14 poc2 kernel: [ 1909.035309] host7: xid  c48:
Exchange timer armed : 0 msecs
18732893:May 27 10:44:14 poc2 kernel: [ 1909.035384] host7: xid  c48:
Exchange timed out
18734191:May 27 10:44:14 poc2 kernel: [ 1909.218580] host7: xid  766:
Exchange timer armed : 0 msecs
18734193:May 27 10:44:14 poc2 kernel: [ 1909.218750] host7: xid  766:
Exchange timed out
18734419:May 27 10:44:14 poc2 kernel: [ 1909.254362] host7: xid  cce:
Exchange timer armed : 0 msecs
18734422:May 27 10:44:14 poc2 kernel: [ 1909.254385] host7: xid  cce:
Exchange timed out
18734439:May 27 10:44:14 poc2 kernel: [ 1909.258190] host7: xid  a84:
Exchange timer armed : 0 msecs
18734441:May 27 10:44:14 poc2 kernel: [ 1909.258245] host7: xid  a84:
Exchange timed out
18734444:May 27 10:44:14 poc2 kernel: [ 1909.259212] host7: xid  d00:
Exchange timer armed : 0 msecs
18734446:May 27 10:44:14 poc2 kernel: [ 1909.259222] host7: xid  d00:
Exchange timed out
18734465:May 27 10:44:14 poc2 kernel: [ 1909.261859] host7: xid  287:
Exchange timer armed : 0 msecs
18734467:May 27 10:44:14 poc2 kernel: [ 1909.261910] host7: xid  287:
Exchange timed out
18734632:May 27 10:44:15 poc2 kernel: [ 1909.285827] host7: xid  a64:
Exchange timer armed : 0 msecs
18734634:May 27 10:44:15 poc2 kernel: [ 1909.285892] host7: xid  a64:
Exchange timed out
18734699:May 27 10:44:15 poc2 kernel: [ 1909.301348] host7: xid  be8:
Exchange timer armed : 0 msecs
18734701:May 27 10:44:15 poc2 kernel: [ 1909.301357] host7: xid  be8:
Exchange timed out
18734806:May 27 10:44:15 poc2 kernel: [ 1909.319404] host7: xid  ea9:
Exchange timer armed : 0 msecs
18734808:May 27 10:44:15 poc2 kernel: [ 1909.319446] host7: xid  ea9:
Exchange timed out
18734842:May 27 10:44:15 poc2 kernel: [ 1909.324226] host7: xid  c01:
Exchange timer armed : 0 msecs
18734845:May 27 10:44:15 poc2 kernel: [ 1909.324257] host7: xid  c01:
Exchange timed out
18734944:May 27 10:44:15 poc2 kernel: [ 1909.347075] host7: xid  30c:
Exchange timer armed : 0 msecs
18734946:May 27 10:44:15 poc2 kernel: [ 1909.347099] host7: xid  30c:
Exchange timed out
18735067:May 27 10:44:15 poc2 kernel: [ 1909.360509] host7: xid  c8e:
Exchange timer armed : 0 msecs
18735069:May 27 10:44:15 poc2 kernel: [ 1909.360571] host7: xid  c8e:
Exchange timed out
18735154:May 27 10:44:15 poc2 kernel: [ 1909.380932] host7: xid  9ad:
Exchange timer armed : 0 msecs
18735156:May 27 10:44:15 poc2 kernel: [ 1909.380946] host7: xid  9ad:
Exchange timed out
18735253:May 27 10:44:15 poc2 kernel: [ 1909.395037] host7: xid  ce0:
Exchange timer armed : 0 msecs
18735255:May 27 10:44:15 poc2 kernel: [ 1909.395113] host7: xid  ce0:
Exchange timed out
18735308:May 27 10:44:15 poc2 kernel: [ 1909.405266] host7: xid  746:
Exchange timer armed : 0 msecs
18735312:May 27 10:44:15 poc2 kernel: [ 1909.405332] host7: xid  746:
Exchange timed out
18735317:May 27 10:44:15 poc2 kernel: [ 1909.405832] host7: xid  26c:
Exchange timer armed : 0 msecs
18735319:May 27 10:44:15 poc2 kernel: [ 1909.405852] host7: xid  26c:
Exchange timed out
18735342:May 27 10:44:15 poc2 kernel: [ 1909.412982] host7: xid  f09:
Exchange timer armed : 0 msecs
18735344:May 27 10:44:15 poc2 kernel: [ 1909.413062] host7: xid  f09:
Exchange timed out
18735572:May 27 10:44:15 poc2 kernel: [ 1909.453616] host7: xid  d2e:
Exchange timer armed : 0 msecs
18735574:May 27 10:44:15 poc2 kernel: [ 1909.453631] host7: xid  d2e:
Exchange timed out
18735575:May 27 10:44:15 poc2 kernel: [ 1909.454206] host7: xid  32f:
Exchange timer armed : 0 msecs
18735577:May 27 10:44:15 poc2 kernel: [ 1909.454218] host7: xid  32f:
Exchange timed out
18735810:May 27 10:44:15 poc2 kernel: [ 1909.490061] host7: xid  c68:
Exchange timer armed : 0 msecs
18735812:May 27 10:44:15 poc2 kernel: [ 1909.490126] host7: xid  c68:
Exchange timed out
18735927:May 27 10:44:15 poc2 kernel: [ 1909.515654] host7: xid  267:
Exchange timer armed : 0 msecs
18735929:May 27 10:44:15 poc2 kernel: [ 1909.515667] host7: xid  267:
Exchange timed out
18735940:May 27 10:44:15 poc2 kernel: [ 1909.523278] host7: xid  3cf:
Exchange timer armed : 0 msecs
18735942:May 27 10:44:15 poc2 kernel: [ 1909.523350] host7: xid  3cf:
Exchange timed out
18736041:May 27 10:44:15 poc2 kernel: [ 1909.554439] host7: xid  984:
Exchange timer armed : 0 msecs
18736043:May 27 10:44:15 poc2 kernel: [ 1909.554457] host7: xid  984:
Exchange timed out
18736148:May 27 10:44:15 poc2 kernel: [ 1909.583305] host7: xid  7a6:
Exchange timer armed : 0 msecs
18736150:May 27 10:44:15 poc2 kernel: [ 1909.583332] host7: xid  7a6:
Exchange timed out
18916476:May 27 10:44:45 poc2 kernel: [ 1939.812996] host7: xid  2c7:
Exchange timer armed : 0 msecs
18916478:May 27 10:44:45 poc2 kernel: [ 1939.813075] host7: xid  2c7:
Exchange timed out
18917869:May 27 10:44:45 poc2 kernel: [ 1940.028716] host7: xid  307:
Exchange timer armed : 0 msecs
18917871:May 27 10:44:45 poc2 kernel: [ 1940.028738] host7: xid  307:
Exchange timed out
18917904:May 27 10:44:45 poc2 kernel: [ 1940.036022] host7: xid  44f:
Exchange timer armed : 0 msecs
18917906:May 27 10:44:45 poc2 kernel: [ 1940.036072] host7: xid  44f:
Exchange timed out
18917907:May 27 10:44:45 poc2 kernel: [ 1940.036419] host7: xid  d01:
Exchange timer armed : 0 msecs
18917910:May 27 10:44:45 poc2 kernel: [ 1940.036459] host7: xid  d01:
Exchange timed out
18918008:May 27 10:44:45 poc2 kernel: [ 1940.050544] host7: xid  94b:
Exchange timer armed : 0 msecs
18918010:May 27 10:44:45 poc2 kernel: [ 1940.050553] host7: xid  94b:
Exchange timed out
18918239:May 27 10:44:45 poc2 kernel: [ 1940.100469] host7: xid  743:
Exchange timer armed : 0 msecs
18918241:May 27 10:44:45 poc2 kernel: [ 1940.100485] host7: xid  743:
Exchange timed out
18918310:May 27 10:44:45 poc2 kernel: [ 1940.111206] host7: xid  f82:
Exchange timer armed : 0 msecs
18918312:May 27 10:44:45 poc2 kernel: [ 1940.111357] host7: xid  f82:
Exchange timed out
18918329:May 27 10:44:45 poc2 kernel: [ 1940.117376] host7: xid  e00:
Exchange timer armed : 0 msecs
18918331:May 27 10:44:45 poc2 kernel: [ 1940.117388] host7: xid  e00:
Exchange timed out
18918385:May 27 10:44:45 poc2 kernel: [ 1940.127422] host7: xid  d08:
Exchange timer armed : 0 msecs
18918388:May 27 10:44:45 poc2 kernel: [ 1940.127470] host7: xid  d08:
Exchange timed out
18918487:May 27 10:44:45 poc2 kernel: [ 1940.142935] host7: xid  8c5:
Exchange timer armed : 0 msecs
18918489:May 27 10:44:45 poc2 kernel: [ 1940.142950] host7: xid  8c5:
Exchange timed out
18918588:May 27 10:44:45 poc2 kernel: [ 1940.158196] host7: xid  92d:
Exchange timer armed : 0 msecs
18918590:May 27 10:44:45 poc2 kernel: [ 1940.158254] host7: xid  92d:
Exchange timed out
18918597:May 27 10:44:45 poc2 kernel: [ 1940.159082] host7: xid  a04:
Exchange timer armed : 0 msecs
18918599:May 27 10:44:45 poc2 kernel: [ 1940.159154] host7: xid  a04:
Exchange timed out
18918866:May 27 10:44:45 poc2 kernel: [ 1940.209430] host7: xid  d28:
Exchange timer armed : 0 msecs
18918868:May 27 10:44:45 poc2 kernel: [ 1940.209484] host7: xid  d28:
Exchange timed out
18918909:May 27 10:44:45 poc2 kernel: [ 1940.215142] host7: xid  8ed:
Exchange timer armed : 0 msecs
18918911:May 27 10:44:45 poc2 kernel: [ 1940.215158] host7: xid  8ed:
Exchange timed out
18918918:May 27 10:44:45 poc2 kernel: [ 1940.217253] host7: xid  3ef:
Exchange timer armed : 0 msecs
18918920:May 27 10:44:45 poc2 kernel: [ 1940.217270] host7: xid  3ef:
Exchange timed out
18919025:May 27 10:44:45 poc2 kernel: [ 1940.241859] host7: xid  fa9:
Exchange timer armed : 0 msecs
18919027:May 27 10:44:45 poc2 kernel: [ 1940.241904] host7: xid  fa9:
Exchange timed out
18919151:May 27 10:44:45 poc2 kernel: [ 1940.267514] host7: xid  38c:
Exchange timer armed : 0 msecs
18919154:May 27 10:44:45 poc2 kernel: [ 1940.267580] host7: xid  38c:
Exchange timed out
18919163:May 27 10:44:45 poc2 kernel: [ 1940.268681] host7: xid  d48:
Exchange timer armed : 0 msecs
18919165:May 27 10:44:45 poc2 kernel: [ 1940.268707] host7: xid  d48:
Exchange timed out
18919212:May 27 10:44:45 poc2 kernel: [ 1940.278834] host7: xid  aa4:
Exchange timer armed : 0 msecs
18919214:May 27 10:44:45 poc2 kernel: [ 1940.278881] host7: xid  aa4:
Exchange timed out
18919243:May 27 10:44:45 poc2 kernel: [ 1940.285614] host7: xid  f22:
Exchange timer armed : 0 msecs
18919245:May 27 10:44:45 poc2 kernel: [ 1940.285629] host7: xid  f22:
Exchange timed out
18919270:May 27 10:44:45 poc2 kernel: [ 1940.293297] host7: xid  da0:
Exchange timer armed : 0 msecs
18919272:May 27 10:44:45 poc2 kernel: [ 1940.293346] host7: xid  da0:
Exchange timed out
18919321:May 27 10:44:45 poc2 kernel: [ 1940.301098] host7: xid  3cc:
Exchange timer armed : 0 msecs
18919325:May 27 10:44:45 poc2 kernel: [ 1940.301172] host7: xid  3cc:
Exchange timed out
18919472:May 27 10:44:46 poc2 kernel: [ 1940.333266] host7: xid  e0e:
Exchange timer armed : 0 msecs
18919474:May 27 10:44:46 poc2 kernel: [ 1940.333299] host7: xid  e0e:
Exchange timed out
18919641:May 27 10:44:46 poc2 kernel: [ 1940.368427] host7: xid  2e7:
Exchange timer armed : 0 msecs
18919643:May 27 10:44:46 poc2 kernel: [ 1940.368451] host7: xid  2e7:
Exchange timed out
18919694:May 27 10:44:46 poc2 kernel: [ 1940.386699] host7: xid  826:
Exchange timer armed : 0 msecs
18919696:May 27 10:44:46 poc2 kernel: [ 1940.386744] host7: xid  826:
Exchange timed out
18919785:May 27 10:44:46 poc2 kernel: [ 1940.409090] host7: xid  40f:
Exchange timer armed : 0 msecs
18919787:May 27 10:44:46 poc2 kernel: [ 1940.409125] host7: xid  40f:
Exchange timed out
18919871:May 27 10:44:46 poc2 kernel: [ 1940.445621] host7: xid  327:
Exchange timer armed : 0 msecs
18919873:May 27 10:44:46 poc2 kernel: [ 1940.445639] ft_queue_data_in:
Failed to send frame ffff88062cbd3a00, xid <0x327>, remaining 131072,
lso_max <0x10000>
18919874:May 27 10:44:46 poc2 kernel: [ 1940.445645] ft_queue_data_in:
Failed to send frame ffff88062cbd3a00, xid <0x327>, remaining 65536,
lso_max <0x10000>
18919875:May 27 10:44:46 poc2 kernel: [ 1940.445649] ft_queue_data_in:
Failed to send frame ffff88062cbd3a00, xid <0x327>, remaining 0,
lso_max <0x10000>
18919877:May 27 10:44:46 poc2 kernel: [ 1940.445671] host7: xid  327:
Exchange timed out
18919878:May 27 10:44:46 poc2 kernel: [ 1940.445683] host7: xid  ac4:
Exchange timer armed : 2000 msecs
18930794:May 27 10:44:48 poc2 kernel: [ 1942.452127] host7: xid  ac4:
Exchange timed out
18930795:May 27 10:44:48 poc2 kernel: [ 1942.452136] host7: xid  ac4:
f_ctl  90000 seq  1
18930796:May 27 10:44:48 poc2 kernel: [ 1942.452139] host7: xid  ac4:
Exchange timer armed : 20000 msecs

Thanks


On Fri, May 23, 2014 at 9:23 AM, Vasu Dev <vasu.dev@xxxxxxxxxxxxxxx> wrote:
> On Wed, 2014-05-21 at 15:23 -0700, Jun Wu wrote:
>> > 18630751 May 21 11:01:53 poc2 kernel: [ 1528.191740] host7: xid
>> 6e4:
>> > Exchange timer armed : 0 msecs
>> > 18630752 May 21 11:01:53 poc2 kernel: [ 1528.191747] host7: xid
>> 6e4:
>> > f_ctl 800000 seq  1
>> > 18630753 May 21 11:01:53 poc2 kernel: [ 1528.191756] host7: xid
>> 6e4:
>> > f_ctl 800000 seq  2
>> > 18630754 May 21 11:01:53 poc2 kernel: [ 1528.191763] host7: xid
>> 6e4:
>> > Exchange timed out
>> > 18630755 May 21 11:01:53 poc2 kernel: [ 1528.191777]
>> ft_queue_data_in:
>> > Failed to send frame ffff8805bf245e00, xid <0x6e4>, remaining
>> 458752,
>> > lso_max <0x10000>
>
> Exchange 0x6e4 is aborted and then target still sending frame, while
> later should not occur but first setting up abort with 0 msec timeout
> doesn not look correct either and it is different that 8000 ms on
> initiator side. Are you using same switch between initiator and target
> with DCB enabled ?
>
> Reducing retries could narrow down to early aborts is the cause here,
> can you try with REC disabled on initiator side for that using this
> change ?
>
> diff --git a/drivers/scsi/libfc/fc_fcp.c b/drivers/scsi/libfc/fc_fcp.c
> index 1d7e76e..43fe793 100644
> --- a/drivers/scsi/libfc/fc_fcp.c
> +++ b/drivers/scsi/libfc/fc_fcp.c
> @@ -1160,6 +1160,7 @@ static int fc_fcp_cmd_send(struct fc_lport *lport,
> struct fc_fcp_pkt *fsp,
>         fc_fcp_pkt_hold(fsp);   /* hold for fc_fcp_pkt_destroy */
>
>         setup_timer(&fsp->timer, fc_fcp_timeout, (unsigned long)fsp);
> +       rpriv->flags &= ~FC_RP_FLAGS_REC_SUPPORTED;
>         if (rpriv->flags & FC_RP_FLAGS_REC_SUPPORTED)
>                 fc_fcp_timer_set(fsp, get_fsp_rec_tov(fsp));
>
>
--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux SCSI]     [Kernel Newbies]     [Linux SCSI Target Infrastructure]     [Share Photos]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Device Mapper]

  Powered by Linux