MTU was 1500 on both the initiator and the target. I used
"ethtool -K p4p1 tso off" to turn off TCP segmentation offload on all
machines. The offload settings after the command are shown below.

[root@poc3 jkong]# ethtool -k p4p1
Features for p4p1:
rx-checksumming: on
tx-checksumming: on
    tx-checksum-ipv4: on
    tx-checksum-ip-generic: off [fixed]
    tx-checksum-ipv6: on
    tx-checksum-fcoe-crc: on [fixed]
    tx-checksum-sctp: on
scatter-gather: on
    tx-scatter-gather: on
    tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: off
    tx-tcp-segmentation: off
    tx-tcp-ecn-segmentation: off [fixed]
    tx-tcp6-segmentation: off
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: on
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: on [fixed]
tx-gre-segmentation: off [fixed]
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-mpls-segmentation: off [fixed]
fcoe-mtu: on [fixed]
tx-nocache-copy: on
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off

NIC driver info:

[root@poc3 jkong]# ethtool -i p4p1
driver: ixgbe
version: 3.15.1-k
firmware-version: 0x80000208
bus-info: 0000:08:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no

After the change, I repeated the same test and got a similar failure on
the target side:

[12253.032595] ft_queue_data_in: Failed to send frame ffff88062a638600, xid <0xa0c>, remaining 458752, lso_max <0x10000>
[12253.032605] ft_queue_data_in: Failed to send frame ffff88062a638600, xid <0xa0c>, remaining 393216, lso_max <0x10000>
[12253.032609] ft_queue_data_in: Failed to send frame ffff88062a638600, xid <0xa0c>, remaining 327680, lso_max <0x10000>
[12253.032613] ft_queue_data_in: Failed to send frame ffff88062a638600, xid <0xa0c>, remaining 262144, lso_max <0x10000>
[12284.299877] ft_queue_data_in: Failed to send frame ffff8803202ec600, xid <0x3a2>, remaining 196608, lso_max <0x10000>
[12284.299885] ft_queue_data_in: Failed to send frame ffff8803202ec600, xid <0x3a2>, remaining 131072, lso_max <0x10000>
[12284.299889] ft_queue_data_in: Failed to send frame ffff8803202ec600, xid <0x3a2>, remaining 65536, lso_max <0x10000>
[12284.299892] ft_queue_data_in: Failed to send frame ffff8803202ec600, xid <0x3a2>, remaining 0, lso_max <0x10000>
[12284.451810] ft_queue_data_in: Failed to send frame ffff88061deb1400, xid <0xecf>, remaining 458752, lso_max <0x10000>
[12284.451818] ft_queue_data_in: Failed to send frame ffff88061deb1400, xid <0xecf>, remaining 393216, lso_max <0x10000>
[12284.451824] ft_queue_data_in: Failed to send frame ffff88061deb1400, xid <0xecf>, remaining 327680, lso_max <0x10000>
[12284.451827] ft_queue_data_in: Failed to send frame ffff88061deb1400, xid <0xecf>, remaining 262144, lso_max <0x10000>
[12284.451831] ft_queue_data_in: Failed to send frame ffff88061deb1400, xid <0xecf>, remaining 196608, lso_max <0x10000>
[12284.451834] ft_queue_data_in: Failed to send frame ffff88061deb1400, xid <0xecf>, remaining 131072, lso_max <0x10000>
[12347.503478] ft_queue_data_in: 2 callbacks suppressed
[12347.503486] ft_queue_data_in: Failed to send frame ffff8806142bc800, xid <0xb4f>, remaining 458752, lso_max <0x10000>
[12347.503492] ft_queue_data_in: Failed to send frame ffff8806142bc800, xid <0xb4f>, remaining 393216, lso_max <0x10000>
[12347.503496] ft_queue_data_in: Failed to send frame ffff8806142bc800, xid <0xb4f>, remaining 327680, lso_max <0x10000>
[12347.503517] ft_queue_data_in: Failed to send frame ffff8806142bc800, xid <0xb4f>, remaining 262144, lso_max <0x10000>
[12378.402412] ft_queue_data_in: Failed to send frame ffff88062ddeac00, xid <0x6a5>, remaining 458752, lso_max <0x10000>
[12378.402420] ft_queue_data_in: Failed to send frame ffff88062ddeac00, xid <0x6a5>, remaining 393216, lso_max <0x10000>
[12378.402425] ft_queue_data_in: Failed to send frame ffff88062ddeac00, xid <0x6a5>, remaining 327680, lso_max <0x10000>
[12378.402428] ft_queue_data_in: Failed to send frame ffff88062ddeac00, xid <0x6a5>, remaining 262144, lso_max <0x10000>
[12378.402432] ft_queue_data_in: Failed to send frame ffff88062ddeac00, xid <0x6a5>, remaining 196608, lso_max <0x10000>
[12378.402436] ft_queue_data_in: Failed to send frame ffff88062ddeac00, xid <0x6a5>, remaining 131072, lso_max <0x10000>
[12378.402440] ft_queue_data_in: Failed to send frame ffff88062ddeac00, xid <0x6a5>, remaining 65536, lso_max <0x10000>
[12378.402444] ft_queue_data_in: Failed to send frame ffff88062ddeac00, xid <0x6a5>, remaining 0, lso_max <0x10000>
[13049.224513] ft_queue_data_in: Failed to send frame ffff880614588c00, xid <0xd2f>, remaining 196608, lso_max <0x10000>
[13049.224524] ft_queue_data_in: Failed to send frame ffff880614588c00, xid <0xd2f>, remaining 131072, lso_max <0x10000>
[13049.224528] ft_queue_data_in: Failed to send frame ffff880614588c00, xid <0xd2f>, remaining 65536, lso_max <0x10000>
[13049.224532] ft_queue_data_in: Failed to send frame ffff880614588c00, xid <0xd2f>, remaining 0, lso_max <0x10000>
[13052.511306] ft_queue_data_in: Failed to send frame ffff88062d49f000, xid <0x8ae>, remaining 196608, lso_max <0x10000>
[13052.511313] ft_queue_data_in: Failed to send frame ffff88062d49f000, xid <0x8ae>, remaining 131072, lso_max <0x10000>
[13052.511317] ft_queue_data_in: Failed to send frame ffff88062d49f000, xid <0x8ae>, remaining 65536, lso_max <0x10000>
[13052.511321] ft_queue_data_in: Failed to send frame ffff88062d49f000, xid <0x8ae>, remaining 0, lso_max <0x10000>
[13087.976748] ft_queue_data_in: Failed to send frame ffff88031afc9c00, xid <0x96b>, remaining 458752, lso_max <0x10000>
[13087.998453] ft_queue_data_in: Failed to send frame ffff88032c881200, xid <0xb23>, remaining 458752, lso_max <0x10000>
[13087.998459] ft_queue_data_in: Failed to send frame ffff88032c881200, xid <0xb23>, remaining 393216, lso_max <0x10000>
[13087.998463] ft_queue_data_in: Failed to send frame ffff88032c881200, xid <0xb23>, remaining 327680, lso_max <0x10000>
[13087.998467] ft_queue_data_in: Failed to send frame ffff88032c881200, xid <0xb23>, remaining 262144, lso_max <0x10000>
[13087.998470] ft_queue_data_in: Failed to send frame ffff88032c881200, xid <0xb23>, remaining 196608, lso_max <0x10000>
[13087.998474] ft_queue_data_in: Failed to send frame ffff88032c881200, xid <0xb23>, remaining 131072, lso_max <0x10000>
[13087.998478] ft_queue_data_in: Failed to send frame ffff88032c881200, xid <0xb23>, remaining 65536, lso_max <0x10000>
[13087.998482] ft_queue_data_in: Failed to send frame ffff88032c881200, xid <0xb23>, remaining 0, lso_max <0x10000>
[13119.177286] ft_queue_data_in: Failed to send frame ffff88062dff7400, xid <0xfcf>, remaining 458752, lso_max <0x10000>
[13119.177297] ft_queue_data_in: Failed to send frame ffff88062dff7400, xid <0xfcf>, remaining 393216, lso_max <0x10000>
[13119.177302] ft_queue_data_in: Failed to send frame ffff88062dff7400, xid <0xfcf>, remaining 327680, lso_max <0x10000>
[13119.177307] ft_queue_data_in: Failed to send frame ffff88062dff7400, xid <0xfcf>, remaining 262144, lso_max <0x10000>
[13119.177311] ft_queue_data_in: Failed to send frame ffff88062dff7400, xid <0xfcf>, remaining 196608, lso_max <0x10000>
[13119.177316] ft_queue_data_in: Failed to send frame ffff88062dff7400, xid <0xfcf>, remaining 131072, lso_max <0x10000>
[13119.177321] ft_queue_data_in: Failed to send frame ffff88062dff7400, xid <0xfcf>, remaining 65536, lso_max <0x10000>
[13119.177325] ft_queue_data_in: Failed to send frame ffff88062dff7400, xid <0xfcf>, remaining 0, lso_max <0x10000>
[13122.335322] ------------[ cut here ]------------
[13122.335336] WARNING: CPU: 6 PID: 2165 at include/scsi/fc_frame.h:173 fcoe_percpu_receive_thread+0x507/0x53c [fcoe]()
[13122.335338] Modules linked in: async_memcpy async_xor xor async_tx fcoe libfcoe tcm_fc libfc scsi_transport_fc scsi_tgt target_core_pscsi target_core_file target_core_iblock iscsi_target_mod target_core_mod 8021q garp mrp bridge stp llc iTCO_wdt gpio_ich iTCO_vendor_support coretemp kvm_intel kvm crc32c_intel microcode serio_raw i2c_i801 lpc_ich mfd_core ses enclosure i7core_edac ioatdma edac_core shpchp acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd sunrpc radeon drm_kms_helper ttm drm ixgbe igb ata_generic mdio pata_acpi ptp pata_jmicron pps_core i2c_algo_bit aacraid dca i2c_core [last unloaded: vd]
[13122.335390] CPU: 6 PID: 2165 Comm: fcoethread/6 Tainted: GF O 3.13.10-200.zbfcoepatch.fc20.x86_64 #1
[13122.335392] Hardware name: Supermicro X8DTN/X8DTN, BIOS 2.1c 10/28/2011
[13122.335394] 0000000000000009 ffff88062b04bdd0 ffffffff81687eac 0000000000000000
[13122.335400] ffff88062b04be08 ffffffff8106d4dd ffffe8ffffc41748 ffff88062a444700
[13122.335404] ffff8800b7e926e8 0000000000000002 ffff88062b04be88 ffff88062b04be18
[13122.335408] Call Trace:
[13122.335419] [<ffffffff81687eac>] dump_stack+0x45/0x56
[13122.335426] [<ffffffff8106d4dd>] warn_slowpath_common+0x7d/0xa0
[13122.335430] [<ffffffff8106d5ba>] warn_slowpath_null+0x1a/0x20
[13122.335435] [<ffffffffa0651517>] fcoe_percpu_receive_thread+0x507/0x53c [fcoe]
[13122.335440] [<ffffffffa0651010>] ? fcoe_set_port_id+0x50/0x50 [fcoe]
[13122.335446] [<ffffffff8108f2f2>] kthread+0xd2/0xf0
[13122.335450] [<ffffffff8108f220>] ? insert_kthread_work+0x40/0x40
[13122.335458] [<ffffffff81696dbc>] ret_from_fork+0x7c/0xb0
[13122.335461] [<ffffffff8108f220>] ? insert_kthread_work+0x40/0x40
[13122.335464] ---[ end trace e4509e1053f499ac ]---

Thanks,
Jun

On Tue, May 20, 2014 at 11:03 AM, Nicholas A.
Bellinger <nab@xxxxxxxxxxxxxxx> wrote:
> On Mon, 2014-05-19 at 17:29 -0700, Jun Wu wrote:
>> Hi Nicholas,
>>
>> We downloaded the source of our running kernel (3.13.10-200) and
>> applied your percpu-ida pre-allocation regression fix, then compiled
>> and installed the kernel. I repeated the same test three times,
>> running 10 fio sessions to 10 drives on the target through fcoe vn2vn.
>> In the first two tests, the target machine hung with the following
>> messages:
>>
>> 15231 May 19 11:49:27 poc1 kernel: [ 1073.783229] ft_queue_data_in:
>> Failed to send frame ffff880c0b188200, xid <0x2a5>, remaining 196608,
>> lso_max <0x10000>
>> 15232 May 19 11:49:27 poc1 kernel: [ 1073.783238] ft_queue_data_in:
>> Failed to send frame ffff880c0b188200, xid <0x2a5>, remaining 131072,
>> lso_max <0x10000>
>> 15233 May 19 11:49:27 poc1 kernel: [ 1073.783242] ft_queue_data_in:
>> Failed to send frame ffff880c0b188200, xid <0x2a5>, remaining 65536,
>> lso_max <0x10000>
>> 15234 May 19 11:49:27 poc1 kernel: [ 1073.783245] ft_queue_data_in:
>> Failed to send frame ffff880c0b188200, xid <0x2a5>, remaining 0,
>> lso_max <0x10000>
>> 15235 May 19 11:49:30 poc1 kernel: [ 1076.907061] ft_queue_data_in:
>> Failed to send frame ffff880c1d1df000, xid <0x305>, remaining 196608,
>> lso_max <0x10000>
>> 15236 May 19 11:49:30 poc1 kernel: [ 1076.907068] ft_queue_data_in:
>> Failed to send frame ffff880c1d1df000, xid <0x305>, remaining 131072,
>> lso_max <0x10000>
>> 15237 May 19 11:49:30 poc1 kernel: [ 1076.907073] ft_queue_data_in:
>> Failed to send frame ffff880c1d1df000, xid <0x305>, remaining 65536,
>> lso_max <0x10000>
>> 15238 May 19 11:49:30 poc1 kernel: [ 1076.907077] ft_queue_data_in:
>> Failed to send frame ffff880c1d1df000, xid <0x305>, remaining 0,
>> lso_max <0x10000>
>> 15239 May 19 11:50:01 poc1 kernel: [ 1107.918910] ft_queue_data_in:
>> Failed to send frame ffff88060cd40800, xid <0x3cb>, remaining 458752,
>> lso_max <0x10000>
>> 15240 May 19 11:50:01 poc1 kernel: [ 1107.918918] ft_queue_data_in:
>> Failed to send frame ffff88060cd40800, xid <0x3cb>, remaining 393216,
>> lso_max <0x10000>
>> 15241 May 19 11:50:01 poc1 kernel: [ 1107.918922] ft_queue_data_in:
>> Failed to send frame ffff88060cd40800, xid <0x3cb>, remaining 327680,
>> lso_max <0x10000>
>> 15242 May 19 11:50:01 poc1 kernel: [ 1107.918925] ft_queue_data_in:
>> Failed to send frame ffff88060cd40800, xid <0x3cb>, remaining 262144,
>> lso_max <0x10000>
>> 15243 May 19 11:50:01 poc1 kernel: [ 1107.918929] ft_queue_data_in:
>> Failed to send frame ffff88060cd40800, xid <0x3cb>, remaining 196608,
>> lso_max <0x10000>
>> 15244 May 19 11:50:01 poc1 kernel: [ 1107.918932] ft_queue_data_in:
>> Failed to send frame ffff88060cd40800, xid <0x3cb>, remaining 131072,
>> lso_max <0x10000>
>> 15245 May 19 11:50:01 poc1 kernel: [ 1107.918936] ft_queue_data_in:
>> Failed to send frame ffff88060cd40800, xid <0x3cb>, remaining 65536,
>> lso_max <0x10000>
>> 15246 May 19 11:50:01 poc1 kernel: [ 1107.918939] ft_queue_data_in:
>> Failed to send frame ffff88060cd40800, xid <0x3cb>, remaining 0,
>> lso_max <0x10000>
>> 15247 May 19 11:50:05 poc1 kernel: [ 1111.450900] ft_queue_data_in:
>> Failed to send frame ffff880c0b24ca00, xid <0xea6>, remaining 196608,
>> lso_max <0x10000>
>> 15248 May 19 11:50:05 poc1 kernel: [ 1111.450908] ft_queue_data_in:
>> Failed to send frame ffff880c0b24ca00, xid <0xea6>, remaining 131072,
>> lso_max <0x10000>
>> 15249 May 19 11:51:12 poc1 kernel: [ 1178.698434] ft_queue_data_in: 6
>> callbacks suppressed
>> 15250 May 19 11:51:12 poc1 kernel: [ 1178.698440] ft_queue_data_in:
>> Failed to send frame ffff88060ba97400, xid <0xb8a>, remaining 458752,
>> lso_max <0x10000>
>> 15251 May 19 11:51:12 poc1 kernel: [ 1178.698446] ft_queue_data_in:
>> Failed to send frame ffff88060ba97400, xid <0xb8a>, remaining 393216,
>> lso_max <0x10000>
>> 15252 May 19 11:51:12 poc1 kernel: [ 1178.698449] ft_queue_data_in:
>> Failed to send frame ffff88060ba97400, xid <0xb8a>, remaining 327680,
>> lso_max <0x10000>
>> 15253 May 19 11:51:12 poc1 kernel: [ 1178.698453] ft_queue_data_in:
>> Failed to send frame ffff88060ba97400, xid <0xb8a>, remaining 262144,
>> lso_max <0x10000>
>> 15254 May 19 11:51:12 poc1 kernel: [ 1178.698456] ft_queue_data_in:
>> Failed to send frame ffff88060ba97400, xid <0xb8a>, remaining 196608,
>> lso_max <0x10000>
>> 15255 May 19 11:51:12 poc1 kernel: [ 1178.698460] ft_queue_data_in:
>> Failed to send frame ffff88060ba97400, xid <0xb8a>, remaining 131072,
>> lso_max <0x10000>
>> 15256 May 19 11:51:12 poc1 kernel: [ 1178.698463] ft_queue_data_in:
>> Failed to send frame ffff88060ba97400, xid <0xb8a>, remaining 65536,
>> lso_max <0x10000>
>> 15257 May 19 11:51:12 poc1 kernel: [ 1178.698467] ft_queue_data_in:
>> Failed to send frame ffff88060ba97400, xid <0xb8a>, remaining 0,
>> lso_max <0x10000>
>>
>
> The call into lport->tt.seq_send() libfc code is failing to send
> outgoing solicited data-in. From the output, note the LSO (large
> segment offload aka TCP segment offload) feature has been enabled by the
> underlying NIC hardware.
>
> So in order to isolate possible issues, I'd recommend:
>
> - Disabling hardware offloads on both initiator and target sides (LRO +
>   LSO) using ethtool -K
> - Disabling any jumbo frames settings on either side
>
> Is there any other non standard network and/or switch settings that are
> in place..? Also, please confirm what your NIC + switch setup looks
> like.
>
> Rob & Open-FCoE folks, is there anything else to take into consideration
> here..?
>
>>
>> I didn't see the previous message "unable to handle kernel NULL
>> pointer dereference at 0000000000000048". So it must have been fixed
>> by your change.
>>
>
> Thanks for confirming that bit.
>
> --nab
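
For reference, the isolation steps suggested in the quoted message (turn off hardware offloads with ethtool -K, no jumbo frames) could be scripted roughly as follows. This is only a sketch under stated assumptions: the interface name p4p1 and the MTU of 1500 come from this thread, while the function names and the DRY_RUN switch are hypothetical helpers made up for illustration. Turning off GSO and GRO as well goes beyond the LRO + LSO suggestion and is an extra assumption, not something the thread asked for.

```shell
#!/bin/sh
# Sketch only: build (and optionally run) the commands that turn off
# segmentation/receive offloads and pin the MTU on one interface.
# offload_cmd, mtu_cmd, apply_settings and DRY_RUN are hypothetical
# helpers for this example; they are not part of ethtool or iproute2.

offload_cmd() {
    # $1 = interface name, e.g. p4p1
    # gso/gro are included as an assumption; the thread only names LRO + LSO.
    echo "ethtool -K $1 tso off gso off gro off lro off"
}

mtu_cmd() {
    # $1 = interface name, $2 = MTU in bytes (1500 means no jumbo frames)
    echo "ip link set dev $1 mtu $2"
}

apply_settings() {
    # With DRY_RUN=1 (the default) the commands are only printed.
    # Set DRY_RUN=0 and run as root to actually apply them.
    for cmd in "$(offload_cmd "$1")" "$(mtu_cmd "$1" "$2")"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "$cmd"
        else
            $cmd
        fi
    done
}

apply_settings p4p1 1500
```

The same script would need to run on both the initiator and the target, and "ethtool -k p4p1" can then be used to check which features actually changed, since entries marked [fixed] cannot be toggled.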