On Tue, 2014-05-20 at 22:29 -0700, Jun Wu wrote:
> The MTU was 1500 on both initiator and target.
> I used "ethtool -K p4p1 tso off" to turn off TCP segmentation offload
> on all machines. The feature settings after the command are shown below.
>
> [root@poc3 jkong]# ethtool -k p4p1
> Features for p4p1:
> rx-checksumming: on
> tx-checksumming: on
> tx-checksum-ipv4: on
> tx-checksum-ip-generic: off [fixed]
> tx-checksum-ipv6: on
> tx-checksum-fcoe-crc: on [fixed]
> tx-checksum-sctp: on
> scatter-gather: on
> tx-scatter-gather: on
> tx-scatter-gather-fraglist: off [fixed]
> tcp-segmentation-offload: off
> tx-tcp-segmentation: off
> tx-tcp-ecn-segmentation: off [fixed]
> tx-tcp6-segmentation: off
> udp-fragmentation-offload: off [fixed]
> generic-segmentation-offload: on
> generic-receive-offload: on
> large-receive-offload: off
> rx-vlan-offload: on
> tx-vlan-offload: on
> ntuple-filters: off
> receive-hashing: on
> highdma: on [fixed]
> rx-vlan-filter: on
> vlan-challenged: off [fixed]
> tx-lockless: off [fixed]
> netns-local: off [fixed]
> tx-gso-robust: off [fixed]
> tx-fcoe-segmentation: on [fixed]
> tx-gre-segmentation: off [fixed]
> tx-ipip-segmentation: off [fixed]
> tx-sit-segmentation: off [fixed]
> tx-udp_tnl-segmentation: off [fixed]
> tx-mpls-segmentation: off [fixed]
> fcoe-mtu: on [fixed]
> tx-nocache-copy: on
> loopback: off [fixed]
> rx-fcs: off [fixed]
> rx-all: off
> tx-vlan-stag-hw-insert: off [fixed]
> rx-vlan-stag-hw-parse: off [fixed]
> rx-vlan-stag-filter: off [fixed]
> l2-fwd-offload: off
>
> Info on NIC drivers
>
> [root@poc3 jkong]# ethtool -i p4p1
> driver: ixgbe
> version: 3.15.1-k
> firmware-version: 0x80000208
> bus-info: 0000:08:00.0
> supports-statistics: yes
> supports-test: yes
> supports-eeprom-access: yes
> supports-register-dump: yes
> supports-priv-flags: no
>
> After the change, I repeated the same test and got a similar failure on
> the target side:
>
> [12253.032595] ft_queue_data_in: Failed to send frame
> ffff88062a638600, xid <0xa0c>,
remaining 458752, lso_max <0x10000>

This is a send-frame failure. To find out what caused it, more debug
output from the low-level fcoe Tx path functions would help. It can be
enabled with:

  # echo 0xFF > /sys/module/libfc/parameters/debug_logging
  # echo 0x1 > /sys/module/fcoe/parameters/debug_logging

Disabling the Tx offloads may not help here and would instead slow down
Tx, so please restore them. Also, are you using a switch between the
hosts and the target? In any case you would need DCB PFC or PAUSE
enabled to avoid excessive Tx retries, though that should not cause the
send failure.

//Vasu

> [12253.032605] ft_queue_data_in: Failed to send frame
> ffff88062a638600, xid <0xa0c>, remaining 393216, lso_max <0x10000>
> [12253.032609] ft_queue_data_in: Failed to send frame
> ffff88062a638600, xid <0xa0c>, remaining 327680, lso_max <0x10000>
> [12253.032613] ft_queue_data_in: Failed to send frame
> ffff88062a638600, xid <0xa0c>, remaining 262144, lso_max <0x10000>
> [12284.299877] ft_queue_data_in: Failed to send frame
> ffff8803202ec600, xid <0x3a2>, remaining 196608, lso_max <0x10000>
> [12284.299885] ft_queue_data_in: Failed to send frame
> ffff8803202ec600, xid <0x3a2>, remaining 131072, lso_max <0x10000>
> [12284.299889] ft_queue_data_in: Failed to send frame
> ffff8803202ec600, xid <0x3a2>, remaining 65536, lso_max <0x10000>
> [12284.299892] ft_queue_data_in: Failed to send frame
> ffff8803202ec600, xid <0x3a2>, remaining 0, lso_max <0x10000>
> [12284.451810] ft_queue_data_in: Failed to send frame
> ffff88061deb1400, xid <0xecf>, remaining 458752, lso_max <0x10000>
> [12284.451818] ft_queue_data_in: Failed to send frame
> ffff88061deb1400, xid <0xecf>, remaining 393216, lso_max <0x10000>
> [12284.451824] ft_queue_data_in: Failed to send frame
> ffff88061deb1400, xid <0xecf>, remaining 327680, lso_max <0x10000>
> [12284.451827] ft_queue_data_in: Failed to send frame
> ffff88061deb1400, xid <0xecf>, remaining 262144, lso_max <0x10000>
> [12284.451831] ft_queue_data_in: Failed to send frame
> ffff88061deb1400, xid <0xecf>, remaining 196608, lso_max <0x10000>
> [12284.451834] ft_queue_data_in: Failed to send frame
> ffff88061deb1400, xid <0xecf>, remaining 131072, lso_max <0x10000>
> [12347.503478] ft_queue_data_in: 2 callbacks suppressed
> [12347.503486] ft_queue_data_in: Failed to send frame
> ffff8806142bc800, xid <0xb4f>, remaining 458752, lso_max <0x10000>
> [12347.503492] ft_queue_data_in: Failed to send frame
> ffff8806142bc800, xid <0xb4f>, remaining 393216, lso_max <0x10000>
> [12347.503496] ft_queue_data_in: Failed to send frame
> ffff8806142bc800, xid <0xb4f>, remaining 327680, lso_max <0x10000>
> [12347.503517] ft_queue_data_in: Failed to send frame
> ffff8806142bc800, xid <0xb4f>, remaining 262144, lso_max <0x10000>
> [12378.402412] ft_queue_data_in: Failed to send frame
> ffff88062ddeac00, xid <0x6a5>, remaining 458752, lso_max <0x10000>
> [12378.402420] ft_queue_data_in: Failed to send frame
> ffff88062ddeac00, xid <0x6a5>, remaining 393216, lso_max <0x10000>
> [12378.402425] ft_queue_data_in: Failed to send frame
> ffff88062ddeac00, xid <0x6a5>, remaining 327680, lso_max <0x10000>
> [12378.402428] ft_queue_data_in: Failed to send frame
> ffff88062ddeac00, xid <0x6a5>, remaining 262144, lso_max <0x10000>
> [12378.402432] ft_queue_data_in: Failed to send frame
> ffff88062ddeac00, xid <0x6a5>, remaining 196608, lso_max <0x10000>
> [12378.402436] ft_queue_data_in: Failed to send frame
> ffff88062ddeac00, xid <0x6a5>, remaining 131072, lso_max <0x10000>
> [12378.402440] ft_queue_data_in: Failed to send frame
> ffff88062ddeac00, xid <0x6a5>, remaining 65536, lso_max <0x10000>
> [12378.402444] ft_queue_data_in: Failed to send frame
> ffff88062ddeac00, xid <0x6a5>, remaining 0, lso_max <0x10000>
> [13049.224513] ft_queue_data_in: Failed to send frame
> ffff880614588c00, xid <0xd2f>, remaining 196608, lso_max <0x10000>
> [13049.224524] ft_queue_data_in: Failed to send frame
> ffff880614588c00, xid <0xd2f>, remaining 131072, lso_max <0x10000>
> [13049.224528] ft_queue_data_in: Failed to send frame
> ffff880614588c00, xid <0xd2f>, remaining 65536, lso_max <0x10000>
> [13049.224532] ft_queue_data_in: Failed to send frame
> ffff880614588c00, xid <0xd2f>, remaining 0, lso_max <0x10000>
> [13052.511306] ft_queue_data_in: Failed to send frame
> ffff88062d49f000, xid <0x8ae>, remaining 196608, lso_max <0x10000>
> [13052.511313] ft_queue_data_in: Failed to send frame
> ffff88062d49f000, xid <0x8ae>, remaining 131072, lso_max <0x10000>
> [13052.511317] ft_queue_data_in: Failed to send frame
> ffff88062d49f000, xid <0x8ae>, remaining 65536, lso_max <0x10000>
> [13052.511321] ft_queue_data_in: Failed to send frame
> ffff88062d49f000, xid <0x8ae>, remaining 0, lso_max <0x10000>
> [13087.976748] ft_queue_data_in: Failed to send frame
> ffff88031afc9c00, xid <0x96b>, remaining 458752, lso_max <0x10000>
> [13087.998453] ft_queue_data_in: Failed to send frame
> ffff88032c881200, xid <0xb23>, remaining 458752, lso_max <0x10000>
> [13087.998459] ft_queue_data_in: Failed to send frame
> ffff88032c881200, xid <0xb23>, remaining 393216, lso_max <0x10000>
> [13087.998463] ft_queue_data_in: Failed to send frame
> ffff88032c881200, xid <0xb23>, remaining 327680, lso_max <0x10000>
> [13087.998467] ft_queue_data_in: Failed to send frame
> ffff88032c881200, xid <0xb23>, remaining 262144, lso_max <0x10000>
> [13087.998470] ft_queue_data_in: Failed to send frame
> ffff88032c881200, xid <0xb23>, remaining 196608, lso_max <0x10000>
> [13087.998474] ft_queue_data_in: Failed to send frame
> ffff88032c881200, xid <0xb23>, remaining 131072, lso_max <0x10000>
> [13087.998478] ft_queue_data_in: Failed to send frame
> ffff88032c881200, xid <0xb23>, remaining 65536, lso_max <0x10000>
> [13087.998482] ft_queue_data_in: Failed to send frame
> ffff88032c881200, xid <0xb23>, remaining 0, lso_max <0x10000>
> [13119.177286] ft_queue_data_in: Failed to send frame
> ffff88062dff7400, xid <0xfcf>, remaining 458752, lso_max <0x10000>
> [13119.177297] ft_queue_data_in: Failed to send frame
> ffff88062dff7400, xid <0xfcf>, remaining 393216, lso_max <0x10000>
> [13119.177302] ft_queue_data_in: Failed to send frame
> ffff88062dff7400, xid <0xfcf>, remaining 327680, lso_max <0x10000>
> [13119.177307] ft_queue_data_in: Failed to send frame
> ffff88062dff7400, xid <0xfcf>, remaining 262144, lso_max <0x10000>
> [13119.177311] ft_queue_data_in: Failed to send frame
> ffff88062dff7400, xid <0xfcf>, remaining 196608, lso_max <0x10000>
> [13119.177316] ft_queue_data_in: Failed to send frame
> ffff88062dff7400, xid <0xfcf>, remaining 131072, lso_max <0x10000>
> [13119.177321] ft_queue_data_in: Failed to send frame
> ffff88062dff7400, xid <0xfcf>, remaining 65536, lso_max <0x10000>
> [13119.177325] ft_queue_data_in: Failed to send frame
> ffff88062dff7400, xid <0xfcf>, remaining 0, lso_max <0x10000>
> [13122.335322] ------------[ cut here ]------------
> [13122.335336] WARNING: CPU: 6 PID: 2165 at
> include/scsi/fc_frame.h:173 fcoe_percpu_receive_thread+0x507/0x53c
> [fcoe]()
> [13122.335338] Modules linked in: async_memcpy async_xor xor async_tx
> fcoe libfcoe tcm_fc libfc scsi_transport_fc scsi_tgt target_core_pscsi
> target_core_file target_core_iblock iscsi_target_mod target_core_mod
> 8021q garp mrp bridge stp llc iTCO_wdt gpio_ich iTCO_vendor_support
> coretemp kvm_intel kvm crc32c_intel microcode serio_raw i2c_i801
> lpc_ich mfd_core ses enclosure i7core_edac ioatdma edac_core shpchp
> acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd sunrpc radeon
> drm_kms_helper ttm drm ixgbe igb ata_generic mdio pata_acpi ptp
> pata_jmicron pps_core i2c_algo_bit aacraid dca i2c_core [last
> unloaded: vd]
> [13122.335390] CPU: 6 PID: 2165 Comm: fcoethread/6 Tainted: GF
> O 3.13.10-200.zbfcoepatch.fc20.x86_64 #1
> [13122.335392] Hardware name: Supermicro X8DTN/X8DTN, BIOS 2.1c 10/28/2011
> [13122.335394] 0000000000000009 ffff88062b04bdd0 ffffffff81687eac
> 0000000000000000
> [13122.335400] ffff88062b04be08 ffffffff8106d4dd ffffe8ffffc41748
> ffff88062a444700
> [13122.335404] ffff8800b7e926e8 0000000000000002 ffff88062b04be88
> ffff88062b04be18
> [13122.335408] Call Trace:
> [13122.335419] [<ffffffff81687eac>] dump_stack+0x45/0x56
> [13122.335426] [<ffffffff8106d4dd>] warn_slowpath_common+0x7d/0xa0
> [13122.335430] [<ffffffff8106d5ba>] warn_slowpath_null+0x1a/0x20
> [13122.335435] [<ffffffffa0651517>]
> fcoe_percpu_receive_thread+0x507/0x53c [fcoe]
> [13122.335440] [<ffffffffa0651010>] ? fcoe_set_port_id+0x50/0x50 [fcoe]
> [13122.335446] [<ffffffff8108f2f2>] kthread+0xd2/0xf0
> [13122.335450] [<ffffffff8108f220>] ? insert_kthread_work+0x40/0x40
> [13122.335458] [<ffffffff81696dbc>] ret_from_fork+0x7c/0xb0
> [13122.335461] [<ffffffff8108f220>] ? insert_kthread_work+0x40/0x40
> [13122.335464] ---[ end trace e4509e1053f499ac ]---
>
> Thanks,
>
> Jun
>
> On Tue, May 20, 2014 at 11:03 AM, Nicholas A. Bellinger
> <nab@xxxxxxxxxxxxxxx> wrote:
> > On Mon, 2014-05-19 at 17:29 -0700, Jun Wu wrote:
> >> Hi Nicholas,
> >>
> >> We downloaded the source of our running kernel (3.13.10-200) and
> >> applied your percpu-ida pre-allocation regression fix, then compiled
> >> and installed the kernel. I repeated the same test three times,
> >> running 10 fio sessions to 10 drives on the target through fcoe vn2vn.
> >> In the first two tests, the target machine hung with the following
> >> messages:
> >>
> >> 15231 May 19 11:49:27 poc1 kernel: [ 1073.783229] ft_queue_data_in:
> >> Failed to send frame ffff880c0b188200, xid <0x2a5>, remaining 196608,
> >> lso_max <0x10000>
> >> 15232 May 19 11:49:27 poc1 kernel: [ 1073.783238] ft_queue_data_in:
> >> Failed to send frame ffff880c0b188200, xid <0x2a5>, remaining 131072,
> >> lso_max <0x10000>
> >> 15233 May 19 11:49:27 poc1 kernel: [ 1073.783242] ft_queue_data_in:
> >> Failed to send frame ffff880c0b188200, xid <0x2a5>, remaining 65536,
> >> lso_max <0x10000>
> >> 15234 May 19 11:49:27 poc1 kernel: [ 1073.783245] ft_queue_data_in:
> >> Failed to send frame ffff880c0b188200, xid <0x2a5>, remaining 0,
> >> lso_max <0x10000>
> >> 15235 May 19 11:49:30 poc1 kernel: [ 1076.907061] ft_queue_data_in:
> >> Failed to send frame ffff880c1d1df000, xid <0x305>, remaining 196608,
> >> lso_max <0x10000>
> >> 15236 May 19 11:49:30 poc1 kernel: [ 1076.907068] ft_queue_data_in:
> >> Failed to send frame ffff880c1d1df000, xid <0x305>, remaining 131072,
> >> lso_max <0x10000>
> >> 15237 May 19 11:49:30 poc1 kernel: [ 1076.907073] ft_queue_data_in:
> >> Failed to send frame ffff880c1d1df000, xid <0x305>, remaining 65536,
> >> lso_max <0x10000>
> >> 15238 May 19 11:49:30 poc1 kernel: [ 1076.907077] ft_queue_data_in:
> >> Failed to send frame ffff880c1d1df000, xid <0x305>, remaining 0,
> >> lso_max <0x10000>
> >> 15239 May 19 11:50:01 poc1 kernel: [ 1107.918910] ft_queue_data_in:
> >> Failed to send frame ffff88060cd40800, xid <0x3cb>, remaining 458752,
> >> lso_max <0x10000>
> >> 15240 May 19 11:50:01 poc1 kernel: [ 1107.918918] ft_queue_data_in:
> >> Failed to send frame ffff88060cd40800, xid <0x3cb>, remaining 393216,
> >> lso_max <0x10000>
> >> 15241 May 19 11:50:01 poc1 kernel: [ 1107.918922] ft_queue_data_in:
> >> Failed to send frame ffff88060cd40800, xid <0x3cb>, remaining 327680,
> >> lso_max <0x10000>
> >> 15242 May 19 11:50:01 poc1 kernel: [ 1107.918925] ft_queue_data_in:
> >> Failed to send frame ffff88060cd40800, xid <0x3cb>, remaining 262144,
> >> lso_max <0x10000>
> >> 15243 May 19 11:50:01 poc1 kernel: [ 1107.918929] ft_queue_data_in:
> >> Failed to send frame ffff88060cd40800, xid <0x3cb>, remaining 196608,
> >> lso_max <0x10000>
> >> 15244 May 19 11:50:01 poc1 kernel: [ 1107.918932] ft_queue_data_in:
> >> Failed to send frame ffff88060cd40800, xid <0x3cb>, remaining 131072,
> >> lso_max <0x10000>
> >> 15245 May 19 11:50:01 poc1 kernel: [ 1107.918936] ft_queue_data_in:
> >> Failed to send frame ffff88060cd40800, xid <0x3cb>, remaining 65536,
> >> lso_max <0x10000>
> >> 15246 May 19 11:50:01 poc1 kernel: [ 1107.918939] ft_queue_data_in:
> >> Failed to send frame ffff88060cd40800, xid <0x3cb>, remaining 0,
> >> lso_max <0x10000>
> >> 15247 May 19 11:50:05 poc1 kernel: [ 1111.450900] ft_queue_data_in:
> >> Failed to send frame ffff880c0b24ca00, xid <0xea6>, remaining 196608,
> >> lso_max <0x10000>
> >> 15248 May 19 11:50:05 poc1 kernel: [ 1111.450908] ft_queue_data_in:
> >> Failed to send frame ffff880c0b24ca00, xid <0xea6>, remaining 131072,
> >> lso_max <0x10000>
> >> 15249 May 19 11:51:12 poc1 kernel: [ 1178.698434] ft_queue_data_in: 6
> >> callbacks suppressed
> >> 15250 May 19 11:51:12 poc1 kernel: [ 1178.698440] ft_queue_data_in:
> >> Failed to send frame ffff88060ba97400, xid <0xb8a>, remaining 458752,
> >> lso_max <0x10000>
> >> 15251 May 19 11:51:12 poc1 kernel: [ 1178.698446] ft_queue_data_in:
> >> Failed to send frame ffff88060ba97400, xid <0xb8a>, remaining 393216,
> >> lso_max <0x10000>
> >> 15252 May 19 11:51:12 poc1 kernel: [ 1178.698449] ft_queue_data_in:
> >> Failed to send frame ffff88060ba97400, xid <0xb8a>, remaining 327680,
> >> lso_max <0x10000>
> >> 15253 May 19 11:51:12 poc1 kernel: [ 1178.698453] ft_queue_data_in:
> >> Failed to send frame ffff88060ba97400, xid <0xb8a>, remaining 262144,
> >> lso_max <0x10000>
> >> 15254 May 19 11:51:12 poc1 kernel: [ 1178.698456] ft_queue_data_in:
> >> Failed to send frame ffff88060ba97400, xid <0xb8a>, remaining 196608,
> >> lso_max <0x10000>
> >> 15255 May 19 11:51:12 poc1 kernel: [ 1178.698460] ft_queue_data_in:
> >> Failed to send frame ffff88060ba97400, xid <0xb8a>, remaining 131072,
> >> lso_max <0x10000>
> >> 15256 May 19 11:51:12 poc1 kernel: [ 1178.698463] ft_queue_data_in:
> >> Failed to send frame ffff88060ba97400, xid <0xb8a>, remaining 65536,
> >> lso_max <0x10000>
> >> 15257 May 19 11:51:12 poc1 kernel: [ 1178.698467] ft_queue_data_in:
> >> Failed to send frame ffff88060ba97400, xid <0xb8a>, remaining 0,
> >> lso_max <0x10000>
> >>
> >
> > The call into the lport->tt.seq_send() libfc code is failing to send
> > outgoing solicited data-in. From the output, note that the LSO (large
> > send offload, aka TCP segmentation offload) feature has been enabled
> > by the underlying NIC hardware.
> >
> > So in order to isolate possible issues, I'd recommend:
> >
> > - Disabling hardware offloads on both initiator and target sides (LRO +
> >   LSO) using ethtool -K
> > - Disabling any jumbo frames settings on either side
> >
> > Are there any other non-standard network and/or switch settings
> > in place..? Also, please confirm what your NIC + switch setup looks
> > like.
> >
> > Rob & Open-FCoE folks, is there anything else to take into consideration
> > here..?
> >
> >>
> >> I didn't see the previous message "unable to handle kernel NULL
> >> pointer dereference at 0000000000000048". So it must have been fixed
> >> by your change.
> >>
> >
> > Thanks for confirming that bit.
> >
> > --nab
> >