Re: System crashes with increased drive count

The MTU was 1500 on both the initiator and the target.
I used "ethtool -K p4p1 tso off" to turn off TCP segmentation offload
on all machines. The feature settings after the command are shown below.

[root@poc3 jkong]# ethtool -k p4p1
Features for p4p1:
rx-checksumming: on
tx-checksumming: on
        tx-checksum-ipv4: on
        tx-checksum-ip-generic: off [fixed]
        tx-checksum-ipv6: on
        tx-checksum-fcoe-crc: on [fixed]
        tx-checksum-sctp: on
scatter-gather: on
        tx-scatter-gather: on
        tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: off
        tx-tcp-segmentation: off
        tx-tcp-ecn-segmentation: off [fixed]
        tx-tcp6-segmentation: off
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: on
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: on [fixed]
tx-gre-segmentation: off [fixed]
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-mpls-segmentation: off [fixed]
fcoe-mtu: on [fixed]
tx-nocache-copy: on
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off
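
For a quick check of which features are still enabled and not marked
[fixed] (and so could presumably still be toggled with ethtool -K), here
is a rough Python sketch that filters "ethtool -k" output. The function
name is made up for illustration, and it assumes the "name: on|off
[fixed]" layout shown above. On the excerpt below it reports that
generic-segmentation-offload and generic-receive-offload are still on,
while tx-fcoe-segmentation is skipped because it is on but [fixed].

```python
def changeable_features_still_on(ethtool_k_output):
    """Return names of features reported as 'on' without a '[fixed]' tag.

    Hypothetical helper; assumes the 'name: on|off [fixed]' layout that
    'ethtool -k' prints in the output above.
    """
    still_on = []
    for line in ethtool_k_output.splitlines():
        if ":" not in line:
            continue
        name, _, state = line.partition(":")
        fields = state.split()
        if not fields:
            # Header such as "Features for p4p1:" has nothing after the colon.
            continue
        if fields[0] == "on" and "[fixed]" not in fields:
            still_on.append(name.strip())
    return still_on


# Excerpt copied from the ethtool -k output above.
sample = """\
Features for p4p1:
tcp-segmentation-offload: off
        tx-tcp-segmentation: off
        tx-tcp-ecn-segmentation: off [fixed]
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off
tx-fcoe-segmentation: on [fixed]
"""

print(changeable_features_still_on(sample))
```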

NIC driver information:

[root@poc3 jkong]# ethtool -i p4p1
driver: ixgbe
version: 3.15.1-k
firmware-version: 0x80000208
bus-info: 0000:08:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no

After the change, I repeated the same test and got a similar failure on
the target side:

[12253.032595] ft_queue_data_in: Failed to send frame
ffff88062a638600, xid <0xa0c>, remaining 458752, lso_max <0x10000>
[12253.032605] ft_queue_data_in: Failed to send frame
ffff88062a638600, xid <0xa0c>, remaining 393216, lso_max <0x10000>
[12253.032609] ft_queue_data_in: Failed to send frame
ffff88062a638600, xid <0xa0c>, remaining 327680, lso_max <0x10000>
[12253.032613] ft_queue_data_in: Failed to send frame
ffff88062a638600, xid <0xa0c>, remaining 262144, lso_max <0x10000>
[12284.299877] ft_queue_data_in: Failed to send frame
ffff8803202ec600, xid <0x3a2>, remaining 196608, lso_max <0x10000>
[12284.299885] ft_queue_data_in: Failed to send frame
ffff8803202ec600, xid <0x3a2>, remaining 131072, lso_max <0x10000>
[12284.299889] ft_queue_data_in: Failed to send frame
ffff8803202ec600, xid <0x3a2>, remaining 65536, lso_max <0x10000>
[12284.299892] ft_queue_data_in: Failed to send frame
ffff8803202ec600, xid <0x3a2>, remaining 0, lso_max <0x10000>
[12284.451810] ft_queue_data_in: Failed to send frame
ffff88061deb1400, xid <0xecf>, remaining 458752, lso_max <0x10000>
[12284.451818] ft_queue_data_in: Failed to send frame
ffff88061deb1400, xid <0xecf>, remaining 393216, lso_max <0x10000>
[12284.451824] ft_queue_data_in: Failed to send frame
ffff88061deb1400, xid <0xecf>, remaining 327680, lso_max <0x10000>
[12284.451827] ft_queue_data_in: Failed to send frame
ffff88061deb1400, xid <0xecf>, remaining 262144, lso_max <0x10000>
[12284.451831] ft_queue_data_in: Failed to send frame
ffff88061deb1400, xid <0xecf>, remaining 196608, lso_max <0x10000>
[12284.451834] ft_queue_data_in: Failed to send frame
ffff88061deb1400, xid <0xecf>, remaining 131072, lso_max <0x10000>
[12347.503478] ft_queue_data_in: 2 callbacks suppressed
[12347.503486] ft_queue_data_in: Failed to send frame
ffff8806142bc800, xid <0xb4f>, remaining 458752, lso_max <0x10000>
[12347.503492] ft_queue_data_in: Failed to send frame
ffff8806142bc800, xid <0xb4f>, remaining 393216, lso_max <0x10000>
[12347.503496] ft_queue_data_in: Failed to send frame
ffff8806142bc800, xid <0xb4f>, remaining 327680, lso_max <0x10000>
[12347.503517] ft_queue_data_in: Failed to send frame
ffff8806142bc800, xid <0xb4f>, remaining 262144, lso_max <0x10000>
[12378.402412] ft_queue_data_in: Failed to send frame
ffff88062ddeac00, xid <0x6a5>, remaining 458752, lso_max <0x10000>
[12378.402420] ft_queue_data_in: Failed to send frame
ffff88062ddeac00, xid <0x6a5>, remaining 393216, lso_max <0x10000>
[12378.402425] ft_queue_data_in: Failed to send frame
ffff88062ddeac00, xid <0x6a5>, remaining 327680, lso_max <0x10000>
[12378.402428] ft_queue_data_in: Failed to send frame
ffff88062ddeac00, xid <0x6a5>, remaining 262144, lso_max <0x10000>
[12378.402432] ft_queue_data_in: Failed to send frame
ffff88062ddeac00, xid <0x6a5>, remaining 196608, lso_max <0x10000>
[12378.402436] ft_queue_data_in: Failed to send frame
ffff88062ddeac00, xid <0x6a5>, remaining 131072, lso_max <0x10000>
[12378.402440] ft_queue_data_in: Failed to send frame
ffff88062ddeac00, xid <0x6a5>, remaining 65536, lso_max <0x10000>
[12378.402444] ft_queue_data_in: Failed to send frame
ffff88062ddeac00, xid <0x6a5>, remaining 0, lso_max <0x10000>
[13049.224513] ft_queue_data_in: Failed to send frame
ffff880614588c00, xid <0xd2f>, remaining 196608, lso_max <0x10000>
[13049.224524] ft_queue_data_in: Failed to send frame
ffff880614588c00, xid <0xd2f>, remaining 131072, lso_max <0x10000>
[13049.224528] ft_queue_data_in: Failed to send frame
ffff880614588c00, xid <0xd2f>, remaining 65536, lso_max <0x10000>
[13049.224532] ft_queue_data_in: Failed to send frame
ffff880614588c00, xid <0xd2f>, remaining 0, lso_max <0x10000>
[13052.511306] ft_queue_data_in: Failed to send frame
ffff88062d49f000, xid <0x8ae>, remaining 196608, lso_max <0x10000>
[13052.511313] ft_queue_data_in: Failed to send frame
ffff88062d49f000, xid <0x8ae>, remaining 131072, lso_max <0x10000>
[13052.511317] ft_queue_data_in: Failed to send frame
ffff88062d49f000, xid <0x8ae>, remaining 65536, lso_max <0x10000>
[13052.511321] ft_queue_data_in: Failed to send frame
ffff88062d49f000, xid <0x8ae>, remaining 0, lso_max <0x10000>
[13087.976748] ft_queue_data_in: Failed to send frame
ffff88031afc9c00, xid <0x96b>, remaining 458752, lso_max <0x10000>
[13087.998453] ft_queue_data_in: Failed to send frame
ffff88032c881200, xid <0xb23>, remaining 458752, lso_max <0x10000>
[13087.998459] ft_queue_data_in: Failed to send frame
ffff88032c881200, xid <0xb23>, remaining 393216, lso_max <0x10000>
[13087.998463] ft_queue_data_in: Failed to send frame
ffff88032c881200, xid <0xb23>, remaining 327680, lso_max <0x10000>
[13087.998467] ft_queue_data_in: Failed to send frame
ffff88032c881200, xid <0xb23>, remaining 262144, lso_max <0x10000>
[13087.998470] ft_queue_data_in: Failed to send frame
ffff88032c881200, xid <0xb23>, remaining 196608, lso_max <0x10000>
[13087.998474] ft_queue_data_in: Failed to send frame
ffff88032c881200, xid <0xb23>, remaining 131072, lso_max <0x10000>
[13087.998478] ft_queue_data_in: Failed to send frame
ffff88032c881200, xid <0xb23>, remaining 65536, lso_max <0x10000>
[13087.998482] ft_queue_data_in: Failed to send frame
ffff88032c881200, xid <0xb23>, remaining 0, lso_max <0x10000>
[13119.177286] ft_queue_data_in: Failed to send frame
ffff88062dff7400, xid <0xfcf>, remaining 458752, lso_max <0x10000>
[13119.177297] ft_queue_data_in: Failed to send frame
ffff88062dff7400, xid <0xfcf>, remaining 393216, lso_max <0x10000>
[13119.177302] ft_queue_data_in: Failed to send frame
ffff88062dff7400, xid <0xfcf>, remaining 327680, lso_max <0x10000>
[13119.177307] ft_queue_data_in: Failed to send frame
ffff88062dff7400, xid <0xfcf>, remaining 262144, lso_max <0x10000>
[13119.177311] ft_queue_data_in: Failed to send frame
ffff88062dff7400, xid <0xfcf>, remaining 196608, lso_max <0x10000>
[13119.177316] ft_queue_data_in: Failed to send frame
ffff88062dff7400, xid <0xfcf>, remaining 131072, lso_max <0x10000>
[13119.177321] ft_queue_data_in: Failed to send frame
ffff88062dff7400, xid <0xfcf>, remaining 65536, lso_max <0x10000>
[13119.177325] ft_queue_data_in: Failed to send frame
ffff88062dff7400, xid <0xfcf>, remaining 0, lso_max <0x10000>
[13122.335322] ------------[ cut here ]------------
[13122.335336] WARNING: CPU: 6 PID: 2165 at
include/scsi/fc_frame.h:173 fcoe_percpu_receive_thread+0x507/0x53c
[fcoe]()
[13122.335338] Modules linked in: async_memcpy async_xor xor async_tx
fcoe libfcoe tcm_fc libfc scsi_transport_fc scsi_tgt target_core_pscsi
target_core_file target_core_iblock iscsi_target_mod target_core_mod
8021q garp mrp bridge stp llc iTCO_wdt gpio_ich iTCO_vendor_support
coretemp kvm_intel kvm crc32c_intel microcode serio_raw i2c_i801
lpc_ich mfd_core ses enclosure i7core_edac ioatdma edac_core shpchp
acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd sunrpc radeon
drm_kms_helper ttm drm ixgbe igb ata_generic mdio pata_acpi ptp
pata_jmicron pps_core i2c_algo_bit aacraid dca i2c_core [last
unloaded: vd]
[13122.335390] CPU: 6 PID: 2165 Comm: fcoethread/6 Tainted: GF
 O 3.13.10-200.zbfcoepatch.fc20.x86_64 #1
[13122.335392] Hardware name: Supermicro X8DTN/X8DTN, BIOS 2.1c       10/28/2011
[13122.335394]  0000000000000009 ffff88062b04bdd0 ffffffff81687eac
0000000000000000
[13122.335400]  ffff88062b04be08 ffffffff8106d4dd ffffe8ffffc41748
ffff88062a444700
[13122.335404]  ffff8800b7e926e8 0000000000000002 ffff88062b04be88
ffff88062b04be18
[13122.335408] Call Trace:
[13122.335419]  [<ffffffff81687eac>] dump_stack+0x45/0x56
[13122.335426]  [<ffffffff8106d4dd>] warn_slowpath_common+0x7d/0xa0
[13122.335430]  [<ffffffff8106d5ba>] warn_slowpath_null+0x1a/0x20
[13122.335435]  [<ffffffffa0651517>]
fcoe_percpu_receive_thread+0x507/0x53c [fcoe]
[13122.335440]  [<ffffffffa0651010>] ? fcoe_set_port_id+0x50/0x50 [fcoe]
[13122.335446]  [<ffffffff8108f2f2>] kthread+0xd2/0xf0
[13122.335450]  [<ffffffff8108f220>] ? insert_kthread_work+0x40/0x40
[13122.335458]  [<ffffffff81696dbc>] ret_from_fork+0x7c/0xb0
[13122.335461]  [<ffffffff8108f220>] ? insert_kthread_work+0x40/0x40
[13122.335464] ---[ end trace e4509e1053f499ac ]---
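
In the log above, "remaining" drops by 65536 between consecutive
failures for the same xid, which is exactly lso_max (0x10000), so it
appears that each 64 KiB chunk of the data-in for that exchange failed
in turn. Below is a minimal Python sketch for grouping these messages by
xid and checking the step size. The function name and regular expression
are my own, and the sketch assumes the message format seen in this log,
including the mail line wrapping.

```python
import re

# Matches one message after the wrapped lines have been joined.
FAIL_RE = re.compile(
    r"ft_queue_data_in: Failed to send frame\s+(?P<frame>[0-9a-f]+), "
    r"xid <(?P<xid>0x[0-9a-f]+)>, remaining (?P<remaining>\d+), "
    r"lso_max <(?P<lso_max>0x[0-9a-f]+)>"
)


def summarize_failures(log_text):
    """Group failed sends by xid; return {xid: [remaining, ...]} in log order."""
    # Each message is wrapped over two lines, so collapse all whitespace first.
    joined = " ".join(log_text.split())
    by_xid = {}
    for m in FAIL_RE.finditer(joined):
        by_xid.setdefault(m.group("xid"), []).append(int(m.group("remaining")))
    return by_xid


# Four messages copied from the log above (xid 0x3a2).
sample = """\
[12284.299877] ft_queue_data_in: Failed to send frame
ffff8803202ec600, xid <0x3a2>, remaining 196608, lso_max <0x10000>
[12284.299885] ft_queue_data_in: Failed to send frame
ffff8803202ec600, xid <0x3a2>, remaining 131072, lso_max <0x10000>
[12284.299889] ft_queue_data_in: Failed to send frame
ffff8803202ec600, xid <0x3a2>, remaining 65536, lso_max <0x10000>
[12284.299892] ft_queue_data_in: Failed to send frame
ffff8803202ec600, xid <0x3a2>, remaining 0, lso_max <0x10000>
"""

for xid, remaining in summarize_failures(sample).items():
    # Differences between consecutive "remaining" values for this exchange.
    steps = {a - b for a, b in zip(remaining, remaining[1:])}
    print(xid, remaining, steps)
```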

Thanks,

Jun

On Tue, May 20, 2014 at 11:03 AM, Nicholas A. Bellinger
<nab@xxxxxxxxxxxxxxx> wrote:
> On Mon, 2014-05-19 at 17:29 -0700, Jun Wu wrote:
>> Hi Nicholas,
>>
>> We downloaded the source of our running kernel (3.13.10-200) and
>> applied your percpu-ida pre-allocation regression fix, then compiled
>> and installed the kernel. I repeated the same test three times,
>> running 10 fio sessions to 10 drives on the target through fcoe vn2vn.
>> In the first two tests, the target machine hung with the following
>> messages:
>>
>> 15231 May 19 11:49:27 poc1 kernel: [ 1073.783229] ft_queue_data_in:
>> Failed to send frame ffff880c0b188200, xid <0x2a5>, remaining 196608,
>> lso_max <0x10000>
>> 15232 May 19 11:49:27 poc1 kernel: [ 1073.783238] ft_queue_data_in:
>> Failed to send frame ffff880c0b188200, xid <0x2a5>, remaining 131072,
>> lso_max <0x10000>
>> 15233 May 19 11:49:27 poc1 kernel: [ 1073.783242] ft_queue_data_in:
>> Failed to send frame ffff880c0b188200, xid <0x2a5>, remaining 65536,
>> lso_max <0x10000>
>> 15234 May 19 11:49:27 poc1 kernel: [ 1073.783245] ft_queue_data_in:
>> Failed to send frame ffff880c0b188200, xid <0x2a5>, remaining 0,
>> lso_max <0x10000>
>> 15235 May 19 11:49:30 poc1 kernel: [ 1076.907061] ft_queue_data_in:
>> Failed to send frame ffff880c1d1df000, xid <0x305>, remaining 196608,
>> lso_max <0x10000>
>> 15236 May 19 11:49:30 poc1 kernel: [ 1076.907068] ft_queue_data_in:
>> Failed to send frame ffff880c1d1df000, xid <0x305>, remaining 131072,
>> lso_max <0x10000>
>> 15237 May 19 11:49:30 poc1 kernel: [ 1076.907073] ft_queue_data_in:
>> Failed to send frame ffff880c1d1df000, xid <0x305>, remaining 65536,
>> lso_max <0x10000>
>> 15238 May 19 11:49:30 poc1 kernel: [ 1076.907077] ft_queue_data_in:
>> Failed to send frame ffff880c1d1df000, xid <0x305>, remaining 0,
>> lso_max <0x10000>
>> 15239 May 19 11:50:01 poc1 kernel: [ 1107.918910] ft_queue_data_in:
>> Failed to send frame ffff88060cd40800, xid <0x3cb>, remaining 458752,
>> lso_max <0x10000>
>> 15240 May 19 11:50:01 poc1 kernel: [ 1107.918918] ft_queue_data_in:
>> Failed to send frame ffff88060cd40800, xid <0x3cb>, remaining 393216,
>> lso_max <0x10000>
>> 15241 May 19 11:50:01 poc1 kernel: [ 1107.918922] ft_queue_data_in:
>> Failed to send frame ffff88060cd40800, xid <0x3cb>, remaining 327680,
>> lso_max <0x10000>
>> 15242 May 19 11:50:01 poc1 kernel: [ 1107.918925] ft_queue_data_in:
>> Failed to send frame ffff88060cd40800, xid <0x3cb>, remaining 262144,
>> lso_max <0x10000>
>> 15243 May 19 11:50:01 poc1 kernel: [ 1107.918929] ft_queue_data_in:
>> Failed to send frame ffff88060cd40800, xid <0x3cb>, remaining 196608,
>> lso_max <0x10000>
>> 15244 May 19 11:50:01 poc1 kernel: [ 1107.918932] ft_queue_data_in:
>> Failed to send frame ffff88060cd40800, xid <0x3cb>, remaining 131072,
>> lso_max <0x10000>
>> 15245 May 19 11:50:01 poc1 kernel: [ 1107.918936] ft_queue_data_in:
>> Failed to send frame ffff88060cd40800, xid <0x3cb>, remaining 65536,
>> lso_max <0x10000>
>> 15246 May 19 11:50:01 poc1 kernel: [ 1107.918939] ft_queue_data_in:
>> Failed to send frame ffff88060cd40800, xid <0x3cb>, remaining 0,
>> lso_max <0x10000>
>> 15247 May 19 11:50:05 poc1 kernel: [ 1111.450900] ft_queue_data_in:
>> Failed to send frame ffff880c0b24ca00, xid <0xea6>, remaining 196608,
>> lso_max <0x10000>
>> 15248 May 19 11:50:05 poc1 kernel: [ 1111.450908] ft_queue_data_in:
>> Failed to send frame ffff880c0b24ca00, xid <0xea6>, remaining 131072,
>> lso_max <0x10000>
>> 15249 May 19 11:51:12 poc1 kernel: [ 1178.698434] ft_queue_data_in: 6
>> callbacks suppressed
>> 15250 May 19 11:51:12 poc1 kernel: [ 1178.698440] ft_queue_data_in:
>> Failed to send frame ffff88060ba97400, xid <0xb8a>, remaining 458752,
>> lso_max <0x10000>
>> 15251 May 19 11:51:12 poc1 kernel: [ 1178.698446] ft_queue_data_in:
>> Failed to send frame ffff88060ba97400, xid <0xb8a>, remaining 393216,
>> lso_max <0x10000>
>> 15252 May 19 11:51:12 poc1 kernel: [ 1178.698449] ft_queue_data_in:
>> Failed to send frame ffff88060ba97400, xid <0xb8a>, remaining 327680,
>> lso_max <0x10000>
>> 15253 May 19 11:51:12 poc1 kernel: [ 1178.698453] ft_queue_data_in:
>> Failed to send frame ffff88060ba97400, xid <0xb8a>, remaining 262144,
>> lso_max <0x10000>
>> 15254 May 19 11:51:12 poc1 kernel: [ 1178.698456] ft_queue_data_in:
>> Failed to send frame ffff88060ba97400, xid <0xb8a>, remaining 196608,
>> lso_max <0x10000>
>> 15255 May 19 11:51:12 poc1 kernel: [ 1178.698460] ft_queue_data_in:
>> Failed to send frame ffff88060ba97400, xid <0xb8a>, remaining 131072,
>> lso_max <0x10000>
>> 15256 May 19 11:51:12 poc1 kernel: [ 1178.698463] ft_queue_data_in:
>> Failed to send frame ffff88060ba97400, xid <0xb8a>, remaining 65536,
>> lso_max <0x10000>
>> 15257 May 19 11:51:12 poc1 kernel: [ 1178.698467] ft_queue_data_in:
>> Failed to send frame ffff88060ba97400, xid <0xb8a>, remaining 0,
>> lso_max <0x10000>
>>
>
> The call into lport->tt.seq_send() libfc code is failing to send
> outgoing solicited data-in.  From the output, note the LSO (large
> segment offload aka TCP segment offload) feature has been enabled by the
> underlying NIC hardware.
>
> So in order to isolate possible issues, I'd recommend:
>
> - Disabling hardware offloads on both initiator and target sides (LRO +
>   LSO) using ethtool -K
> - Disabling any jumbo frames settings on either side
>
> Is there any other non standard network and/or switch settings that are
> in place..?  Also, please confirm what your NIC + switch setup looks
> like.
>
> Rob & Open-FCoE folks, is there anything else to take into consideration
> here..?
>
>>
>> I didn't see the previous message "unable to handle kernel NULL
>> pointer dereference at 0000000000000048". So it must have been fixed
>> by your change.
>>
>
> Thanks for confirming that bit.
>
> --nab
>