Re: System crashes with increased drive count

On Mon, 2014-05-19 at 17:29 -0700, Jun Wu wrote:
> Hi Nicholas,
> 
> We downloaded the source of our running kernel (3.13.10-200) and
> applied your percpu-ida pre-allocation regression fix, then compiled
> and installed the kernel. I repeated the same test three times,
> running 10 fio sessions to 10 drives on the target through fcoe vn2vn.
> In the first two tests, the target machine hung with the following
> messages:
> 
> 15231 May 19 11:49:27 poc1 kernel: [ 1073.783229] ft_queue_data_in:
> Failed to send frame ffff880c0b188200, xid <0x2a5>, remaining 196608,
> lso_max <0x10000>
> 15232 May 19 11:49:27 poc1 kernel: [ 1073.783238] ft_queue_data_in:
> Failed to send frame ffff880c0b188200, xid <0x2a5>, remaining 131072,
> lso_max <0x10000>
> 15233 May 19 11:49:27 poc1 kernel: [ 1073.783242] ft_queue_data_in:
> Failed to send frame ffff880c0b188200, xid <0x2a5>, remaining 65536,
> lso_max <0x10000>
> 15234 May 19 11:49:27 poc1 kernel: [ 1073.783245] ft_queue_data_in:
> Failed to send frame ffff880c0b188200, xid <0x2a5>, remaining 0,
> lso_max <0x10000>
> 15235 May 19 11:49:30 poc1 kernel: [ 1076.907061] ft_queue_data_in:
> Failed to send frame ffff880c1d1df000, xid <0x305>, remaining 196608,
> lso_max <0x10000>
> 15236 May 19 11:49:30 poc1 kernel: [ 1076.907068] ft_queue_data_in:
> Failed to send frame ffff880c1d1df000, xid <0x305>, remaining 131072,
> lso_max <0x10000>
> 15237 May 19 11:49:30 poc1 kernel: [ 1076.907073] ft_queue_data_in:
> Failed to send frame ffff880c1d1df000, xid <0x305>, remaining 65536,
> lso_max <0x10000>
> 15238 May 19 11:49:30 poc1 kernel: [ 1076.907077] ft_queue_data_in:
> Failed to send frame ffff880c1d1df000, xid <0x305>, remaining 0,
> lso_max <0x10000>
> 15239 May 19 11:50:01 poc1 kernel: [ 1107.918910] ft_queue_data_in:
> Failed to send frame ffff88060cd40800, xid <0x3cb>, remaining 458752,
> lso_max <0x10000>
> 15240 May 19 11:50:01 poc1 kernel: [ 1107.918918] ft_queue_data_in:
> Failed to send frame ffff88060cd40800, xid <0x3cb>, remaining 393216,
> lso_max <0x10000>
> 15241 May 19 11:50:01 poc1 kernel: [ 1107.918922] ft_queue_data_in:
> Failed to send frame ffff88060cd40800, xid <0x3cb>, remaining 327680,
> lso_max <0x10000>
> 15242 May 19 11:50:01 poc1 kernel: [ 1107.918925] ft_queue_data_in:
> Failed to send frame ffff88060cd40800, xid <0x3cb>, remaining 262144,
> lso_max <0x10000>
> 15243 May 19 11:50:01 poc1 kernel: [ 1107.918929] ft_queue_data_in:
> Failed to send frame ffff88060cd40800, xid <0x3cb>, remaining 196608,
> lso_max <0x10000>
> 15244 May 19 11:50:01 poc1 kernel: [ 1107.918932] ft_queue_data_in:
> Failed to send frame ffff88060cd40800, xid <0x3cb>, remaining 131072,
> lso_max <0x10000>
> 15245 May 19 11:50:01 poc1 kernel: [ 1107.918936] ft_queue_data_in:
> Failed to send frame ffff88060cd40800, xid <0x3cb>, remaining 65536,
> lso_max <0x10000>
> 15246 May 19 11:50:01 poc1 kernel: [ 1107.918939] ft_queue_data_in:
> Failed to send frame ffff88060cd40800, xid <0x3cb>, remaining 0,
> lso_max <0x10000>
> 15247 May 19 11:50:05 poc1 kernel: [ 1111.450900] ft_queue_data_in:
> Failed to send frame ffff880c0b24ca00, xid <0xea6>, remaining 196608,
> lso_max <0x10000>
> 15248 May 19 11:50:05 poc1 kernel: [ 1111.450908] ft_queue_data_in:
> Failed to send frame ffff880c0b24ca00, xid <0xea6>, remaining 131072,
> lso_max <0x10000>
> 15249 May 19 11:51:12 poc1 kernel: [ 1178.698434] ft_queue_data_in: 6
> callbacks suppressed
> 15250 May 19 11:51:12 poc1 kernel: [ 1178.698440] ft_queue_data_in:
> Failed to send frame ffff88060ba97400, xid <0xb8a>, remaining 458752,
> lso_max <0x10000>
> 15251 May 19 11:51:12 poc1 kernel: [ 1178.698446] ft_queue_data_in:
> Failed to send frame ffff88060ba97400, xid <0xb8a>, remaining 393216,
> lso_max <0x10000>
> 15252 May 19 11:51:12 poc1 kernel: [ 1178.698449] ft_queue_data_in:
> Failed to send frame ffff88060ba97400, xid <0xb8a>, remaining 327680,
> lso_max <0x10000>
> 15253 May 19 11:51:12 poc1 kernel: [ 1178.698453] ft_queue_data_in:
> Failed to send frame ffff88060ba97400, xid <0xb8a>, remaining 262144,
> lso_max <0x10000>
> 15254 May 19 11:51:12 poc1 kernel: [ 1178.698456] ft_queue_data_in:
> Failed to send frame ffff88060ba97400, xid <0xb8a>, remaining 196608,
> lso_max <0x10000>
> 15255 May 19 11:51:12 poc1 kernel: [ 1178.698460] ft_queue_data_in:
> Failed to send frame ffff88060ba97400, xid <0xb8a>, remaining 131072,
> lso_max <0x10000>
> 15256 May 19 11:51:12 poc1 kernel: [ 1178.698463] ft_queue_data_in:
> Failed to send frame ffff88060ba97400, xid <0xb8a>, remaining 65536,
> lso_max <0x10000>
> 15257 May 19 11:51:12 poc1 kernel: [ 1178.698467] ft_queue_data_in:
> Failed to send frame ffff88060ba97400, xid <0xb8a>, remaining 0,
> lso_max <0x10000>
> 

The call into libfc's lport->tt.seq_send() is failing to send the
outgoing solicited data-in frames.  From the output (lso_max <0x10000>),
note that LSO (large segment offload, aka TCP segmentation offload) has
been enabled by the underlying NIC hardware.
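
As a rough way to read the quoted output, the sketch below parses the
first four messages and checks how far "remaining" drops between
consecutive failures for the same xid. The helper names (parse_failures,
step_sizes) are made up for illustration. If the drop always equals
lso_max, each message most likely corresponds to one failed LSO-sized
chunk, but that reading is an assumption and not something confirmed
against the tcm_fc source here.

```python
import re

# First four messages from the quoted log, wrapped and '>'-quoted as posted.
SAMPLE = """\
> 15231 May 19 11:49:27 poc1 kernel: [ 1073.783229] ft_queue_data_in:
> Failed to send frame ffff880c0b188200, xid <0x2a5>, remaining 196608,
> lso_max <0x10000>
> 15232 May 19 11:49:27 poc1 kernel: [ 1073.783238] ft_queue_data_in:
> Failed to send frame ffff880c0b188200, xid <0x2a5>, remaining 131072,
> lso_max <0x10000>
> 15233 May 19 11:49:27 poc1 kernel: [ 1073.783242] ft_queue_data_in:
> Failed to send frame ffff880c0b188200, xid <0x2a5>, remaining 65536,
> lso_max <0x10000>
> 15234 May 19 11:49:27 poc1 kernel: [ 1073.783245] ft_queue_data_in:
> Failed to send frame ffff880c0b188200, xid <0x2a5>, remaining 0,
> lso_max <0x10000>
"""

# Message body once the mail wrapping has been undone.
MSG = re.compile(
    r"ft_queue_data_in: Failed to send frame [0-9a-f]+, "
    r"xid <(0x[0-9a-f]+)>, remaining (\d+), lso_max <(0x[0-9a-f]+)>"
)


def parse_failures(text):
    """Return (xid, remaining, lso_max) for every failure message in text."""
    # Drop the '> ' quote markers and re-join the wrapped lines.
    flat = " ".join(line.lstrip("> ").strip() for line in text.splitlines())
    return [(xid, int(rem), int(lso, 16)) for xid, rem, lso in MSG.findall(flat)]


def step_sizes(failures):
    """Map each xid to the drops between consecutive 'remaining' values."""
    by_xid = {}
    for xid, remaining, _lso_max in failures:
        by_xid.setdefault(xid, []).append(remaining)
    return {
        xid: [a - b for a, b in zip(vals, vals[1:])]
        for xid, vals in by_xid.items()
    }


if __name__ == "__main__":
    for xid, steps in step_sizes(parse_failures(SAMPLE)).items():
        print(xid, steps)  # prints: 0x2a5 [65536, 65536, 65536]
```

For xid 0x2a5 every step is 65536 bytes, which is exactly 0x10000, the
reported lso_max.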

To isolate possible issues, I'd recommend:

- Disabling hardware offloads on both initiator and target sides (LRO +
  LSO) using ethtool -K
- Disabling any jumbo frames settings on either side
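
A minimal sketch of what those two steps could look like as commands,
built here as argument lists so they can be reviewed before anything is
run. The interface name "eth2" is a placeholder, "lro" and "tso" are the
short feature names ethtool -K accepts for LRO and TCP segmentation
offload, and the MTU of 1500 assumes "no jumbo frames" means the
standard Ethernet MTU. Which features a given driver actually lets you
toggle varies, so check the "ethtool -k <iface>" output first.

```python
import shlex


def offload_off_cmd(iface, features=("lro", "tso")):
    """Build the 'ethtool -K' argv that switches the given offloads off."""
    cmd = ["ethtool", "-K", iface]
    for feature in features:
        cmd += [feature, "off"]
    return cmd


def standard_mtu_cmd(iface, mtu=1500):
    """Build the 'ip link' argv that sets a non-jumbo MTU (assumed 1500)."""
    return ["ip", "link", "set", "dev", iface, "mtu", str(mtu)]


if __name__ == "__main__":
    # "eth2" is hypothetical; use the interface that carries the FCoE traffic.
    for iface in ("eth2",):
        print(shlex.join(offload_off_cmd(iface)))
        print(shlex.join(standard_mtu_cmd(iface)))
```

The same commands would need to be applied on both the initiator and
the target, as root, and they do not persist across reboots.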

Are there any other non-standard network and/or switch settings in
place?  Also, please confirm what your NIC + switch setup looks like.

Rob & Open-FCoE folks, is there anything else to take into consideration
here?

> 
> I didn't see the previous message "unable to handle kernel NULL
> pointer dereference at 0000000000000048". So it must have been fixed
> by your change.
> 

Thanks for confirming that bit.

--nab
