Re: [Open-FCoE] System crashes with increased drive count

On Thu, Jun 12, 2014 at 5:43 PM, Vasu Dev <vasu.dev@xxxxxxxxxxxxxxx> wrote:
> On Thu, 2014-06-12 at 15:18 -0700, Jun Wu wrote:
>> On Wed, Jun 11, 2014 at 11:19 AM, Vasu Dev <vasu.dev@xxxxxxxxxxxxxxx> wrote:
>> > On Tue, 2014-06-10 at 19:40 -0700, Jun Wu wrote:
>> >> On Tue, Jun 10, 2014 at 3:38 PM, Vasu Dev <vasu.dev@xxxxxxxxxxxxxxx> wrote:
>> >> > On Tue, 2014-06-10 at 09:46 -0700, Jun Wu wrote:
>> >> >> This is a Supermicro chassis with redundant power supplies. We see
>> >> >> the same failures with both SSDs and HDDs.
>> >> >> The same tests pass with non-FCoE protocols, e.g. iSCSI or AoE.
>> >> >>
>> >> >
>> >> > Are the iSCSI and AoE tests run on the same TCM core kernel, with
>> >> > the same target and host NICs/switch?
>> >>
>> >> We tested AoE with the same hardware/switch and test setup. AoE
>> >> works, but it is not an enterprise protocol and it doesn't deliver
>> >> the performance we need. It doesn't use TCM.
>> >>
>> >
>> > You had fcoe working with a lower queue depth, which should already
>> > yield lower performance than AoE; besides, AoE is not using TCM, so
>> > it is not a fair comparison. What about iSCSI, is that using TCM?
>>
>> We didn't use TCM for the iSCSI test either.
>>
>
> Too many variables to compare or get any help in isolating issues here.
>
>> >> >
>> >> > What NICs are in your chassis? As I mentioned before, "DCB and PFC
>> >> > PAUSE are typically used and required by fcoe", but you are using
>> >> > plain PAUSE and the switch cannot be eliminated, as you mentioned
>> >> > before. These factors affect FCoE more than other protocols, so can
>> >> > you make sure the IO errors are not due to frame losses without
>> >> > DCB/PFC in your setup?
>> >>
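For reference, whether PFC is actually configured and advertised on an
ixgbe link can usually be checked from the host with the lldpad tools; a
minimal sketch, assuming lldpad is running and the interface name ethX
is a placeholder:

  # PFC configuration and operational state as seen by lldpad
  dcbtool gc ethX pfc
  # PFC TLV received from the switch peer, if the switch runs DCBX
  lldptool -t -n -i ethX -V PFC
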
>> >> The NIC is:
>> >> [root@poc1 log]# lspci | grep 82599
>> >> 08:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit
>> >> SFI/SFP+ Network Connection (rev 01)
>> >>
>> >> The issue should not be caused by frame losses. The systems work fine
>> >> with other protocols.
>> >
>> > FCoE is less tolerant of packet losses and latency variations than
>> > the others because it expects FC-like deterministic fabric
>> > performance, so no-drop ethernet is a must for FCoE, unlike the other
>> > protocols in this comparison; for instance, iSCSI adapts its tx
>> > window to frame losses, but there is no such thing in FCoE.  Thus you
>> > cannot conclude that there are no frame losses just because the
>> > others work in this setup; iSCSI and AoE should work fine without
>> > no-drop ethernet, PAUSE or PFC. So can you confirm there are no frame
>> > losses using "ethtool -S ethX"?
>> >
>>
>> I ran the same test again, that is 10 fio sessions from one initiator
>> to 10 drives on the target via vn2vn. After I saw the following
>> messages
>>   poc2 kernel: ft_queue_data_in: Failed to send frame
>> ffff8800ae64ba00, xid <0x389>, remaining 196608, lso_max <0x10000>
>>
>> I checked
>>   ethtool -S p4p1 | grep error
>> and
>>   ethtool -S p4p1 | grep control
>> on both initiator and target, the numbers were all zero.
>>
>> So the abort should not be caused by frame losses.
>
> You mentioned below that the frames-missed count went up, thus there
> are packet drops. The abort would be due either to frame loss or to a
> response not arriving in time, so it could be caused by frame loss.
> BTW, is this after the increased timeout suggested above, with REC
> disabled?

In this particular test, I saw multiple "Failed to send frame"
messages first, and even quite a while later no frame losses were
reported. So it seems to me that frame loss was not the cause of the
abort, at least in this case.
Yes, the kernel used for the test has REC disabled.
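
A minimal sketch of how the drop and flow-control counters can be
watched on both initiator and target while the fio sessions run (the
interface name p4p1 is taken from the commands above; the exact counter
names depend on the ixgbe driver version):

  # sample the relevant NIC counters every 5 seconds during the test
  while true; do
      date
      ethtool -S p4p1 | grep -E 'error|missed|drop|flow_control'
      sleep 5
  done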

>
>>
>> >> >
>> >> > There are possibly abort issues at the target with zero timeout
>> >> > values, but you could avoid them completely by increasing the scsi
>> >> > timeout and disabling REC, as discussed before.
>> >> >
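A hedged sketch of the scsi timeout adjustment mentioned above, assuming
the FCoE LUNs appear as /dev/sdX on the initiator; the 300-second value
is only an example:

  # current per-device SCSI command timeout, in seconds
  cat /sys/block/sdX/device/timeout
  # give commands more time before the error handler aborts them
  echo 300 > /sys/block/sdX/device/timeout
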
>> >
>> > Now that I know ixgbe (82599) is in use, try a few more things in
>> > addition to the suggestions above:
>> >
>> > 1) Disable the irq balancer.
>> > 2) Find your ethX interrupts through "cat /proc/interrupts | grep
>> > ethX". Identifying the fcoe interrupts among them is tricky; they may
>> > be labeled with fcoe, but if not, identify them by interrupt activity
>> > while fcoe traffic is running. A total of 8 fcoe interrupts are used;
>> > pin them across the first eight CPUs used in your workloads.
>> > 3) Increase the ring sizes from the default 512 to 2K or 4K, just a
>> > hunch in case frames are dropped due to longer PAUSE or congestion in
>> > your setup.
>> > 4) Also monitor the ethX stats, besides the fcoe hostX stats, at
>> > "/sys/class/fc_host/hostX/statistics/" for anything that stands out
>> > as odd (see the sketch after this list).
>> >
>> >
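A hedged sketch of steps 1-4 above as shell commands; the IRQ numbers,
CPU ids, interface name, and hostX are placeholders that depend on the
particular system:

  # 1) stop the irq balancer so manual pinning sticks
  systemctl stop irqbalance          # or: service irqbalance stop
  # 2) list the ethX interrupt vectors, then pin each fcoe vector to a CPU
  cat /proc/interrupts | grep ethX
  echo 2 > /proc/irq/123/smp_affinity_list    # repeat for each fcoe vector
  # 3) grow the rx/tx rings from the 512 default
  ethtool -G ethX rx 4096 tx 4096
  # 4) watch the FC host statistics alongside the ethX stats
  grep . /sys/class/fc_host/hostX/statistics/*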
>>
>> We disabled the irq balancer, found the interrupts and pinned them,
>> and increased the ring sizes to 4K. These changes seem to let the
>> test run longer, but the target eventually hung. In this test we saw
>> non-zero tx_flow_control_xon/xoff and rx_flow_control_xon/xoff
>> counters, which indicate PAUSE is working. We did see frame losses
>> (rx_missed_errors) in this test. So PAUSE frames were issued,
>
> The NIC sent PAUSE frames out, but the switch still not stopping
> means PAUSE is possibly not enabled on the switch side, leading to
> the rx_missed_errors; again, a back-to-back setup would have helped
> here, as Nab suggested before.
>
>> but
>> ultimately it didn't work.
>>
>> Is it reasonable to expect PAUSE frames to be sufficient for
>> end-to-end flow control between nodes?
>
> PAUSE/PFC is link level, but it would propagate in a multi-node case;
> that depends on the switch in use, so check with the switch vendor.
>
>> If not, should the issue be dealt with at the mid level by, for
>> example, issuing BUSY to manage flow control between nodes? Or does
>> there need to be some management at a higher level, by limiting
>> outstanding commands? What is the best way to manage this?
>>
>
> The stack should handle this, provided PAUSE is working and frames are
> not dropped at L2.
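
One way to confirm that link-level PAUSE is enabled on the host ports (a
minimal sketch; the interface name is an example, and the switch-side
configuration still has to be checked on the switch itself):

  # show the NIC's pause (flow control) settings
  ethtool -a p4p1
  # enable rx/tx pause on the NIC if it is off
  ethtool -A p4p1 rx on tx on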
>
>> We are flooding the network link, using all SSD drives and pushing the
>> boundaries:)
>>
>
> Yeah, I doubt anyone has put this much stress on TCM FC so far; at
> Intel it is mostly the host-side software that is tested, against real
> FC/FCoE targets.
>
> //Vasu
>
>> Thanks,
>> Jun
>>
>> > <snip>
>> >>
>> >> Is the following cmd_per_lun FCoE related? Its default value is 3,
>> >> and it doesn't allow me to change it.
>> >> /sys/devices/pci0000:00/0000:00:05.0/0000:08:00.0/net/p4p1/ctlr_2/host9/scsi_host/host9/cmd_per_lun
>> >
>> > I think this doesn't matter once the device queue depth is adjusted
>> > to 32, and that one can be changed. I mean, cmd_per_lun is used at
>> > scsi host alloc as the initial queue depth; later the scsi device
>> > queue depth is adjusted to 32 through the slave_alloc callback, and
>> > that can be adjusted at /sys/block/sdX/device/queue_depth, as you did
>> > before, but cmd_per_lun itself cannot.
>> >
>> > //Vasu
>> >
>> >
>
>
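For completeness, a sketch of the queue-depth adjustment Vasu describes
above, applied to every SCSI disk on the initiator (the value 32 matches
the depth mentioned in the thread; the sd* pattern is illustrative and
will also hit non-FCoE disks):

  # check and set the per-device SCSI queue depth on the initiator
  for d in /sys/block/sd*/device/queue_depth; do
      cat "$d"
      echo 32 > "$d"
  done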