Re: [Open-FCoE] System crashes with increased drive count


 



On Thu, 2014-06-12 at 18:20 -0700, Jun Wu wrote:
> On Thu, Jun 12, 2014 at 5:43 PM, Vasu Dev <vasu.dev@xxxxxxxxxxxxxxx> wrote:
> > On Thu, 2014-06-12 at 15:18 -0700, Jun Wu wrote:
> >> On Wed, Jun 11, 2014 at 11:19 AM, Vasu Dev <vasu.dev@xxxxxxxxxxxxxxx> wrote:
> >> > On Tue, 2014-06-10 at 19:40 -0700, Jun Wu wrote:
> >> >> On Tue, Jun 10, 2014 at 3:38 PM, Vasu Dev <vasu.dev@xxxxxxxxxxxxxxx> wrote:
> >> >> > On Tue, 2014-06-10 at 09:46 -0700, Jun Wu wrote:
> >> >> >> This is a Supermicro chassis with redundant power supplies. We see the
> >> >> >> same failures with both SSDs and HDDs.
> >> >> >> The same tests pass with non-FCoE protocols, i.e. iSCSI or AoE.
> >> >> >>
> >> >> >
> >> >> > Are the iSCSI and AoE tests running on the same TCM core kernel, with the
> >> >> > same target and host NICs/switch?
> >> >>
> >> >> We tested AoE with the same hardware/switch and test setup. AoE works,
> >> >> except that it is not an enterprise protocol and it doesn't provide the
> >> >> performance. It doesn't use TCM.
> >> >>
> >> >
> >> > You had fcoe working with a lower queue depth, and that should be yielding
> >> > lower performance, just like AoE; besides, AoE is not using TCM, so it is not
> >> > a correct comparison. What about iSCSI, is that using TCM?
> >>
> >> We didn't use TCM for the iSCSI test either.
> >>
> >
> > Too many variables to compare or get any help in isolating issues here.
> >
> >> >> >
> >> >> > What NICs are in your chassis? As I mentioned before, "DCB and PFC PAUSE
> >> >> > are typically used and required by fcoe", but you are using plain PAUSE and,
> >> >> > as you mentioned before, the switch cannot be eliminated; these could affect
> >> >> > FCoE more than other protocols, so can you ensure the IO errors are not due
> >> >> > to frame losses without DCB/PFC in your setup?
> >> >>
> >> >> The NIC is:
> >> >> [root@poc1 log]# lspci | grep 82599
> >> >> 08:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit
> >> >> SFI/SFP+ Network Connection (rev 01)
> >> >>
> >> >> The issue should not be caused by frame losses. The systems work fine
> >> >> with other protocols.
> >> >
> >> > FCoE is less tolerant than the others to packet losses and latency variations,
> >> > in order to give more FC-like deterministic fabric performance, and therefore
> >> > no-drop ethernet is a must for FCoE, unlike the others in the comparison; for
> >> > instance iSCSI would adapt its tx window to frame losses, but there is no such
> >> > thing in FCoE.  Thus you cannot conclude that there are no frame losses just
> >> > because the others work in the setup; iSCSI and AoE should work fine without
> >> > no-drop ethernet, PAUSE or PFC. So can you confirm there are no frame losses
> >> > using "ethtool -S ethX" ?
> >> >
> >>
> >> I ran the same test again, that is 10 fio sessions from one initiator
> >> to 10 drives on the target via vn2vn. After I saw the following
> >> messages
> >>   poc2 kernel: ft_queue_data_in: Failed to send frame
> >> ffff8800ae64ba00, xid <0x389>, remaining 196608, lso_max <0x10000>
> >>
> >> I checked
> >>   ethtool -S p4p1 | grep error
> >> and
> >>   ethtool -S p4p1 | grep control
> >> on both initiator and target, the numbers were all zero.
> >>
> >> So the abort should not be caused by frame losses.
> >
> > You mentioned below that the frames-missed count goes up, thus there are packet
> > drops. The abort would be due either to loss or to a response not arriving in
> > time, so it could be due to frame loss. BTW, is this after increasing the
> > timeout as suggested above, with REC disabled?
> 
> In this particular test, I saw multiple "Failed to send frame"
> messages first, and after quite a while, there were no frame losses
> reported. It seems to me that the frame loss was not the cause of the
> abort, at least in this case.
> Yes, the kernel used for the test has REC disabled.
> 

In that case it takes 10 seconds before the abort is issued, so I don't know
what would hold an IO that long in your test setup without any frame loss. You
might want to trace the IO end to end or profile the system for bottlenecks.
As far as the FCoE host stack goes, it is optimized significantly; I can push
over 2M IOPS with just a single dual-port 82599ES 10-Gigabit adapter, so the
bottleneck is possibly beyond the host interface, anywhere from the switch to
the backend, and therefore profiling or eliminating some setup elements could
help you locate it.
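
A minimal starting point, assuming the usual Linux tracing/monitoring tools are
installed and with sdX standing in for one of your backend drives, could be
something like:

  # watch per-device service times and queue utilization during the fio run
  iostat -xmt 1 /dev/sdX
  # sample where CPU time goes on the target while IO is outstanding
  perf top -g
  # trace block-layer events for one backend device end to end
  blktrace -d /dev/sdX -o - | blkparse -i -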

Again, repeating what I mentioned before: the switch cannot be eliminated in
your setup and it is running without DCB/PFC, so either find a way to skip that
hop or dig further into the switch for possible drops, extended PAUSE, etc. If
you could tell me which switch is in use then I might try to find out more
about it, though that all goes beyond the Open-FCoE stack.
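
In the meantime, a quick sanity check of link-level pause on both ends, plus
watching the relevant counters while fio runs, might narrow things down. A
rough sketch, with p4p1 and host9 taken from your earlier mails as
placeholders:

  # is PAUSE negotiated/enabled on the link?
  ethtool -a p4p1
  # force rx/tx pause on, if the switch port allows it
  ethtool -A p4p1 rx on tx on
  # watch pause and drop counters while the test runs
  watch -n1 'ethtool -S p4p1 | egrep -i "pause|flow_control|missed|dropped"'
  # FC host level counters
  cat /sys/class/fc_host/host9/statistics/error_frames
  cat /sys/class/fc_host/host9/statistics/dumped_frames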

Thanks,
Vasu



> >
> >>
> >> >> >
> >> >> > While the abort issues at the target are possibly due to zero timeout
> >> >> > values, you could avoid them completely by increasing the scsi timeout and
> >> >> > disabling REC, as discussed before.
> >> >> >
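
(For reference, the scsi timeout mentioned above is just the per-device command
timer, which defaults to 30 seconds; roughly, with sdX as a placeholder:

  echo 60 > /sys/block/sdX/device/timeout

applied on the initiator side for each imported device.)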
> >> >
> >> > Now that I know ixgbe (82599) is in use, try a few more things in addition
> >> > to the suggestions above:
> >> >
> >> > 1) Disable the irq balancer.
> >> > 2) Find your ethX interrupts through "cat /proc/interrupts | grep ethX".
> >> > Identifying the fcoe ones among them is tricky; they may be labeled with
> >> > fcoe, but if not, identify them through interrupt activity while fcoe
> >> > traffic is on. A total of 8 fcoe interrupts are used; pin them across the
> >> > first eight CPUs used in your workloads.
> >> > 3) Increase the ring sizes from the default 512 to 2K or 4K, just a hunch
> >> > in case frames are dropped due to longer PAUSE or congestion in your setup.
> >> > 4) Also monitor the ethX stats besides the fcoe hostX stats for anything
> >> > that stands out as odd at "/sys/class/fc_host/hostX/statistics/".
> >> >
> >> >
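
In command form, the four steps quoted above look roughly like this; ethX, the
IRQ numbers and the CPU masks are only placeholders for what your box actually
shows:

  # 1) stop the irq balancer so manual affinities stick
  service irqbalance stop
  # 2) list the ethX vectors, then pin each fcoe vector to its own CPU
  cat /proc/interrupts | grep ethX
  echo 1 > /proc/irq/<irq-number>/smp_affinity   # repeat per vector with a different mask
  # 3) grow the rx/tx rings from the 512 default
  ethtool -G ethX rx 4096 tx 4096
  # 4) watch NIC and FC host statistics while traffic is on
  ethtool -S ethX
  ls /sys/class/fc_host/hostX/statistics/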
> >>
> >> We disabled the irq balancer, found the interrupts and pinned them, and
> >> increased the ring sizes to 4K. It seems that these changes allow the
> >> test to run longer, but the target eventually hung. In this test, we
> >> saw non-zero tx_flow_control_xon/off and rx_flow_control_xon/off
> >> numbers, which indicates PAUSE is working. We did see frame losses
> >> (rx_missed_errors) in this test. So PAUSE frames were issued,
> >
> > The NIC sent pause frames out, but the switch still not stopping means pause
> > is possibly not enabled on the switch side, leading to rx_missed_errors;
> > again, back-to-back would have helped here, as Nab suggested before.
> >
> >> but
> >> ultimately it didn't work.
> >>
> >> Is it reasonable to expect PAUSE frames to be sufficient for end-to-end
> >> flow control between nodes?
> >
> > PAUSE/PFC is link level, but it would spread in the case of multiple nodes;
> > that would depend on the switch in use, so check with the switch vendor.
> >
> >> If not, should the issue be dealt with at
> >> the mid level by, for example, issuing BUSY to manage flow control between
> >> nodes? Or does there need to be some management at a higher level by
> >> limiting outstanding commands? What is the best way to manage this?
> >>
> >
> > The stack should handle this, provided PAUSE is working and frames are not
> > dropped at L2.
> >
> >> We are flooding the network link, using all SSD drives and pushing the
> >> boundaries:)
> >>
> >
> > Yeap, I doubt anyone has done enough stress on TCM FC so far; the host SW is
> > mostly tested, at least at Intel, against real FC/FCoE targets.
> >
> > //Vasu
> >
> >> Thanks,
> >> Jun
> >>
> >> > <snip>
> >> >>
> >> >> Is the following cmd_per_lun fcoe related? Its default value is 3, and
> >> >> it doesn't allow me to change it.
> >> >> /sys/devices/pci0000:00/0000:00:05.0/0000:08:00.0/net/p4p1/ctlr_2/host9/scsi_host/host9/cmd_per_lun
> >> >
> >> > I think this doesn't matter once the device queue depth is adjusted to 32,
> >> > and that can be adjusted. I mean, this value is used at scsi host alloc as
> >> > the initial queue depth; later the scsi device queue depth is adjusted to 32
> >> > through the slave_alloc callback, and that can be adjusted at
> >> > /sys/block/sdX/device/queue_depth as you did before, but this one cannot.
> >> >
> >> > //Vasu
> >> >
> >> >
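
(Side note: the runtime knob referred to above is writable per device, e.g.

  cat /sys/block/sdX/device/queue_depth
  echo 32 > /sys/block/sdX/device/queue_depth

whereas cmd_per_lun is only the initial value used at scsi host allocation.)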
> >
> >





