Re: [Open-FCoE] System crashes with increased drive count

On Thu, 2014-06-05 at 15:43 -0700, Nicholas A. Bellinger wrote:
> On Wed, 2014-06-04 at 17:22 -0700, Jun Wu wrote:
> > Is there a design limit for the number of target drives that we should
> > not cross? Is 10 a reasonable number? We did notice in our testing that
> > a lower number of targets has fewer problems.
> > 
> 
> It completely depends on the fabric, initiator, and backend storage.
> 
> For example, some initiators (like ESX iSCSI on 1 Gb/sec ethernet) have
> a problem with more than a handful of LUNs per session, which can result
> in false positive timeouts on the initiator side under heavy I/O loads
> due to fairness issues + I/Os not being sent out of the initiator fast
> enough.
> 
> Other initiators like Qlogic FC are able to run 256 LUNs in a single
> session + endpoint without issues.
> 
> > Are there any additional tests that we can do to narrow down the
> > problem? For example, try different IO types, random vs sequential,
> > read vs write. Would that help?
> > 
> 
> If the issue is really related to the number of outstanding I/Os on your
> network, one easy thing to do is reduce the default queue_depth=3 to
> queue_depth=1 for each LUN on the initiator side, and see if that has
> any effect.
> 

Good idea; possibly that is the cause, rather than something else such as
the switch lacking DCB and the PFC PAUSE that fcoe typically uses and
requires.
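
If lldpad is running on the initiator, something along these lines should
show whether DCB and PFC were actually negotiated on the fcoe interface
(ethX is a placeholder, and exact feature names can vary with your lldpad
version):

    # query the local DCB/PFC/FCoE-app state for the port
    dcbtool gc ethX dcb
    dcbtool gc ethX pfc
    dcbtool gc ethX app:fcoe

    # dump the PFC TLV received from the switch
    lldptool -t -i ethX -V PFC

If PFC is not enabled end to end, heavy I/O can overrun the switch and
drop frames, which fcoe does not tolerate well.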

> I don't recall where these values are in /sys for fcoe, but they are easy to
> find using 'find /sys -name queue_depth'.  Go ahead and set each of
> these for your fcoe initiator's LUNs to queue_depth=1 + retest.
> 

It is at /sys/block/sdX/device/queue_depth; it is transport agnostic, so
the location is the same for fcoe as it is for any other disk.

> Also note that these values are not persistent across restart.
> 
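
Something like this would do it (run as root; note the sd* glob hits every
SCSI disk, so restrict it to just your fcoe LUNs if other disks are
present):

    # show the current values, then drop each LUN to queue_depth=1
    grep . /sys/block/sd*/device/queue_depth
    for f in /sys/block/sd*/device/queue_depth; do
            echo 1 > "$f"
    done

Since the values do not persist, the loop (or a boot script / udev rule
that does the same thing) has to be re-applied after each restart.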
> > Nab,
> > We cannot change the connection between the servers. They are bare
> > metal cloud servers that we don't have direct access to.
> > 
> 
> That's a shame, as it would certainly help isolate individual networking
> components.
> 

Yeah, that would have helped.

//Vasu

> --nab
> 





