On Wed, 2014-06-04 at 17:22 -0700, Jun Wu wrote:
> Is there a design limit for the number of target drives that we
> should not cross? Is 10 a reasonable number? We did notice that a
> lower number of targets has fewer problems in our testing.
>

It completely depends on the fabric, initiator, and backend storage.

For example, some initiators (like ESX iSCSI on 1 Gb/sec ethernet)
have a problem with more than a handful of LUNs per session, which can
result in false-positive timeouts on the initiator side under heavy
I/O loads due to fairness issues + I/Os not being sent out of the
initiator fast enough.

Other initiators, like Qlogic FC, are able to run 256 LUNs in a single
session + endpoint without issues.

> Are there any additional tests that we can do to narrow down the
> problem? For example, try different IO types: random vs sequential,
> read vs write. Would that help?
>

If the issue really is related to the number of outstanding I/Os on
your network, one easy thing to do is reduce the default queue_depth=3
to queue_depth=1 for each LUN on the initiator side and see if that
has any effect.

I don't recall where these values are in /sys for fcoe, but they are
easy to find using 'find /sys -name queue_depth'.

Go ahead and set each of these for your fcoe initiator's LUNs to
queue_depth=1 + retest. Also note that these values are not persistent
across a restart.

> Nab,
> We cannot change the connection between the servers. They are bare
> metal cloud servers that we don't have direct access to.
>

That's a shame, as it would certainly help isolate individual
networking components.

--nab
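P.S. A quick sketch of the queue_depth change, assuming the standard
writable queue_depth attribute under each SCSI device in /sys (the
exact paths may differ on your kernel, so sanity-check the find output
before writing to it):

  # show current per-LUN queue depths
  for qd in $(find /sys -name queue_depth); do
      echo "$qd: $(cat "$qd")"
  done

  # drop every LUN to queue_depth=1 (not persistent across reboot)
  for qd in $(find /sys -name queue_depth); do
      echo 1 > "$qd"
  done

Re-run the same I/O workload afterwards and compare against the
timeouts you saw with the default depth.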