Actually that didn't illustrate my point very well, since you see individual requests being sent to the driver without waiting for each one to complete. But if you look at the full output, you can see that once the queue is full, you're at the mercy of waiting for individual IOs to complete before you can send new ones. Sometimes it's one at a time, sometimes 3-4 complete and you can insert a few at once. I think any benefit from that is countered by the roundtrip network latency incurred in sending each request and receiving its result.

For the record, I'm not saying this is the entire reason the performance is lower (obviously not, since iSCSI does better), just that when you're talking about high IOPS, adding 100us (best case for gigabit) to each request and response is significant. If an IO takes 25us locally (an SSD can do 40k IOPS or more at a queue depth of 1) and you share that storage over gigabit, you've just increased the latency by an order of magnitude, and as seen there is only so much simultaneous IO going on when the queue depth is raised. Add to that that multipathing isn't doing parallel IO but interleaving, plus the extra traffic for distributed storage.
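
To put rough numbers on that, here's a back-of-envelope sketch in Python using the example figures above (25us local service time, ~100us of added latency per direction on gigabit). These are illustrative assumptions, not measurements:

# Assumed figures from the discussion above, not measurements:
# ~25us local SSD service time (~40k IOPS at QD1),
# ~100us of added latency in each direction over gigabit.

def qd1_iops(local_service_us, one_way_net_us=0.0):
    """Effective IOPS at queue depth 1: each IO must fully complete
    (including any network roundtrip) before the next one is issued."""
    per_io_us = local_service_us + 2 * one_way_net_us
    return 1_000_000 / per_io_us

local = qd1_iops(25)          # ~40,000 IOPS on the local SSD
remote = qd1_iops(25, 100)    # ~4,400 IOPS once the roundtrip is added

print(f"local QD1:  {local:8.0f} IOPS ({1e6/local:.0f}us per IO)")
print(f"remote QD1: {remote:8.0f} IOPS ({1e6/remote:.0f}us per IO)")
print(f"latency multiplier: {(25 + 2 * 100) / 25:.0f}x")

At queue depth 1 that works out to roughly 4,400 IOPS instead of 40,000, i.e. about a 9x hit per IO, which is what I mean by an order of magnitude.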