On 6/11/2019 7:21 PM, NeilBrown wrote:
On Tue, Jun 11 2019, Tom Talpey wrote:
I really hope nconnect is not just a workaround for some undiscovered
performance issue. All that does is kick the can down the road.
This is one of my fears too.
My current perspective is to ask
"What do hardware designers optimise for?",
because the speeds we are looking at really require various bits of
hardware to be working together harmoniously.
In context, that question becomes "Do they optimise for single
connection throughput, or multiple connection throughput?".
I assume you mean NIC hardware designers. The answer is both of
course, but there are distinct advantages in the multiple-connection
case. The main feature is RSS - Receive Side Scaling - which computes
a hash of each 5-tuple-based IP flow and spreads interrupts based on
the value. Generally speaking, that's why multiple connections can
speed up a single NIC, on today's high core count machines.
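
To make the RSS point concrete, here is a rough sketch of how a per-flow
hash pins each connection to one receive queue. It is only an illustration:
real NICs typically use a keyed Toeplitz hash plus an indirection table,
whereas this uses CRC32 as a stand-in, and the names flow_key, rx_queue and
queues_used, the addresses and the queue count are all made up for the
example.

```python
import socket
import struct
import zlib


def flow_key(src_ip, src_port, dst_ip, dst_port, proto=6):
    """Pack an IPv4 5-tuple into bytes (proto 6 = TCP)."""
    return struct.pack(
        "!4s4sHHB",
        socket.inet_aton(src_ip),
        socket.inet_aton(dst_ip),
        src_port,
        dst_port,
        proto,
    )


def rx_queue(flow, num_queues):
    """Map a flow to a receive queue.

    Assumption: CRC32 stands in for the NIC's real hash function.
    """
    return zlib.crc32(flow_key(*flow)) % num_queues


def queues_used(flows, num_queues):
    """Return the set of receive queues touched by the given flows."""
    return {rx_queue(f, num_queues) for f in flows}


if __name__ == "__main__":
    # One connection: every packet hashes to the same queue (and core).
    single = [("192.0.2.10", 50000, "192.0.2.20", 2049)]
    # Several connections to the same server differ only in source port,
    # so they can land on different queues.
    multi = [("192.0.2.10", 50000 + i, "192.0.2.20", 2049) for i in range(4)]
    print("single connection queues:", queues_used(single, 8))
    print("four connection queues:  ", queues_used(multi, 8))
```

A single flow always hashes to the same queue, so one connection can keep
at most one receive queue busy no matter how many cores are idle.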
RDMA has a similar capability, by more explicitly directing its
CQs - Completion Queues - to multiple cores. Of course, RDMA has
further abilities to reduce CPU overhead through direct data placement.
Given the amount of money in web-services, I think multiple connection
throughput is most likely to provide dollars.
I also think that it would be a lot easier to parallelise than single
connection.
Yep, that's another advantage. As you observe, this kind of parallelism
is easier to achieve on the server side. IOW, this helps both ends of
the connection.
So if we NFS developers want to work with the strengths of the hardware,
I think multiple connections and increased parallelism are a sensible
long-term strategy.
So while I cannot rule out any undiscovered performance issue, I don't
think this is just kicking the can down the road.
Agreed. But driving this to one or two dozen connections is different.
Typical NICs have relatively small RSS limits, and even if they have
more, the system's core count and MSI-X vectors (interrupt steering)
rarely approach this kind of limit. If you measure the improvement
vs connection count, you'll find it increases sharply at 2 or 4, then
flattens out. At that point, the complexity takes over and you'll only
see the advantage in a lab. In the real world, a very different picture
emerges, and it can be very un-pretty.
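
As a back-of-the-envelope illustration of the flattening, assume the hash
spreads connections uniformly and that the usable queue count is capped by
both the NIC's RSS queues and the core count. The expected number of busy
queues for n connections over q usable queues is then q * (1 - (1 - 1/q)^n).
This is a toy model of queue occupancy only, not a throughput prediction,
and the function name and the example numbers are hypothetical.

```python
def expected_busy_queues(connections, rss_queues, cores):
    """Expected number of distinct receive queues hit by `connections` flows.

    Assumptions: uniform hashing, and usable queues = min(rss_queues, cores).
    """
    usable = min(rss_queues, cores)
    return usable * (1.0 - (1.0 - 1.0 / usable) ** connections)


if __name__ == "__main__":
    # Hypothetical NIC with 4 RSS queues on a 16-core machine.
    for n in (1, 2, 4, 8, 16, 24):
        print(n, round(expected_busy_queues(n, rss_queues=4, cores=16), 2))
```

In this model each added connection buys less than the one before it, and
the total can never exceed the smaller of the RSS queue count and the core
count, however many connections are opened.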
Just some advice, that's all.
Tom.