> This could significantly limit the amount of parallelism that can be
> achieved for a single TCP connection (and given that the Linux client
> strongly prefers a single connection now, this could become more of
> an issue).
I understand the simplicity of using a single TCP connection, but
performance-wise it is definitely not the way to go on WAN links. When
even a minuscule amount of packet loss is added to the link (<0.001%
packet loss), the TCP window collapses and performance drops
significantly (especially on 10GigE WAN links). I think newer TCP
congestion control algorithms could help the problem somewhat, but
nothing available today makes much of a difference versus CUBIC.
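To put a rough number on that collapse, here is a back-of-the-envelope
sketch using the well-known Mathis approximation for loss-limited TCP
throughput, rate ~= MSS / (RTT * sqrt(loss)); the MSS, RTT and loss
figures below are just assumptions I picked for illustration, not
measurements:

/* Rough Mathis-model ceiling for a single loss-limited TCP stream,
 * ignoring the ~1.2 constant and delayed acks.  All inputs are
 * illustrative assumptions. */
#include <math.h>
#include <stdio.h>

int main(void)
{
	double mss  = 1460.0;   /* bytes, typical Ethernet MSS */
	double rtt  = 0.030;    /* seconds, an assumed 30ms WAN path */
	double loss = 0.00001;  /* 0.001% packet loss */

	double bytes_per_sec = (mss / rtt) / sqrt(loss);
	/* Prints roughly 0.12 Gbit/s -- nowhere near filling 10GigE. */
	printf("single-stream ceiling: ~%.2f Gbit/s\n",
	       bytes_per_sec * 8 / 1e9);
	return 0;
}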
Using multiple TCP connections allows better saturation of the link,
since when packet loss occurs on one stream, the other streams can fill
the void. Today, the only solution is to scale up the number of
physical clients, which has high coordination overhead, or to use a WAN
accelerator such as Bitspeed or Riverbed (which comes with its own
issues such as extra hardware, cost, etc.).
> It does make a difference on high bandwidth-delay product networks
> (something people have also hit). I'd rather not regress there and
> also would rather not require manual tuning for something we should
> be able to get right automatically.
Prior to this patch, the TCP buffer was fixed to such a small size
(especially for writes) that the idea of parallelism was moot anyway.
Whatever the TCP buffer negotiates to now is definitely bigger than
what was there beforehand, which I think is borne out by the fact that
no performance regression was found.
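For reference, here is a quick sketch of why the old fixed buffers
could never keep a long fat pipe full; the 10GigE/30ms numbers are
assumptions chosen purely for illustration:

/* To keep a pipe full, the TCP window (and so the socket buffer) has
 * to cover at least bandwidth * RTT.  Link numbers are assumed. */
#include <stdio.h>

int main(void)
{
	double gbit_per_sec = 10.0;  /* link rate */
	double rtt = 0.030;          /* assumed 30ms WAN round trip */

	double bdp_bytes = gbit_per_sec * 1e9 / 8 * rtt;
	/* ~37.5 MB here -- far beyond the old fixed defaults, which is
	 * why autoscaling (or very aggressive manual buffer settings)
	 * is needed on paths like this. */
	printf("bandwidth-delay product: ~%.1f MB\n", bdp_bytes / 1e6);
	return 0;
}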
Regressing back to the old way is a death knell for any system with a
delay of >1ms or a bandwidth of >1GigE, so I definitely hope we never
go there. Of course, now that autoscaling allows the TCP buffer to grow
to reasonable values and achieve good performance on 10+GigE and WAN
links, if we can improve the parallelism/stability even further, that
would be great.
Dean