Hi,

I assume it is a managed switch if you can set up trunks. Have a look at
the interface counters; they will show whether there were overflows and
dropped packets. Look for Input or Output queue drops, which would
indicate overflowing buffers.

Cheers,
Pieter Smit

On Wed, Mar 24, 2010 at 3:35 PM, Marlon de Boer <marlon@xxxxxxxx> wrote:
> Hi list,
>
> The problem in short: in our network we detect TCP retransmits that
> shouldn't be there in my opinion. This happens in the following setup:
>
> - Cabinets filled with up to 20 servers connected to a 24-port switch
>   (all 1Gbit UTP links).
> - These cabinet switches are connected to our routers via 1Gbit UTP.
> - Cabinets don't do over 400Mbit/s on the uplink port (measured over a
>   10-second interval).
> - Cabinets contain the following: web servers + PHP, MySQL, static HTTP
>   content, memcached and some other small applications.
> - All servers run Linux with 2.6 kernels.
> - Current switches have the following specs: Switch Fabric Capacity
>   48.0 Gbps, Forwarding Rate 35.6 Mpps.
>
> We first noticed packet loss when we introduced memcached multigets. We
> saw that those pages sometimes rendered 200+ ms slower than normal;
> looking deeper into that problem we saw it was caused by TCP
> retransmits that took 200ms.
>
> We then wrote a simple client/server application that could reproduce
> the 200ms timeout in those cabinets (the servers in these cabinets were
> still running in a live environment in the meantime). Debugging this
> further, we noticed that the retransmits never happen below 150Mbit/s
> of usage on the cabinet uplink. After we enabled flow control on both
> the router and the cabinet switch (all ports) things looked a lot
> better, but that didn't solve the problem completely. Then we upgraded
> the 1Gbit uplink to a 2Gbit trunk as a test, which improved the
> situation even more.
>
> We already tested the following without improvement:
>
> - moving from UTP to fiber
> - moving from cat5e to cat6e
> - upgrading to switches with more switching capacity and more backplane
>   buffers
>
> Because it's madness to upgrade to 2Gbit links when you're only doing
> 150 to 400Mbit/s of sustained traffic in a cabinet, we looked for a way
> to detect network spikes.
>
> We started using libpcap to calculate the bandwidth over a span of 100
> received packets. We built a bridge that could be placed between the
> uplink and the switch and ran the app there. This resulted in spikes up
> to 1.5Gbit/s, which is impossible on a 1Gbit link; this of course
> happened because my libpcap app runs in userspace and cannot see the
> time spent buffering in the network hardware or kernel. Calculating
> your Mbit/s over a 10ms window with wrong timings can make a very big
> difference. When we move to a 100,000-packet interval, the calculation
> takes up to 300ms and doesn't show the sharp spikes anymore.
>
> With the bridge in between the uplink and the switch we also noticed
> that when a retransmit occurred the original packet was lost in the
> switch: it came in via one of the hosts on the cabinet switch but never
> reached the uplink to the router.
>
> Does anybody know a reliable way to monitor the bandwidth at very short
> intervals, like 10ms? Things like iptraf and SNMP polling of the
> hardware aren't accurate enough to detect the microbursts we suspect
> are there. Or does anybody recognize the problem and have some tips to
> prevent the 200ms retransmits? We're not looking for any kernel
> "patches" like lowering the RTO.
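
For the 10ms question: libpcap timestamps are applied in the kernel when a
packet is captured, not when your app gets around to reading it, so
post-processing a tcpdump capture taken on your bridge should get you much
closer than timing in userspace (NIC interrupt coalescing can still put a
small batch of packets on nearly the same timestamp). Below is a rough
Python sketch of that idea -- the file name, the 10ms bin size and the
900Mbit/s threshold are placeholders, and it only understands classic pcap
files, not pcapng. It sums on-the-wire packet sizes into fixed 10ms bins
instead of counting a fixed number of packets, so an idle gap can never
stretch a measurement window.

#!/usr/bin/env python
"""Rough sketch: sum on-the-wire packet sizes from a pcap capture into
fixed 10 ms bins and report bins that get close to line rate.  It parses
the classic pcap file format directly (24-byte global header, 16-byte
record headers).  File name, bin size and threshold are placeholders."""

import struct
import sys
from collections import defaultdict

BIN_SECONDS = 0.010        # 10 ms bins
THRESHOLD_MBIT = 900.0     # report bins approaching 1 Gbit/s line rate


def iter_records(path):
    """Yield (timestamp in seconds, wire length in bytes) per packet."""
    with open(path, 'rb') as f:
        magic = struct.unpack('<I', f.read(24)[:4])[0]
        little = magic in (0xa1b2c3d4, 0xa1b23c4d)   # usec / nsec magics
        nano = magic in (0xa1b23c4d, 0x4d3cb2a1)
        fmt = ('<' if little else '>') + 'IIII'
        while True:
            hdr = f.read(16)
            if len(hdr) < 16:
                break
            ts_sec, ts_frac, incl_len, orig_len = struct.unpack(fmt, hdr)
            f.seek(incl_len, 1)                      # skip the packet bytes
            yield ts_sec + ts_frac / (1e9 if nano else 1e6), orig_len


def main(path):
    bins = defaultdict(int)                          # bin index -> bytes
    for ts, wire_len in iter_records(path):
        bins[int(ts / BIN_SECONDS)] += wire_len
    if not bins:
        return
    for idx in sorted(bins):
        mbit = bins[idx] * 8 / BIN_SECONDS / 1e6
        if mbit >= THRESHOLD_MBIT:
            print('bin starting at epoch %.3f s: %.1f Mbit/s'
                  % (idx * BIN_SECONDS, mbit))
    peak = max(bins.values()) * 8 / BIN_SECONDS / 1e6
    print('busiest %dms bin: %.1f Mbit/s' % (BIN_SECONDS * 1000, peak))


if __name__ == '__main__':
    main(sys.argv[1] if len(sys.argv) > 1 else 'capture.pcap')

Only the record headers are read, so you can capture with a small snap
length (something like "tcpdump -i eth1 -s 68 -w capture.pcap") and the
files stay manageable even at 400Mbit/s. Note that the recorded wire
length does not include the Ethernet FCS or inter-frame gap, so the real
link utilisation is slightly higher than what the script reports.
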
>
> Our goal is to prove that we have very short spikes on our network that
> exceed our 1Gbit link capacity.
>
> Regards,
>
> Marlon de Boer
> System engineer http://www.hyves.nl
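
To tie the retransmits back to the counter check I mentioned at the top:
below is a rough poller for the Linux hosts themselves, assuming the
standard /sys/class/net/<iface>/statistics files and /proc/net/snmp (the
interface name and the one-second interval are placeholders). It prints a
line whenever the NIC drop/fifo counters or the host-wide TCP RetransSegs
counter move, which makes it easier to see whether the drops are on the
servers or, as your bridge test suggests, inside the switch. The switch's
own input/output queue drop counters still have to come from its CLI or
SNMP.

#!/usr/bin/env python
"""Rough poller: watch per-interface drop counters and the host-wide TCP
RetransSegs counter, printing a line only when something changed, so
retransmit bursts can be lined up with drops in time.  IFACE and INTERVAL
are placeholders."""

import time

IFACE = 'eth0'        # placeholder: the NIC facing the cabinet switch
INTERVAL = 1.0        # seconds between samples

COUNTERS = ('rx_dropped', 'tx_dropped', 'rx_fifo_errors', 'tx_fifo_errors')


def read_if_counter(name):
    with open('/sys/class/net/%s/statistics/%s' % (IFACE, name)) as f:
        return int(f.read())


def read_retrans_segs():
    # /proc/net/snmp has a "Tcp:" header line followed by a "Tcp:" value line
    with open('/proc/net/snmp') as f:
        tcp_lines = [line.split() for line in f if line.startswith('Tcp:')]
    header, values = tcp_lines[0], tcp_lines[1]
    return int(values[header.index('RetransSegs')])


def snapshot():
    sample = dict((name, read_if_counter(name)) for name in COUNTERS)
    sample['retrans_segs'] = read_retrans_segs()
    return sample


def main():
    prev = snapshot()
    while True:
        time.sleep(INTERVAL)
        cur = snapshot()
        delta = dict((k, cur[k] - prev[k]) for k in cur if cur[k] != prev[k])
        if delta:
            print('%s %s' % (time.strftime('%H:%M:%S'), delta))
        prev = cur


if __name__ == '__main__':
    main()
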