Hi list,

The problem in short: in our network we see TCP retransmits that, in my opinion, shouldn't be there. This happens in the following setup:

- Cabinets filled with up to 20 servers connected to a 24-port switch (all 1 Gbit UTP links).
- The cabinet switches are connected to our routers via 1 Gbit UTP.
- Cabinet traffic does not exceed 400 Mbit/s on the uplink port (measured over a 10 s interval).
- Cabinets contain the following: web servers with PHP, MySQL, static HTTP content, memcached and some other small architectures.
- All servers run Linux with 2.6 kernels.
- The current switches have the following specs: switch fabric capacity 48.0 Gbps, forwarding rate 35.6 Mpps.

We first noticed packet loss when we introduced memcached multigets. Pages using them sometimes rendered 200 ms or more slower than normal. Looking deeper, we found this was caused by TCP retransmits that took 200 ms. We then wrote a simple client/server application that could reproduce the 200 ms timeout in those cabinets (the servers in these cabinets kept serving live traffic meanwhile).

Debugging further, we noticed that the retransmits never happen while the cabinet uplink carries less than 150 Mbit/s. After we enabled flow control on both the router and the cabinet switch (all ports), things looked a lot better, but the problem was not completely solved. As a test, we then upgraded the 1 Gbit uplink to a 2 Gbit trunk, which improved the situation even more.

We have already tested the following, without improvement:

- moving from UTP to fiber
- moving from Cat5e to Cat6e
- upgrading to switches with more switching capacity and larger backplane buffers

Since it's madness to upgrade to 2 Gbit links when a cabinet only carries 150 to 400 Mbit/s of sustained traffic, we looked for a way to detect network spikes. We started using libpcap to calculate the bandwidth over spans of 100 received packets.
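A packet-count window is very sensitive to timestamp errors. As rough arithmetic, assuming full-size 1500-byte frames: 100 packets are about 150,000 bytes (1.2 Mbit), which takes about 1.2 ms to send at 1 Gbit/s; if their timestamps get squeezed into 0.8 ms, the same calculation reports 1.5 Gbit/s. One alternative is to bin packets into fixed time windows instead. The sketch below is a hypothetical offline analyzer, not the tool described in this mail: it reads a classic pcap file (for example one written with `tcpdump -w` on the bridge), sums the original frame length per fixed 10 ms window and reports Mbit/s per window. The names `read_pcap_records` and `window_rates` are illustrative, pcapng files are not handled, and the result still depends on the kernel's capture timestamps, so NIC interrupt coalescing can still bunch packets together.

```python
import struct
import sys
from collections import defaultdict

PCAP_MAGIC_US = 0xA1B2C3D4  # classic pcap, microsecond timestamps
PCAP_MAGIC_NS = 0xA1B23C4D  # classic pcap, nanosecond timestamps


def read_pcap_records(f):
    """Yield (timestamp_us, orig_len) for each record of a classic pcap file.

    orig_len is the frame length seen on the wire, so a short snaplen is fine.
    """
    header = f.read(24)  # pcap global header
    if len(header) < 24:
        raise ValueError("truncated pcap global header")
    for endian in ("<", ">"):  # the magic number reveals the file's byte order
        (magic,) = struct.unpack(endian + "I", header[:4])
        if magic in (PCAP_MAGIC_US, PCAP_MAGIC_NS):
            break
    else:
        raise ValueError("not a classic pcap file")
    nanosecond = magic == PCAP_MAGIC_NS
    record = struct.Struct(endian + "IIII")  # ts_sec, ts_frac, incl_len, orig_len
    while True:
        raw = f.read(record.size)
        if len(raw) < record.size:
            return  # end of file (or truncated last record)
        ts_sec, ts_frac, incl_len, orig_len = record.unpack(raw)
        f.read(incl_len)  # skip the captured bytes; only the length matters
        ts_us = ts_sec * 1_000_000 + (ts_frac // 1000 if nanosecond else ts_frac)
        yield ts_us, orig_len


def window_rates(records, window_us=10_000):
    """Return {window_index: Mbit/s} over fixed windows of window_us microseconds.

    bytes * 8 / microseconds is bits per microsecond, i.e. Mbit/s.
    Windows without packets are simply absent from the result.
    """
    totals = defaultdict(int)
    for ts_us, length in records:
        totals[ts_us // window_us] += length
    return {idx: nbytes * 8 / window_us for idx, nbytes in sorted(totals.items())}


# Only run the report when a capture file is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        rates = window_rates(read_pcap_records(f))
    if not rates:
        sys.exit("no packets in capture")
    peak = max(rates, key=rates.get)
    print(f"peak: {rates[peak]:.1f} Mbit/s in the 10 ms window starting at "
          f"{peak * 10_000} us")
```

Usage would be something like `python3 burst.py capture.pcap` (the script name is hypothetical); capturing with a short snaplen such as `tcpdump -s 64` keeps the file small, since only the original length field is used. Because `orig_len` excludes the Ethernet preamble and inter-frame gap, a 10 ms window on a 1 Gbit link can hold at most 1,250,000 counted bytes (1000 Mbit/s) and in practice somewhat less, so any window reported above 1000 Mbit/s points at timestamp bunching rather than real traffic.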
We've built a bridge that can be placed between the uplink and the switch, and we run the app there. This showed spikes of up to 1.5 Gbit/s, which is impossible on a 1 Gbit link. Of course this happened because my libpcap app runs in userspace and cannot see the time spent buffering in the network hardware or kernel; when you calculate Mbit/s over a 10 ms interval, wrong timings make a very big difference. When we move to a 100,000-packet interval, each interval spans up to 300 ms and the sharp spikes no longer show up.

With the bridge between the uplink and the switch, we also noticed that whenever a retransmit occurred, the original packet had been lost in the switch: it came in from one of the hosts on the cabinet switch but never reached the uplink to the router.

Does anybody know a reliable way to monitor bandwidth in very short intervals, like 10 ms? Tools like iptraf and SNMP polling of the hardware aren't accurate enough to detect the microbursts we suspect are there.

Or does anybody recognize the problem and have some tips to prevent the 200 ms retransmits? We're not looking for kernel "patches" like lowering the RTO. Our goal is to prove that our network has very short spikes that exceed our 1 Gbit link capacity.

Regards,
Marlon de Boer
System engineer
http://www.hyves.nl
--
To unsubscribe from this list: send the line "unsubscribe linux-net" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html