On Mon, 06 Feb 2006, David Carlton wrote:
> I'm working on an application that we're trying to switch from a 2.4
> kernel to a 2.6 kernel. (I believe we're using 2.6.9.) One part of
> the program periodically sends out chunks of data (whose size is just
> over 1MB) via tcp.
>
> Frequently, alas, these chunks aren't arriving in a timely fashion.
> Instrumenting the code and doing a tcpdump, this is what we see:
>
> 1) The sender uses sendmsg() to send all the data. (In chunks of a
> little less than 1.5K, in case it matters.)
>
> 2) Most of the data arrives in a timely fashion. There are a few
> dropped packets that have to get retransmitted; no big deal. (I
> assume this step overlaps somewhat with step 1; also, sometimes all
> the data makes it, so we don't progress to step 3.)
>
> 3) Occasionally, at some point, the transmission slows way down: the
> sender sends out bits of data (1 or 2 Ethernet frames, I can't
> remember) spaced 200ms apart, each marked with PUSH.
>
> I don't understand why they'd be marked with push: by this time, all
> the sendmsg calls have returned, so the sender's kernel should have
> all the data, so there should only be one transmission marked with
> push. But I'm seeing lots of them. Which I wouldn't mind so much,
> but the 200ms gaps are killing us.
>
> Does this ring any bells? This 200 millisecond gap + PUSH behavior
> seems very odd, so I'm hoping that somebody's seen a misconfiguration
> or kernel bug causing these particular symptoms.
>
> Thanks for any suggestions that anybody has.
>
> (I'm not subscribed to the lists, so please Cc: me on any responses.
> Also, my apologies for the crosspost - the linux-net archives were
> relatively bare and spam-filled, so it wasn't clear to me whether or
> not that list was still active.)
>
> David Carlton
> david.carlton@xxxxxxx

David,

I'm catching up on way backlogged old e-mail, so perhaps (and hopefully) you have already resolved this (although I didn't see any replies on the list).
I don't know if it's the same problem, but I have seen performance on a 2.6.9 kernel with the e1000 driver occasionally go out to lunch (about 6 times out of 1000). In my case, the problem appears to be related to TSO (TCP segmentation offload) being enabled, since disabling TSO made it go away. If the driver you are using supports TSO and has it enabled (check with "ethtool -k ethX"), you could try disabling TSO with:

	ethtool -K ethX tso off

Hope this helps.

-Bill
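For anyone who wants to do the same TSO check from a program rather than the ethtool command line, here is a minimal C sketch (my own illustration, not something from this thread), assuming a Linux system with the kernel's <linux/ethtool.h> and <linux/sockios.h> headers. It issues the legacy per-flag SIOCETHTOOL commands ETHTOOL_GTSO and ETHTOOL_STSO; the helper names get_tso() and set_tso() are hypothetical, and changing the setting requires CAP_NET_ADMIN (i.e. root).

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1 /* glibc only exposes struct ifreq with the default feature set */
#endif
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <linux/ethtool.h>
#include <linux/sockios.h>

/* Send one ethtool get/set command to the named interface.
 * Returns 0 on success, -1 on failure (errno is preserved). */
static int tso_ioctl(const char *ifname, struct ethtool_value *ev)
{
    struct ifreq ifr;
    int fd, rc, saved;

    /* Any ordinary socket serves as a handle for SIOCETHTOOL. */
    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1); /* stays NUL-terminated */
    ifr.ifr_data = (char *)ev;

    rc = ioctl(fd, SIOCETHTOOL, &ifr);
    saved = errno;
    close(fd);
    errno = saved;
    return rc < 0 ? -1 : 0;
}

/* Report the TSO flag, as shown by `ethtool -k ethX`:
 * 1 = on, 0 = off, -1 = error (no such device, unsupported, ...). */
int get_tso(const char *ifname)
{
    struct ethtool_value ev = { .cmd = ETHTOOL_GTSO, .data = 0 };

    if (tso_ioctl(ifname, &ev) < 0)
        return -1;
    return ev.data ? 1 : 0;
}

/* Programmatic counterpart of `ethtool -K ethX tso off` (on == 0)
 * or `tso on` (on != 0). Returns 0 on success, -1 on error. */
int set_tso(const char *ifname, int on)
{
    struct ethtool_value ev = { .cmd = ETHTOOL_STSO, .data = on ? 1 : 0 };

    return tso_ioctl(ifname, &ev);
}
```

For example, set_tso("eth0", 0) run as root should have the same effect as "ethtool -K eth0 tso off" ("eth0" is just an example name). Like the ethtool command, the change does not survive a reboot or driver reload.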