Re: network delays, mysterious push packets

Bill Fink <billfink@xxxxxxxxxxxxxx> · Sat, 18 Feb 2006 10:10:12 -0500

On Mon, 06 Feb 2006, David Carlton wrote:

> I'm working on an application that we're trying to switch from a 2.4
> kernel to a 2.6 kernel.  (I believe we're using 2.6.9.)  One part of
> the program periodically sends out chunks of data (whose size is just
> over 1MB) via tcp.
> 
> Frequently, alas, these chunks aren't arriving in a timely fashion.
> Instrumenting the code and doing a tcpdump, this is what we see:
> 
> 1) The sender uses sendmsg() to send all the data.  (In chunks of a
>    little less than 1.5K, in case it matters.)
> 
> 2) Most of the data arrives in a timely fashion.  There are a few
>    dropped packets that have to get retransmitted; no big deal.  (I
>    assume this step overlaps somewhat with step 1; also, sometimes all
>    the data makes it, so we don't progress to step 3.)
> 
> 3) Occasionally, at some point, the transmission slows way down: the
>    sender sends out bits of data (1 or 2 Ethernet frames, I can't
>    remember) spaced 200ms apart, each marked with PUSH.
> 
> I don't understand why they'd be marked with push: by this time, all
> the sendmsg calls have returned, so the sender's kernel should have
> all the data, so there should only be one transmission marked with
> push.  But I'm seeing lots of them.  Which I wouldn't mind so much,
> but the 200ms gaps are killing us.
> 
> Does this ring any bells?  This 200 millisecond gap + PUSH behavior
> seems very odd, so I'm hoping that somebody's seen a misconfiguration
> or kernel bug causing these particular symptoms.
> 
> Thanks for any suggestions that anybody has.
> 
> (I'm not subscribed to the lists, so please Cc: me on any responses.
> Also, my apologies for the crosspost - the linux-net archives were
> relatively bare and spam-filled, so it wasn't clear to me whether or
> not that list was still active.)
> 
> David Carlton
> david.carlton@xxxxxxx

David,

I'm catching up on way backlogged old e-mail, so perhaps (and hopefully)
you have already resolved this (although I didn't see any replies on
the list).

I don't know if it's the same problem, but I have seen performance
on a 2.6.9 kernel with the e1000 driver occasionally go out to lunch
(about 6 times out of 1000).  In my case, the cause appears to be TSO
(TCP segmentation offload), since disabling TSO made the problem go
away.  If the driver you are using supports TSO and has it enabled
(check with "ethtool -k ethX"), then you could try disabling TSO with:

	ethtool -K ethX tso off
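
If you want to script the check before flipping the setting, something
like the following might work.  It is only a rough sketch: tso_state and
IFACE are names I made up for illustration, and I'm assuming the TSO line
in the "ethtool -k" output is labelled either "tcp segmentation offload:"
(older ethtool) or "tcp-segmentation-offload:" (newer ethtool).

```shell
#!/bin/sh
# Sketch only: tso_state and IFACE are illustrative names, not part of ethtool.

# Read `ethtool -k` output on stdin and print "on" or "off" for TSO.
# Accepts both the older "tcp segmentation offload:" label and the
# newer "tcp-segmentation-offload:" label; prints nothing if neither appears.
tso_state() {
    sed -n 's/^[[:space:]]*tcp[- ]segmentation[- ]offload:[[:space:]]*\([a-z]*\).*/\1/p' | head -n 1
}

# Usage (changing the setting needs root; IFACE is a placeholder):
#   IFACE=eth0
#   if [ "$(ethtool -k "$IFACE" | tso_state)" = "on" ]; then
#       ethtool -K "$IFACE" tso off
#   fi
```

The setting does not survive a reboot or a driver reload, so it would
have to be reapplied if disabling TSO turns out to help.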

						-Hope this helps

						-Bill
