Re: panic on 2.6.24rc5

Tomasz Grobelny <tomasz@xxxxxxxxxxxxxxxxxxxxxxx> · Fri, 7 Mar 2008 15:52:10 +0100



On Friday 07 of March 2008, Gerrit Renker wrote:
> | Ok, back to the old thread. I found out that commit after which dccp over
> | loopback (no limiting) has huge delays (as reported previously) is
> | 52515e77a7a69867c479db4c9efb6be832b82179. This is for CCID3 only no
> | matter if client and server programs are run by root or not. dmesg shows:
> | CCID: Registered CCID 3 (TCP-Friendly Rate Control)
> | dccp_sample_rtt: unusable RTT sample -172, using min
> | ccid3_hc_tx_packet_recv: client(cfb52740): ACK with bogus
> | ACK-125773746264929
> |
> | Please feel free to ask for more info. Now I should have more time to
> | test and make experiments.
>
> Thanks. I need to first make a clarification with regard to earlier email:
> the reported error ("err=1 after tx_packet_sent") has nothing to do with
> errno=EPERM. That was my mistake, the `1' is generated by the device output
> routine, which generates NET_XMIT_DROP when the Qdisc decides not to send
> the packet (linux/netdevice.h).
>
So I guess the DCCP code should ignore this NET_XMIT_DROP?
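
A minimal sketch of why the `1' was easy to misread: EPERM and
NET_XMIT_DROP share the same numeric value. The NET_XMIT_* values are
copied by hand from linux/netdevice.h (an assumption to check against the
tree in use), and describe_return() is a hypothetical helper, not kernel
code.

```python
import errno

# Assumption: values copied by hand from linux/netdevice.h.
NET_XMIT_CODES = {
    0x00: "NET_XMIT_SUCCESS",
    0x01: "NET_XMIT_DROP",
}
NET_XMIT_DROP = 0x01


def describe_return(err, source):
    """Name a return value, given which layer produced it.

    source is "xmit" for the device output routine, "errno" for a
    (positive) errno value.
    """
    if source == "xmit":
        return NET_XMIT_CODES.get(err, "unknown xmit code %d" % err)
    return errno.errorcode.get(err, "unknown errno %d" % err)


if __name__ == "__main__":
    # The same integer reads differently depending on where it came from.
    print(describe_return(1, "errno"))
    print(describe_return(1, "xmit"))
```

The bare integer carries no information about its origin, so "err=1" can
only be interpreted once the routine that produced it is known.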

> Now to the bugs: the original error message was "crash on loopback", the
> above is a different condition, so one after the other.
>
As I wrote previously, I can no longer reproduce the crash, probably because
of the newer kernel version.

> Firstly, irrespective of whether loopback is a representative environment
> or not, if a crash happens on loopback then it must be fixed. Since your
> email I have heard (privately) from at least one person who also
> encountered a crash on loopback. I do occasional tests on loopback, but use
> two or three computers to do the real testing. So far I have been unable to
> reproduce the bug and hence any further information would be great.
>
Hmm... I could provide you with a VMware image, but it is quite big: about 300M.

> I looked up your previous email on
> http://www.mail-archive.com/dccp@xxxxxxxxxxxxxxx/msg03129.html
> and tried to guess what is happening with regard to "huge delays". If
> the problem you are referring to is the same (i.e. CCID-3 switches to a
> mode of sending 1 packet in 64 seconds), then the above error messages
> are the consequence, but not the cause.
>
> Hence can you please clarify whether CCID-3 is getting into the mode of
> sending once per 64 seconds?
>
Yes, that's the bug I'm referring to.

> Now lastly, ranting about loopback: this link is not very
> representative, it has an extremely small RTT, a large MTU and actually
> is a kind of virtual interface. Hence it is possible to run into
> problems due to the nature of the link, but not the CCID-3 module.
>
But still it is a very convenient testbed for new applications. So it would be 
nice if it worked so as not to scare off new developers ;-)

> In particular, the following problems are likely:
>  * due to the small RTT, the nofeedback timer triggers very often,
>    quickly reducing the sending rate towards 1 in 64 seconds
>  * this is because the nofeedback timer is triggered every 4*RTT,
>    a loopback RTT is < 50 usec, so that would be ~200 usec
>  * to avoid this, there is a kernel configuration option of CCID-3
>    to set an upper bound for this.
How do I set it?
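
The arithmetic in the list above can be sketched as follows. This is a
simplified model, not the kernel's code: it uses only the 4*RTT rule
quoted above plus an optional lower bound standing in for the CCID-3
configuration option. That option is probably CONFIG_IP_DCCP_CCID3_RTO,
but the name is an assumption here. The function names and the 100 ms
floor are illustrative only. The netem figure assumes the 1 ms delay is
applied once in each direction on lo, giving roughly 2 ms of added RTT.

```python
def nofeedback_timeout_us(rtt_us, floor_us=0):
    """Nofeedback timer interval: 4 * RTT, never below floor_us."""
    return max(4 * rtt_us, floor_us)


def expiries_per_second(timeout_us):
    """How often the timer could fire if no feedback ever arrived."""
    return 1000000.0 / timeout_us


if __name__ == "__main__":
    loopback_rtt_us = 50                      # "< 50 usec" from the list above
    netem_rtt_us = loopback_rtt_us + 2 * 1000  # assumed: 1 ms each way

    cases = [
        ("plain loopback", nofeedback_timeout_us(loopback_rtt_us)),
        ("loopback + netem 1ms", nofeedback_timeout_us(netem_rtt_us)),
        ("loopback + 100 ms floor",
         nofeedback_timeout_us(loopback_rtt_us, floor_us=100000)),
    ]
    for name, timeout in cases:
        print("%-24s timeout=%7d us  max expiries/s=%.1f"
              % (name, timeout, expiries_per_second(timeout)))
```

Either a larger RTT or a configured floor lengthens the interval by
orders of magnitude compared with the ~200 usec on plain loopback.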

>
> Please let us know if that diagnosis matches the case.
>
How can I test it? The best I came up with was to add delay to the interface
(tc qdisc add dev lo root netem delay 1ms) and test again. With the delay in
place it works OK, with no slowdown.
So it is probably a matter of setting a sensible default in the kernel.
-- 
Regards,
Tomasz Grobelny