dccp bugs (Was: Re: panic on 2.6.24rc5)

Tomasz Grobelny <tomasz@xxxxxxxxxxxxxxxxxxxxxxx> · Sat, 8 Mar 2008 16:29:08 +0100



On Friday 07 March 2008, Gerrit Renker wrote:
> | > errno=EPERM. That was my mistake, the `1' is generated by the device
> | > output routine, which generates NET_XMIT_DROP when the Qdisc decides
> | > not to send the packet (linux/netdevice.h).
> |
> | And so I guess that this NET_XMIT_DROP should be ignored by dccp code?
>
> In the test tree it is now (almost) ignored. Since the time you reported
> this problem I have changed the DCCP_BUG to a DCCP_WARN, so that the
> drop will still be logged, but there will be fewer such warnings in the
> log now (in DCCP_WARN, printk is rate-limited).
>
I'm not sure it should print the warning. OK, it's nice to know that the
packet was dropped, but:
a) in the real world there would be no such information,
b) it's not unusual for DCCP to lose a packet without any warning.
Maybe it should only warn if a debugging option is enabled?
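
To make the suggestion concrete, here is a rough user-space model of the
policy I have in mind. It is only a sketch, not the kernel code: the three
constants mirror linux/netdevice.h, while classify_xmit(), its debug flag and
the message strings are hypothetical names made up for illustration.

```python
# Values mirror linux/netdevice.h; everything else here is hypothetical.
NET_XMIT_SUCCESS = 0
NET_XMIT_DROP = 1    # the `1' that showed up in the log
NET_XMIT_CN = 2


def classify_xmit(rc, debug=False):
    """Return (err, warning) for a transmit return code.

    err is what the caller would propagate upwards; warning is the
    message to log, or None when nothing should be logged.
    """
    if rc in (NET_XMIT_SUCCESS, NET_XMIT_CN):
        return 0, None
    if rc == NET_XMIT_DROP:
        # A qdisc drop is ordinary packet loss for DCCP: not an error,
        # and only worth a log line when debugging is switched on.
        return 0, ("packet dropped by qdisc" if debug else None)
    # Anything else (e.g. a negative errno) is passed through and logged.
    return rc, "transmit failed: %d" % rc
```

With debugging off, classify_xmit(NET_XMIT_DROP) gives (0, None), so the drop
is neither reported to the application nor written to the log.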

> This is interesting -- so you are running DCCP under virtualisation?
Yes.

> Arnaldo used to do this with QEMU, I tried this recently also but am no
> fan of virtual networks. Yes, if the crash persisted, any information
> would help.
>
The panic no longer occurs.

> | >  * to avoid this, there is a kernel configuration option of CCID-3
> | >    to set an upper bound for this.
> |
> | How do I set it?
>
> In the menu under
>  Networking -> Network Options -> The DCCP Protocol (EXPERIMENTAL)
>  -> DCCP CCIDs Configuration (EXPERIMENTAL) -> CCID3
>  -> (100) Use higher bound for nofeedback timer
> Ah - just remembered -- the default is 100 milliseconds, so this will
> probably have caught the problems with the low RTT.
>
100 ms? Then it doesn't work as expected, because adding just 1 ms of delay
with netem fixed the problem.
If you meant 100 us, then it still doesn't work. Changing the parameter to 1000
only delays the bug: I have to send more packets before it happens.
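
To show why the 100 ms figure puzzles me, here is a small sketch of the
arithmetic. It assumes that the nofeedback timeout is the larger of 4 * RTT
(the rule from RFC 3448) and the configured bound in milliseconds. That
combination is my assumption about how the option is applied, the function
name is made up, and the RTT values are illustrative guesses, not measurements.

```python
USEC_PER_MSEC = 1000


def nofeedback_timeout_us(rtt_us, bound_ms=100):
    """Timeout in microseconds: the larger of 4 * RTT and the bound."""
    return max(4 * rtt_us, bound_ms * USEC_PER_MSEC)


# Illustrative RTTs only: bare loopback vs. loopback with netem delay 1ms
# (the delay is paid once in each direction, so roughly 2 ms round trip).
for label, rtt_us in (("no delay", 50), ("netem 1ms", 2000)):
    with_bound = nofeedback_timeout_us(rtt_us)
    without_bound = nofeedback_timeout_us(rtt_us, bound_ms=0)
    print("%-10s bound=100ms: %6d us   no bound: %5d us"
          % (label, with_bound, without_bound))
```

If the bound really were 100 ms, both cases would end up with the same
100000 us timeout, so the extra 1 ms of delay should make no difference. Only
without an effective bound do the two cases differ (200 us against 8000 us).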

> | > Please let us know if that diagnosis matches the case.
> |
> | How can I test it? The best I came up with was adding delay to the
> | interface (tc qdisc add dev lo root netem delay 1ms) and test again. And
> | it works ok now, no slowing down.
>
> So from what you wrote I read
>  * without additional delay the described problem occurs and CCID-3
>    gets into 1-packet-per-64-seconds mode
>  * when you add 1millisecond delay to the interface then it works ok.
>
That's exactly what I meant. After adding the 1 ms delay I was not able to
reproduce the bug. That does not rule out that it would still happen with
long enough testing.

> You can plot the CCID-3 RTTs using dccp_probe, scripts are on
> http://www.erg.abdn.ac.uk/users/gerrit/dccp/testing_dccp/
>
> (at the bottom of the page).
See http://student.uci.agh.edu.pl/~grobelny/linux/outfile for raw dccp_probe 
data.
-- 
Regards,
Tomasz Grobelny