Re: panic on 2.6.24rc5

Gerrit Renker <gerrit@xxxxxxxxxxxxxx> · Fri, 7 Mar 2008 16:09:48 +0000

| > errno=EPERM. That was my mistake, the `1' is generated by the device output
| > routine, which generates NET_XMIT_DROP when the Qdisc decides not to send
| > the packet (linux/netdevice.h).
| >
| And so I guess that this NET_XMIT_DROP should be ignored by dccp code?
| 
In the test tree it is now (almost) ignored. Since the time you reported
this problem I have changed the DCCP_BUG to a DCCP_WARN, so that the
drop will still be logged, but there will be fewer such warnings in the
log now (in DCCP_WARN, printk is rate-limited).
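
To illustrate the idea (this is not the DCCP code itself), here is a minimal user-space sketch of treating a qdisc drop as a logged but non-fatal event, with the log output rate-limited. The numeric values of NET_XMIT_SUCCESS and NET_XMIT_DROP follow linux/netdevice.h; the class, the function, and the interval and burst defaults are hypothetical names and values chosen for the example.

```python
import time

NET_XMIT_SUCCESS = 0  # value as in linux/netdevice.h
NET_XMIT_DROP = 1     # the `1' seen in the report


class RateLimitedLog:
    """Hypothetical stand-in for a rate-limited printk."""

    def __init__(self, interval_s=5.0, burst=10, clock=time.monotonic):
        self.interval_s = interval_s
        self.burst = burst
        self.clock = clock
        self.window_start = clock()
        self.emitted = 0      # messages logged in the current window
        self.suppressed = 0   # messages dropped by the limiter overall
        self.lines = []

    def warn(self, msg):
        now = self.clock()
        # Start a new window once the interval has passed.
        if now - self.window_start >= self.interval_s:
            self.window_start = now
            self.emitted = 0
        if self.emitted < self.burst:
            self.emitted += 1
            self.lines.append(msg)
            return True
        self.suppressed += 1
        return False


def handle_xmit_result(err, log):
    """Classify a transmit return code (illustrative only)."""
    if err == NET_XMIT_SUCCESS:
        return "ok"
    if err == NET_XMIT_DROP:
        # Warn instead of treating the drop as a bug.
        log.warn("packet dropped by qdisc (NET_XMIT_DROP)")
        return "dropped"
    return "error"
```

With a burst of 2, five drops in a row produce two log lines and three suppressed messages, while every call still returns "dropped".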

| > Firstly, irrespective of whether loopback is a representative environment
| > or not, if a crash happens on loopback then it must be fixed. Since your
| > email I have heard (privately) from at least one person who also
| > encountered a crash on loopback. I do occasional tests on loopback, but use
| > two or three computers to do the real testing. So far I have been unable to
| > reproduce the bug and hence any further information would be great.
| >
| Hmm... I could provide you with vmware image but it is quite big. About 300M.
| 
This is interesting -- so you are running DCCP under virtualisation?
Arnaldo used to do this with QEMU; I tried it recently as well, but I am
no fan of virtual networks. Yes, if the crash persists, any information
would help.

| > Now lastly, ranting about loopback: this link is not very
| > representative, it has an extremely small RTT, a large MTU and actually
| > is a kind of virtual interface. Hence it is possible to run into
| > problems due to the nature of the link, but not the CCID-3 module.
| >
| But still it is a very convenient testbed for new applications. So it would be 
| nice if it worked so as not to scare off new developers ;-)
| 
Yes you are right - I agree.

| >  * to avoid this, there is a kernel configuration option of CCID-3
| >    to set an upper bound for this.
| How do I set it?
| 
In the menu under
 Networking -> Network Options -> The DCCP Protocol (EXPERIMENTAL) 
 -> DCCP CCIDs Configuration (EXPERIMENTAL) -> CCID3 
 -> (100) Use higher bound for nofeedback timer
Ah - just remembered -- the default is 100 milliseconds, so this will
probably have caught the problems with the low RTT.
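
A rough sketch of how such a bound interacts with a tiny RTT, assuming the option acts as a floor under the 4*RTT term of the RFC 3448 nofeedback timer, max(4*RTT, 2*s/X). That reading of the option is an assumption; the function name, parameters and units are made up for illustration and are not the kernel's code.

```python
def nofeedback_timeout_us(rtt_us, s_bytes, x_bytes_per_s, floor_ms=100):
    """Nofeedback timeout in microseconds (illustrative, not kernel code)."""
    # Assumption: the configured value (in ms) is a floor under 4 * RTT.
    t_rto_us = max(4 * rtt_us, floor_ms * 1000)
    # Two packet intervals at the current sending rate X, i.e. 2 * s / X.
    two_ipi_us = (2 * s_bytes * 1_000_000) // x_bytes_per_s
    return max(t_rto_us, two_ipi_us)


loopback_rtt_us = 100  # the kind of RTT one might see on loopback

# Without a floor the timer is only 4 * RTT = 400 usec.
print(nofeedback_timeout_us(loopback_rtt_us, 1460, 100_000_000, floor_ms=0))

# With the default of 100 ms the timer cannot drop below 100000 usec.
print(nofeedback_timeout_us(loopback_rtt_us, 1460, 100_000_000))
```

Under these assumptions the floor only matters while 4*RTT is below 100 ms; for an RTT of 50 ms the timer would be 200 ms regardless of the setting.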

| >
| > Please let us know if that diagnosis matches the case.
| >
| How can I test it? The best I came up with was adding delay to the interface 
| (tc qdisc add dev lo root netem delay 1ms) and test again. And it works ok 
| now, no slowing down.
So from what you wrote I understand that
 * without additional delay the described problem occurs and CCID-3
   gets into 1-packet-per-64-seconds mode
 * when you add a 1 millisecond delay to the interface, it works ok.

Then this means that the problems are indeed due to an extremely low
RTT. If this is the case, the problem is resolved.
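
For reference, the 1-packet-per-64-seconds mode is the floor of the TFRC back-off: as I read RFC 3448, each expiry of the nofeedback timer halves the allowed sending rate X, but never below s/t_mbi with t_mbi = 64 seconds. The sketch below is a simplification of that rule (it ignores the X_calc and X_recv details), and the function name and example numbers are chosen only for illustration.

```python
T_MBI_S = 64  # maximum back-off interval in seconds (RFC 3448)


def halvings_to_floor(x_bytes_per_s, s_bytes):
    """Count nofeedback expiries until X reaches the floor s / t_mbi.

    Returns (number of expiries, resulting seconds between packets).
    Simplified model: every expiry halves X, bounded below by s / t_mbi.
    """
    floor = s_bytes / T_MBI_S
    x = float(x_bytes_per_s)
    expiries = 0
    while x > floor:
        x = max(x / 2, floor)
        expiries += 1
    return expiries, s_bytes / x


# Starting at 16384 bytes/s with 1024-byte packets, ten consecutive
# expiries are enough to reach one packet every 64 seconds.
print(halvings_to_floor(16384, 1024))  # (10, 64.0)
```

With a nofeedback timer of only a few hundred microseconds, a handful of late feedback packets could trigger such a run of expiries, which would be consistent with the behaviour described above.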

| So probably it is a matter of setting sensible default in the kernel.
There are arguments that on LANs with such low RTTs congestion control
should be skipped, since there it only generates "noise".

If the above reasoning holds, this should be marked as a ToDo item, e.g.
by using a higher threshold for the RTTs (the current minimum is 100 usec).

You can plot the CCID-3 RTTs using dccp_probe; the scripts are at the
bottom of the page at
http://www.erg.abdn.ac.uk/users/gerrit/dccp/testing_dccp/