Re: dccp bugs

Tomasz Grobelny <tomasz@xxxxxxxxxxxxxxxxxxxxxxx> · Mon, 24 Mar 2008 14:07:46 +0100



On Monday, 24 March 2008, Gerrit Renker wrote:
> | One more thing I noticed when using dccp...
> | I have a server which accepts connection, receives data and finishes
> | execution and a client that sends 1000 data packets and finishes
> | execution (see http://dccp.one.pl/svn/userspace/test/). When I run
> | ./server then ./client, packets are sent but the client program finishes
> | execution only after all packets from the queue are sent. Which is, I guess,
> | quite OK. The problem happens when I kill the server program while the client
> | is running and sending packets. The client detects that the connection is
> | broken and starts returning error 32 from sendmsg call.
>
> Error 32 is EPIPE ("broken pipe"), so this looks correct.
>
> | But after it finishes sending packets it hangs on exit
> | and even kill -9 doesn't work. It finishes after quite a long time (eg.
> | 10 minutes). Am I doing something wrong or is it a bug in dccp? Tested on
> | loopback with rate limiting (sudo tc qdisc add dev lo root handle 1:0 tbf
> | rate 3kbit burst 3kbit latency 500ms). With rate limiting turned off I
> | don't see any problems. Testing between two virtual machines with rate
> | limiting on shows the same problem.
> | --
>
> Can you try the `ss' command from the iproute package when the problem
> occurs, using `ss -nadep' to display the DCCP states?
>
$ ss -nadep
State       Recv-Q Send-Q   Local Address:Port    Peer Address:Port
FIN-WAIT-1  0      0            127.0.0.1:2008       127.0.0.1:29792  ino:0 sk:d301d3c0

> DCCP is connection-oriented, so killing a server/client is different
> from UDP. When you try to kill a DCCP node, it will first try to finish
> its connection. The `hang' effect is most likely due to an uncompleted
> system call such as close(), and it is in a non-interruptible state.
>
Ten minutes for an uninterruptible call seems like quite a long time. If I
were a system administrator, it would probably drive me mad.

> What is far more important to know - are you using a standard kernel, a
> netdev kernel, or the test tree? And from what you describe, I suspect
> you are using CCID-3 - does the same problem happen with CCID-2?
>
I'm using a not-so-fresh DCCP experimental tree. Tested on CCID-2, but the
same thing happens on CCID-3.

> I am aware that there is at least one patch which may remedy the problem
> you encountered, which is the patch to clean up the write queue on
> (forced) disconnect, also the wait-for-ccid cleanup routine which
> flushes the write queue at the end of the connection.
>
Is it in the experimental tree?

> There are also sysctls to reduce the number of attempts to repeat a
> (futile) close at the end, in Documentation/networking/dccp.txt
You mean the *retries* entries? Setting all three to 1 doesn't make it any 
better.
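
For reference, a minimal sketch of how the current retry settings could be
dumped before and after changing them. The directory
/proc/sys/net/dccp/default and the three names request_retries, retries1 and
retries2 are assumptions based on Documentation/networking/dccp.txt; the
helper read_retries is hypothetical and only reads values, it does not set
them.

```python
from pathlib import Path

# Assumed sysctl location and entry names (see Documentation/networking/dccp.txt).
DCCP_SYSCTL_DIR = Path("/proc/sys/net/dccp/default")
RETRY_KEYS = ("request_retries", "retries1", "retries2")


def read_retries(base=DCCP_SYSCTL_DIR):
    """Return {name: int or None} for each retry sysctl found under `base`."""
    values = {}
    for key in RETRY_KEYS:
        path = Path(base) / key
        try:
            values[key] = int(path.read_text().strip())
        except (FileNotFoundError, PermissionError):
            # Entry missing (e.g. dccp module not loaded) or not readable.
            values[key] = None
    return values


if __name__ == "__main__":
    for name, value in read_retries().items():
        print(f"{name} = {value}")
```

An entry reported as None means the file was not found or not readable, for
example when the dccp module is not loaded.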

And one more thing: if I interrupt the client program before it reaches its
end, all is fine; the program finishes execution immediately.
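
To illustrate the error-32 behaviour described earlier, here is a minimal
sketch of a send loop that stops at the first EPIPE instead of going through
all remaining packets. It is not the test client from the SVN repository: it
uses an AF_UNIX stream socketpair as a stand-in for a DCCP connection, and the
names send_until_broken and demo are hypothetical. Whether stopping early
also avoids the hang on exit with DCCP is not something this sketch shows.

```python
import errno
import socket


def send_until_broken(sock, payload, count):
    """Send up to `count` packets; stop at the first EPIPE.

    Returns the number of packets sent successfully.
    """
    sent = 0
    for _ in range(count):
        try:
            sock.send(payload)
        except OSError as exc:
            if exc.errno == errno.EPIPE:
                break  # peer is gone, no point in queueing more data
            raise
        sent += 1
    return sent


def demo():
    # Stand-in for client and server: a connected pair of stream sockets.
    client, server = socket.socketpair()
    try:
        first = send_until_broken(client, b"x", 3)
        # Drain what the "server" received, then close it ("kill the server").
        received = 0
        while received < 3:
            received += len(server.recv(16))
        server.close()
        second = send_until_broken(client, b"x", 1000)
    finally:
        client.close()
        server.close()
    return first, second


if __name__ == "__main__":
    print(demo())
```

The second call gives up on the first failed send rather than attempting all
1000 packets.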
-- 
Regards,
Tomasz Grobelny