Re: sendbuffer-size controls (non)blocking behaviour? ccid3 throughput correct?

| I'm doing some experiments over DCCP (Ubuntu kernel version 2.6.28-15) using CCID3. The following is
| a list of things which confused me a bit. Maybe someone can give me an explanation...
| All mentioned files in the following text can be found at http://138.232.66.193/public/.

You are using a generic Ubuntu (Jaunty) kernel?  As far as I know this is from the stable mainline branch.
For all serious DCCP testing, please consider using the test tree

   http://www.linuxfoundation.org/en/Net:DCCP_Testing#Experimental_DCCP_source_tree

The test tree is ahead of the mainline kernel and contains more up-to-date fixes. Even though the
name is tagged 'experimental', the 'dccp' branch is checked to build cleanly and does not actually
contain experimental patches; these are deferred to subtrees. It is quite possible that some of the
described problems will disappear when using the test tree.



| In all scenarios, I have a sender(A) and a receiver(C) application. Both half-connections use CCID3.
| The sender transmits at full speed, the other half-connection isn't used. (shutdown(socket,SHUT_RD)
| is called at the sender). Between A and C, I have another computer (B) and I applied tc qdisc add
| dev ethx root tbf rate 40kbit burst 10kb limit 10kb
| 
| 1) I usually abort the sender with Ctrl+C. The sender sends a Close, the receiver immediately
| answers with CloseReq. Then the sender again sends a Close and repeats this after 6 seconds and
| again after another 12 seconds. Then again the receiver sends a CloseReq and the sender returns
| Close (and so on). And no, I haven't forgotten the receiver-side close(socket) call.
|
With regard to RFC 4340, the receiver doing the passive-open is the 'server'. When you kill the
userspace application via CTRL-C, the sender performs an active close and enters the CLOSING state.
Within this state, it will continue to retransmit Close packets until it receives a DCCP-Reset
packet. 

The receiver would normally reply to a Close with a Reset. The receiver-side close(socket) call performs
an active close at the server side. Hence, if I understand the situation correctly, what you are describing
is a case of "simultaneous active close", i.e. sender and receiver perform an active close nearly
simultaneously. There is no special provision for this condition in the RFC, but the implementation is
equipped to handle it; this is described in section 4.2 at
   http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/closing_states/
The tie-breaker in this case is that the retransmitted Close packet triggers a DCCP-Reset with a 
type of "No Connection", which then causes the state transition from CLOSING to TIMEWAIT.

However, in your wireshark capture there are also Resets of type "Aborted" (all receiver port numbers
less than 5008), which is the way the TCP ABORT function is implemented, i.e. when the receiver (or
sender) is disconnected abruptly. In the capture with port number 5008, the Reset(Aborted) happens before
the Reset(No Connection). It seems that in your application the receiver/sender calls close() before
the sender is killed via CTRL-C, which would explain why the CloseReq appears before the Close.

In the connection using port number 5010 there are no CloseReqs (or any other type of packet back from
192.168.3.2), hence the retransmission eventually times out with a Reset, i.e. the sender does not
retransmit the Close ad infinitum if it gets no response from the peer.


| The receiver processed incoming connections in a while loop (one bind and listen call at the
| beginning of the program, several accept and recv calls in the loop). From time to time, it happens
| that I cannot establish a connection to the same port again and get the error "Too many users". The
| receiver answers with a Reset packet, code "too busy". After several minutes, the port can be reused
| again. after_application_end.* is a packet dump performed at B after doing some tests on various ports.

The EUSERS error is the translation of the 'too busy' DCCP_RESET_CODE_TOO_BUSY reset code. There are
several possible causes:

 a) The size of the accept() queue set via the second parameter of listen(2).
    This seems likely: in this case the DCCP-Request is handled by dccp_v{4,6}_conn_request, which
    returns -1, causing dccp_rcv_state_process to return 1, which then causes dccp_v{4,6}_do_rcv
    to send a reset with the previously-prepared reset code.
    Could you test with different sizes of the 'backlog' argument to listen(2)? (A minimal
    sketch follows after this list.)

 b) The request-accept queue, which contains the half-finished connection requests. This is
    related to (a) since the queue size is also set via the 'backlog' argument to listen(). If
    changing the 'backlog' in (a) does not change the behaviour, the problem might be that
    nr_table_entries is capped at a maximum of 16 in reqsk_queue_alloc(), which is the case when
    using a value of 8 or greater for the 'backlog' argument.
    nr_table_entries is also influenced by tcp_max_syn_backlog, which however is much
    larger (128 or 1024).

 c) Other causes would be rarer conditions such as running out of memory.
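
Regarding (a)/(b), below is a minimal receiver sketch (untested, error handling mostly omitted; the
port number is just taken from your trace) in which only the listen() backlog is varied. If the
"too busy" Resets disappear with a larger backlog, that points to (a); if not, (b) becomes more likely.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

#ifndef SOCK_DCCP
#define SOCK_DCCP	6
#endif
#ifndef IPPROTO_DCCP
#define IPPROTO_DCCP	33
#endif

int main(int argc, char **argv)
{
	int backlog = argc > 1 ? atoi(argv[1]) : 20;	/* try e.g. 1, 8, 20, 64 */
	int fd = socket(AF_INET, SOCK_DCCP, IPPROTO_DCCP);
	struct sockaddr_in sin;

	memset(&sin, 0, sizeof(sin));
	sin.sin_family      = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_ANY);
	sin.sin_port        = htons(5008);

	if (fd < 0 || bind(fd, (struct sockaddr *)&sin, sizeof(sin)) < 0) {
		perror("socket/bind");
		return 1;
	}
	if (listen(fd, backlog) < 0) {			/* <-- the value to vary */
		perror("listen");
		return 1;
	}
	/* ... accept()/recv() loop as in your receiver ... */
	return 0;
}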


| 2) I send data packets with payload size 1000 bytes. When I choose a send buffer size <= 4976 bytes,
| the send call is blocking as expected (setsockopt(socket, SOL_SOCKET, SO_SNDBUF, ...)). By increasing
| the send buffer by at least 1 byte, the socket becomes non-blocking: it returns EAGAIN until we are
| allowed to send a new packet.

The EAGAIN results from the way CCID-3 currently dequeues packets, which is independent of setting the
socket blocking/non-blocking. Unlike UDP, packets are not immediately dequeued after calling send/write,
but are dequeued according to the currently allowed sending rate.
The default queue length in packets is /proc/sys/net/dccp/default/tx_qlen = 5. You can increase this
value (e.g. 'echo 20 > /proc/sys/net/dccp/default/tx_qlen') or set it to 0 to disable the length check
entirely. This is the default mainline policy; in the test tree we have the qpolicy framework by
Tomasz Grobelny, where the mainline dequeueing policy has been renamed to the 'simple' qpolicy.

| 3) Can I control the blocking/nonblocking behavior somehow? (e.g. using ioctl FIONBIO or O_NONBLOCK)

Yes, as per (2). In CCID-2 the EAGAIN is only very rarely possible, namely when the network is severely
congested or overloaded, so it may be better to start testing with CCID-2 if you do want to use
non-blocking I/O (the default CCIDs are set via /proc/sys/net/dccp/default/tx_ccid and rx_ccid).
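
For reference, a minimal (untested) sketch of the non-blocking pattern, using only standard
fcntl(2)/poll(2) calls on an already-connected DCCP socket 'fd'; nothing here is DCCP-specific:

#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/socket.h>

/* Put the socket into non-blocking mode, then retry the send whenever it
 * returns EAGAIN, i.e. whenever CCID-3 is not yet willing to dequeue. */
static int send_nonblocking(int fd, const void *buf, size_t len)
{
	int flags = fcntl(fd, F_GETFL, 0);

	if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
		return -1;

	for (;;) {
		struct pollfd pfd = { .fd = fd, .events = POLLOUT };

		if (send(fd, buf, len, 0) >= 0)
			return 0;
		if (errno != EAGAIN && errno != EWOULDBLOCK)
			return -1;
		/* wait until the socket becomes writable again */
		poll(&pfd, 1, -1);
	}
}

I have not verified how closely the poll()/select() wake-up matches the moment the CCID-3 queue drains
again, so treat this only as an illustration of the socket-level mechanics.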


| 4) I also observed some strange behaviour here: I use tc qdisc add dev ethx root netem delay 50ms.
| 50ms_noloss.jpg depicts the throughput. Why are there these periodic drops? There isn't any packet loss.
| 
It is difficult to say what exactly happened given just one figure. To verify that there is indeed no
packet loss, it would be useful to have the dccp_probe data. This is much preferable to the socket option
in (5) as it shows the internals directly. Even if it seems counter-intuitive, it is possible to cause
packet loss with a Token Bucket Filter, for instance if the receiver queue size is not large enough.

Some notes on dccp_probe are at
http://www.erg.abdn.ac.uk/users/gerrit/dccp/testing_dccp/

| 5) I modified the scenario from point 4 and caused a single packet loss at ~ second 8.5 (50ms_singleloss.jpg).
| By using getsockopt with DCCP_SOCKOPT_CCID_TX_INFO, I see that p (packet loss rate) gets a nonzero value, which
| then decreases down to 0.01% but not further. Unfortunately, the connection can only reach
| 1/5 of the throughput it had before the packet drop. I know that the theoretical bandwidth utilization
| depends on the bandwidth delay product, but is an RTT of 50ms such a dramatically high value??

This is governed by the formula for X_Bps in section 3.1 of RFC 5348; since the RTT is in the denominator, the
allowed sending rate is inversely proportional to the RTT (i.e. a 10-times higher RTT means a 10-times lower X_Bps).
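
For reference, the equation from section 3.1 of RFC 5348 (s = segment size, R = RTT, p = loss event
rate, b = 1, t_RTO usually 4*R):

                                     s
   X_Bps = ---------------------------------------------------------
           R*sqrt(2*b*p/3) + t_RTO*(3*sqrt(3*b*p/8))*p*(1 + 32*p^2)

Since t_RTO is itself proportional to R, both terms of the denominator scale with R, so for a fixed p
doubling the RTT roughly halves the allowed sending rate X_Bps.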
