Re: Reporting possible bug in DCCP Feature negotiation

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Mar 7, 2008 at 10:15 AM, Gerrit Renker <gerrit@xxxxxxxxxxxxxx> wrote:
> | Hi Gerrit, during some experiments using iperf to transmit one TCP
>  | flow against 3 DCCP flows, one DCCP thread stopped and it was shown
>  | the following message:
>  |
>  | "[ 6286.864021] BUG: Failed negotiation in state 2 at
>  | net/dccp/feat.c:1370/dccp_feat_activate_values()"
>  |
>  Did it really stop? The trigger is only possible during the initial
>  handshake, this is the only place where this activation-code is
>  executed. So almost certainly it failed at the begin of the connection.

I started 3 DCCP connection, just 2 transmitted. Then the third is the
one that had the problem.

>
>  I am assuming that both end hosts have identical kernels, if they are
>  using different versions of DCCP, this would be a likely cause.

2.6.25-rc3 in both sides

>
>  The log is very interesting, but I can't say whether the error message
>  is really a bug or if the action was correct under the given
>  circumstances; below is explanation of the messages and further things
>  to check.

Ok

>
>  Firstly, regarding the `BUG' message: it is part of the initial feature
>  negotiation (during 3-way handshake). The "state 2" means FEAT_CHANGING,
>  a  Change option was sent but not confirmed.
>  The good thing of having a stack trace is that we know that this
>  happened on the server,

Yes, in fact.

> in dccp_create_openreq_child. It would be next
>  to ideal to have a tcpdump of the scenario, but on paper this looks like:
>   * client is in PARTOPEN (has sent Ack, awaits packet from server);
>   * server is in RESPOND and about to move to OPEN;
>   * client has sent the Ack confirming the options requested by the
>    server;
>   * the server complains that one of the options it has asked for was
>    not confirmed.
>  Can you confirm that both end hosts really had identical DCCP versions?
>

Yes, the same version of DCCP. I repeated the same scenario and this
situation didn't happened anymore

>  In the log I see that there were several large burst losses plus one
>  occurrence of `invalid' checksum. Is this a very lossy link - if the
>  DCCP-Request/Response/Ack got retransmitted, the feature-negotiation
>  options would in theory also be retransmitted, so that would not matter.
>

I'm using one end connected to a wifi 802.11g access point and the
other connected via ethernet in this same access point. The ethernet
point is the server, the wifi point, the client.

>  Individual information to the messages is below.
>
>
>  | [ 3128.785764] dccp_hdlr_ack_ratio: Not changing RX Ack Ratio from 1 to 2
>  <snip>
>  The "Not changing Ack Ratio" message are harmless, they are just a
>  reminder that, due to still unresolved problems in CCID-2, dynamic
>  switch of Ack Ratio is currently disabled: Ack Ratio is currently always
>  1, no matter the messages (net/dccp/minisocks.c).
>
>
>  | [ 4510.339903] dccp_v4_rcv: dropped packet with invalid checksum
>  Checksum failure is extremely rare -- are you using a wireless link?
>

Yes, wireless link.

>
>
>  | [ 4852.431819] ccid2_change_l_ack_ratio: Limiting Ack Ratio (4) to 2
>  This message can also be ignored due to the earlier comment; when using
>  dynamic Ack Ratios this would avoid persistent spurious RTO timeouts.
>
>
>
>
>  | [ 5467.093407] e1000: eth0: e1000_watchdog: NIC Link is Down
>  | [ 5470.228254] e1000: eth0: e1000_watchdog: NIC Link is Up 100 Mbps
>  | Full Duplex, Flow Control: RX/TX
>  Was traffic sent over eth0? If yes, then it may have relevance for the
>  bug, for if the cable is disconnected in the middle of a connection,
>  the situation changes.
>

I don't know what happened in this moment, since nobody unplug the cable, AFAIK.

>
>
>  | [ 4853.013621] dccp_ackvec_add_new: Warning: large burst loss (583)
>
> | [ 5690.125731] dccp_ackvec_add_new: Warning: large burst loss (943)
>
> | [ 5747.246785] dccp_ackvec_add_new: Warning: large burst loss (76)
>
> | [ 6301.420651] dccp_ackvec_add_new: Warning: large burst loss (77)
>  I have collated these messages - there is quite a lot of loss going on
>  and I am wondering whether the Ack Vectors are properly dimensioned.

Hum... this is one thing to check. For my situation (wifi and ethernet
link), what do you suggest to use. To properly dimension Ack vectors,
do I need just to set the parameters like that?:

sudo sysctl -w net.dccp.default.seq_window=10000
sudo sysctl -w net.dccp.default.send_ackvec=1

>  This is independent of the bug report: in any case, the warning message
>  says that multiple losses will now be compressed into cells of 63 lost
>  packets each, instead of occupying one cell each (which would otherwise
>  almost certainly lead to overflow).
>
>  Back to the bug report:
>
>
>  | [ 6286.655846] ccid2_change_l_ack_ratio: Limiting Ack Ratio (4) to 2
>  | [ 6286.864021] BUG: Failed negotiation in state 2 at
>  | net/dccp/feat.c:1370/dccp_feat_activate_values()
>  | [ 6286.864034] Pid: 0, comm: swapper Not tainted 2.6.25-rc3 #4
>  | [ 6286.864034]  [<c045c6bb>] dccp_feat_activate_values+0x7b/0x200
>  | [ 6286.864034]  [<c045f919>] dccp_create_openreq_child+0xb9/0x100
>  <snip>
>
> | [ 6288.714624] ccid2_change_l_ack_ratio: Limiting Ack Ratio (4) to 2
>  The "Limiting Ack Ratio (4) to 2" must be from a different flow, since
>  it was before the BUG message and continued afterwards.

Probably, I setup iperf with -P 3 for DCCP

>
>  Now I see what you meant by `stopped': if feature negotiation fails,
>  the thread indeed stops, since otherwise it would mean to operate
>  with an undefined (unconfirmed) option value, which will otherwise
>  cause havoc. The server ends the connection with a reset (can you
>  remember what the client reported after that?).

I didn't check the client side log! :(

>
>  It is a pity that you don't have the tcpdump, this would allow to find
>  out which feature option could not be negotiated.

I'm sorry about that! :(

>
>  What I will do for the moment is to turn the dccp_pr_debug() under
>  the `activation_failed' into a DCCP_CRIT, so that it outputs the type
>  of the feature that has failed.

That's a good idea!

>
>  Please check the above questions.
>
>  Gerrit
>

Leandro.
--
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel]     [IETF DCCP]     [Linux Networking]     [Git]     [Security]     [Linux Assembly]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]

  Powered by Linux