Re: [PATCH] tcp: Fix a connect() race with timewait sockets

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



David Miller a écrit :
> From: Eric Dumazet <eric.dumazet@xxxxxxxxx>
> Date: Tue, 01 Dec 2009 16:00:39 +0100
> 
>> [PATCH] tcp: Fix a connect() race with timewait sockets
> 
> This condition would only trigger if the timewait recycling sysctl is
> enabled.
> 
> It is off by default, and I can't find any mention in this bug report
> that it has been turned on.

Very true. I know nothing about context of the reporter, he didnt
answered to my queries.

Yes, if sysctl_tw_reuse is set, bug can triggers without any extra conditions.

But even if sysctl_tw_reuse is cleared, we might trigger the bug if
local port is bound to a value.

[User application called bind( port=XXX) before connect() ]


__inet_hash_connect() can indeed call check_established(... twp = NULL)

...
        head = &hinfo->bhash[inet_bhashfn(net, snum, hinfo->bhash_size)];
        tb  = inet_csk(sk)->icsk_bind_hash;
        spin_lock_bh(&head->lock);
        if (sk_head(&tb->owners) == sk && !sk->sk_bind_node.next) {
                hash(sk);
                spin_unlock_bh(&head->lock);
                return 0;
        } else {
                spin_unlock(&head->lock);
                /* No definite answer... Walk to established hash table */
                ret = check_established(death_row, sk, snum, NULL);         <<< HERE >>>
out:
                local_bh_enable();
                return ret;
        }



In this case, we call tcp_twsk_unique() with twp = NULL,
this bypass the sysctl_tcp_tw_reuse test.


int tcp_twsk_unique(struct sock *sk, struct sock *sktw, void *twp)
{
        const struct tcp_timewait_sock *tcptw = tcp_twsk(sktw);
        struct tcp_sock *tp = tcp_sk(sk);

        /* With PAWS, it is safe from the viewpoint
           of data integrity. Even without PAWS it is safe provided sequence
           spaces do not overlap i.e. at data rates <= 80Mbit/sec.

           Actually, the idea is close to VJ's one, only timestamp cache is
           held not per host, but per port pair and TW bucket is used as state
           holder.

           If TW bucket has been already destroyed we fall back to VJ's scheme
           and use initial timestamp retrieved from peer table.
         */
        if (tcptw->tw_ts_recent_stamp &&
<<HERE>>       (twp == NULL || (sysctl_tcp_tw_reuse &&
                             get_seconds() - tcptw->tw_ts_recent_stamp > 1))) {
--
To unsubscribe from this list: send the line "unsubscribe netfilter" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Netfilter Development]     [Linux Kernel Networking Development]     [Netem]     [Berkeley Packet Filter]     [Linux Kernel Development]     [Advanced Routing & Traffice Control]     [Bugtraq]

  Powered by Linux