Re: crash in death_by_timeout()

Hi,

On Tue, Nov 18, 2008 at 02:27:44PM +0100, Patrick McHardy wrote:
> Could you try whether this patch fixes the problem?
>
> Pablo, do you recall the reason why the lock isn't held in
> ctnetlink_create_conntrack()?

> diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
> index 622d7c6..233fdd2 100644
> --- a/net/netfilter/nf_conntrack_core.c
> +++ b/net/netfilter/nf_conntrack_core.c
> @@ -305,9 +305,7 @@ void nf_conntrack_hash_insert(struct nf_conn *ct)
>  	hash = hash_conntrack(&ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple);
>  	repl_hash = hash_conntrack(&ct->tuplehash[IP_CT_DIR_REPLY].tuple);
>  
> -	spin_lock_bh(&nf_conntrack_lock);
>  	__nf_conntrack_hash_insert(ct, hash, repl_hash);
> -	spin_unlock_bh(&nf_conntrack_lock);
>  }
>  EXPORT_SYMBOL_GPL(nf_conntrack_hash_insert);
>  
> diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
> index a040d46..3b009a3 100644
> --- a/net/netfilter/nf_conntrack_netlink.c
> +++ b/net/netfilter/nf_conntrack_netlink.c
> @@ -1090,7 +1090,7 @@ ctnetlink_create_conntrack(struct nlattr *cda[],
>  	struct nf_conn_help *help;
>  	struct nf_conntrack_helper *helper;
>  
> -	ct = nf_conntrack_alloc(&init_net, otuple, rtuple, GFP_KERNEL);
> +	ct = nf_conntrack_alloc(&init_net, otuple, rtuple, GFP_ATOMIC);
>  	if (ct == NULL || IS_ERR(ct))
>  		return -ENOMEM;
>  
> @@ -1212,13 +1212,14 @@ ctnetlink_new_conntrack(struct sock *ctnl, struct sk_buff *skb,
>  			atomic_inc(&master_ct->ct_general.use);
>  		}
>  
> -		spin_unlock_bh(&nf_conntrack_lock);
>  		err = -ENOENT;
>  		if (nlh->nlmsg_flags & NLM_F_CREATE)
>  			err = ctnetlink_create_conntrack(cda,
>  							 &otuple,
>  							 &rtuple,
>  							 master_ct);
> +		spin_unlock_bh(&nf_conntrack_lock);
> +
>  		if (err < 0 && master_ct)
>  			nf_ct_put(master_ct);
>  

We didn't see any kernel crashes during half a day of heavy load (without
the patch the kernel crashed within 3-4 hours every time), but we found a
lot of BUG messages in the log (possibly one for every new entry):

    Nov 24 14:45:43 test kernel: BUG: sleeping function called from invalid context at mm/slab.c:3043
    Nov 24 14:45:43 test kernel: in_atomic():1, irqs_disabled():0
    Nov 24 14:45:43 test kernel: 3 locks held by test/3586:
    Nov 24 14:45:43 test kernel:  #0:  (nfnl_mutex){--..}, at: [<d081500f>] nfnetlink_rcv+0xf/0x30 [nfnetlink]
    Nov 24 14:45:43 test kernel:  #1:  (nf_conntrack_lock){-+..}, at: [<d08c979f>] ctnetlink_new_conntrack+0x7f/0x770 [nf_conntrack_netlink]
    Nov 24 14:45:43 test kernel:  #2:  (rcu_read_lock){..--}, at: [<d08c98ee>] ctnetlink_new_conntrack+0x1ce/0x770 [nf_conntrack_netlink]
    Nov 24 14:45:43 test kernel: Pid: 3586, comm: test Not tainted 2.6.27.6bozotest #1
    Nov 24 14:45:43 test kernel:  [<c027a566>] __kmalloc_track_caller+0x126/0x160
    Nov 24 14:45:43 test kernel:  [<c052a7a5>] __nf_ct_ext_add+0xb5/0x290
    Nov 24 14:45:43 test kernel:  [<c026411d>] __krealloc+0x5d/0x80
    Nov 24 14:45:44 test kernel:  [<c052a7a5>] __nf_ct_ext_add+0xb5/0x290
    Nov 24 14:45:44 test kernel:  [<c052a71d>] __nf_ct_ext_add+0x2d/0x290
    Nov 24 14:45:44 test kernel:  [<d08c9af8>] ctnetlink_new_conntrack+0x3d8/0x770 [nf_conntrack_netlink]
    Nov 24 14:45:44 test kernel:  [<d08c98ee>] ctnetlink_new_conntrack+0x1ce/0x770 [nf_conntrack_netlink]
    Nov 24 14:45:44 test kernel:  [<c0248910>] validate_chain+0x380/0xed0
    Nov 24 14:45:44 test kernel:  [<d0815220>] nfnetlink_rcv_msg+0xf0/0x180 [nfnetlink]
    Nov 24 14:45:44 test kernel:  [<d0815130>] nfnetlink_rcv_msg+0x0/0x180 [nfnetlink]
    Nov 24 14:45:44 test kernel:  [<c0520ebc>] netlink_rcv_skb+0x7c/0xa0
    Nov 24 14:45:44 test kernel:  [<d081501b>] nfnetlink_rcv+0x1b/0x30 [nfnetlink]
    Nov 24 14:45:44 test kernel:  [<c0520c50>] netlink_unicast+0x250/0x280
    Nov 24 14:45:44 test kernel:  [<c052145e>] netlink_sendmsg+0x1ee/0x2c0
    Nov 24 14:45:44 test kernel:  [<c04fad7f>] sock_sendmsg+0xbf/0xf0
    Nov 24 14:45:44 test kernel:  [<c02496e5>] __lock_acquire+0x285/0x9e0
    Nov 24 14:45:44 test kernel:  [<c0239790>] autoremove_wake_function+0x0/0x50
    Nov 24 14:45:44 test kernel:  [<c0248910>] validate_chain+0x380/0xed0
    Nov 24 14:45:44 test kernel:  [<c027ee33>] fget_light+0xd3/0xf0
    Nov 24 14:45:44 test kernel:  [<c031bea8>] copy_from_user+0x38/0x80
    Nov 24 14:45:44 test kernel:  [<c031bea8>] copy_from_user+0x38/0x80
    Nov 24 14:45:44 test kernel:  [<c0502e2a>] verify_iovec+0x2a/0x90
    Nov 24 14:45:44 test kernel:  [<c04faf14>] sys_sendmsg+0x164/0x280
    Nov 24 14:45:44 test kernel:  [<c027ee33>] fget_light+0xd3/0xf0
    Nov 24 14:45:44 test kernel:  [<c031c16a>] copy_to_user+0x3a/0x70
    Nov 24 14:45:44 test kernel:  [<c04fb98f>] move_addr_to_user+0x5f/0x70
    Nov 24 14:45:44 test kernel:  [<c04fbf0d>] sys_getsockname+0xcd/0xd0
    Nov 24 14:45:44 test kernel:  [<c022ad6c>] local_bh_enable_ip+0x7c/0xc0
    Nov 24 14:45:44 test kernel:  [<c0247e64>] trace_hardirqs_on_caller+0xc4/0x140
    Nov 24 14:45:44 test kernel:  [<c022ad6c>] local_bh_enable_ip+0x7c/0xc0
    Nov 24 14:45:44 test kernel:  [<c04fe578>] sock_setsockopt+0x128/0x590
    Nov 24 14:45:44 test kernel:  [<c027edb3>] fget_light+0x53/0xf0
    Nov 24 14:45:44 test kernel:  [<c04fa552>] sockfd_lookup_light+0x32/0x60
    Nov 24 14:45:44 test kernel:  [<c04fc39b>] sys_socketcall+0x25b/0x2b0
    Nov 24 14:45:44 test kernel:  [<c031ba44>] trace_hardirqs_on_thunk+0xc/0x10
    Nov 24 14:45:44 test kernel:  [<c031ba44>] trace_hardirqs_on_thunk+0xc/0x10
    Nov 24 14:45:44 test kernel:  [<c0203029>] sysenter_do_call+0x12/0x35
    Nov 24 14:45:44 test kernel:  =======================
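
One possible reading of the trace, not confirmed in this thread: the log
shows in_atomic():1 with nf_conntrack_lock held, and the allocation is
reached through __nf_ct_ext_add -> __krealloc -> __kmalloc_track_caller.
The patch switches only nf_conntrack_alloc() to GFP_ATOMIC, so an extension
allocation on the create path may still be passing a flag that allows
sleeping. The user-space sketch below only models that rule; every name in
it (model_kmalloc, create_entry, the GFP_MODEL_* values) is hypothetical
and is not kernel API.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel's GFP flags (not the real values). */
enum gfp_model { GFP_MODEL_ATOMIC, GFP_MODEL_KERNEL };

static int atomic_depth;     /* models "in atomic context" while a BH spinlock is held */
static int sleep_violations; /* counts allocations that could sleep while atomic */

static void model_spin_lock_bh(void)
{
	atomic_depth++;
}

static void model_spin_unlock_bh(void)
{
	assert(atomic_depth > 0); /* unlock without a matching lock is a bug */
	atomic_depth--;
}

/* Allocate, and report when a sleeping-capable flag is used in atomic context. */
static void *model_kmalloc(size_t size, enum gfp_model gfp)
{
	if (gfp == GFP_MODEL_KERNEL && atomic_depth > 0) {
		sleep_violations++;
		fprintf(stderr, "BUG: sleeping function called from invalid context\n");
	}
	return malloc(size);
}

/*
 * Models creating one entry under the lock: one allocation for the entry
 * itself and one for an extension area. Returns the number of violations
 * that this call triggered.
 */
static int create_entry(enum gfp_model alloc_gfp, enum gfp_model ext_gfp)
{
	int before = sleep_violations;
	void *ct, *ext;

	model_spin_lock_bh();
	ct = model_kmalloc(64, alloc_gfp);
	ext = model_kmalloc(32, ext_gfp);
	model_spin_unlock_bh();

	free(ext);
	free(ct);
	return sleep_violations - before;
}
```

In this model, create_entry(GFP_MODEL_ATOMIC, GFP_MODEL_KERNEL) reports one
violation per entry, which would match seeing a BUG message for every new
entry, while create_entry(GFP_MODEL_ATOMIC, GFP_MODEL_ATOMIC) reports none.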

Bye,
Zoltan