Re: KERNEL: assertion (!atomic_read(&sk->sk_rmem_alloc)) failed at net/netlink/af_netlink.c (126)

Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx> · Thu, 31 Mar 2005 11:10:01 +1000

On Wed, Mar 30, 2005 at 05:02:36PM -0800, David S. Miller wrote:
> 
> Looks like 2.4.x needs the same fix, correct?

Indeed it does.  Here it is for 2.4.

In netlink_dump we're operating on sk after dropping the cb lock.
This is racy because the owner of the socket could close it after
we drop the cb lock.

This is possible because netlink_dump isn't always called from the
context of the process that owns the socket.  For instance, if there
is contention on rtnl then rtnetlink requests will be processed by
the process that owns the rtnl.

The solution is to hold a ref count on the socket before we drop
the cb lock.
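
As a rough illustration of that pattern (not the kernel code), here is a
userspace sketch.  All of the toy_* names are hypothetical stand-ins:
toy_sock_hold/toy_sock_put play the role of sock_hold/sock_put, and an
atomic_flag spinlock stands in for the cb lock.  The owner's close is
simulated by a callback that runs right after the lock is dropped.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for struct sock; not the kernel type. */
struct toy_sock {
	atomic_int refcnt;	/* plays the role of the socket ref count */
	atomic_flag cb_lock;	/* plays the role of the netlink cb lock */
	int queued;		/* stands in for the receive queue */
	int *freed_flag;	/* set to 1 when the object is released */
};

static struct toy_sock *toy_sock_new(int *freed_flag)
{
	struct toy_sock *sk = malloc(sizeof(*sk));

	if (!sk)
		return NULL;
	atomic_init(&sk->refcnt, 1);	/* the owner's reference */
	atomic_flag_clear(&sk->cb_lock);
	sk->queued = 0;
	sk->freed_flag = freed_flag;
	return sk;
}

static void toy_sock_hold(struct toy_sock *sk)
{
	atomic_fetch_add(&sk->refcnt, 1);
}

static void toy_sock_put(struct toy_sock *sk)
{
	/* Free only when the last reference goes away. */
	if (atomic_fetch_sub(&sk->refcnt, 1) == 1) {
		*sk->freed_flag = 1;
		free(sk);
	}
}

/* Simulates the owning process closing the socket. */
static void toy_owner_close(struct toy_sock *sk)
{
	toy_sock_put(sk);
}

/*
 * Mirrors the shape of the patched path: take a reference before
 * dropping the lock, touch sk, then drop the reference.  Returns 1 if
 * sk was still alive when it was used after the unlock.
 */
static int toy_dump(struct toy_sock *sk, void (*after_unlock)(struct toy_sock *))
{
	int alive;

	while (atomic_flag_test_and_set(&sk->cb_lock))
		;			/* spin until the lock is ours */
	toy_sock_hold(sk);
	atomic_flag_clear(&sk->cb_lock);	/* lock dropped here */

	if (after_unlock)
		after_unlock(sk);	/* owner may close at this point */

	assert(atomic_load(&sk->refcnt) >= 1);	/* our ref keeps sk alive */
	alive = !*sk->freed_flag;
	sk->queued++;			/* safe only because of the hold */
	toy_sock_put(sk);		/* may be the final put */
	return alive;
}

/* Returns 1 if sk survived the use and was freed by the final put. */
static int toy_demo(void)
{
	int freed = 0;
	struct toy_sock *sk = toy_sock_new(&freed);

	if (!sk)
		return -1;
	return toy_dump(sk, toy_owner_close) && freed;
}
```

Without the toy_sock_hold call, toy_owner_close would drop the last
reference and the later access to sk->queued would touch freed memory,
which is the same shape as the race described above.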

Signed-off-by: Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx>

I think I know why we're only seeing it now.  Without preemption
this race is very unlikely to trigger.  This plus the fact that
we now have a lot more netlink applications probably made it
just a tad more likely.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@xxxxxxxxxxxxxxxxxxx>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
===== net/netlink/af_netlink.c 1.21 vs edited =====

--- 1.21/net/netlink/af_netlink.c	2005-02-17 06:21:57 +11:00
+++ edited/net/netlink/af_netlink.c	2005-03-31 11:07:40 +10:00
@@ -981,9 +981,11 @@
 	len = cb->dump(skb, cb);
 
 	if (len > 0) {
+		sock_hold(sk);
 		spin_unlock(&sk->protinfo.af_netlink->cb_lock);
 		skb_queue_tail(&sk->receive_queue, skb);
 		sk->data_ready(sk, len);
+		sock_put(sk);
 		return 0;
 	}