On Wed, Mar 30, 2005 at 05:02:36PM -0800, David S. Miller wrote:
>
> Looks like 2.4.x needs the same fix, correct?

Indeed it does. Here it is for 2.4.

In netlink_dump we're operating on sk after dropping the cb lock.
This is racy because the owner of the socket could close it after
we drop the cb lock.

This is possible because netlink_dump isn't always called from the
context of the process that owns the socket. For instance, if there
is contention on rtnl then rtnetlink requests will be processed by
the process that owns the rtnl.

The solution is to hold a ref count on the socket before we drop
the cb lock.

Signed-off-by: Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx>

I think I know why we're only seeing it now. Without preemption
this race is very unlikely to trigger. This plus the fact that we
now have a lot more netlink applications probably made it just a
tad more likely.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@xxxxxxxxxxxxxxxxxxx>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

===== net/netlink/af_netlink.c 1.21 vs edited =====
--- 1.21/net/netlink/af_netlink.c	2005-02-17 06:21:57 +11:00
+++ edited/net/netlink/af_netlink.c	2005-03-31 11:07:40 +10:00
@@ -981,9 +981,11 @@
 	len = cb->dump(skb, cb);
 
 	if (len > 0) {
+		sock_hold(sk);
 		spin_unlock(&sk->protinfo.af_netlink->cb_lock);
 		skb_queue_tail(&sk->receive_queue, skb);
 		sk->data_ready(sk, len);
+		sock_put(sk);
 		return 0;
 	}
 
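
For readers who want to see why the ordering matters, here is a minimal userspace sketch of the race described above. It is not kernel code: struct toy_sock, toy_sock_hold, toy_sock_put and toy_dump are hypothetical stand-ins invented for illustration, the cb lock is only marked by comments, and the owner's close is simulated by a single toy_sock_put in the window right after the lock would be dropped.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct sock: a reference count, a flag that
 * records when the last reference went away, and a counter standing in
 * for the receive queue. */
struct toy_sock {
	int refcnt;
	bool destroyed;
	int queued;
};

/* Take an extra reference; only legal while the object is still alive. */
void toy_sock_hold(struct toy_sock *sk)
{
	assert(!sk->destroyed);
	sk->refcnt++;
}

/* Drop a reference; the object is "destroyed" when the count hits zero. */
void toy_sock_put(struct toy_sock *sk)
{
	assert(sk->refcnt > 0);
	if (--sk->refcnt == 0)
		sk->destroyed = true;
}

/* Models the tail of the dump path. Returns true if sk was still alive
 * when it was touched after the (notional) cb lock was dropped. */
bool toy_dump(struct toy_sock *sk, bool take_ref)
{
	bool alive_when_used;

	/* cb lock notionally held here */
	if (take_ref)
		toy_sock_hold(sk);
	/* cb lock notionally dropped here */

	/* Simulated race: the owner closes the socket in this window and
	 * drops its own reference. */
	toy_sock_put(sk);

	alive_when_used = !sk->destroyed;
	if (alive_when_used)
		sk->queued++;	/* stands in for skb_queue_tail + data_ready */

	if (take_ref)
		toy_sock_put(sk);
	return alive_when_used;
}
```

Starting from a toy_sock with refcnt set to 1 (the owner's reference), toy_dump(sk, true) finds the object alive when it queues to it, while toy_dump(sk, false) finds it already destroyed, which is the case the patch closes in the real code.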