On Wed, 30 Mar 2005, Herbert Xu wrote:
> On Tue, Mar 29, 2005 at 01:49:26PM +0200, Ingo Molnar wrote:
> > (i guess the debug message should be extended to do a dump_stack() so that we see which process does?)
>
> Never mind. I think I've found what it is. The only thing I can't figure out is why we're only seeing it now when this bug has been around since day one.
>
> In netlink_dump we're operating on sk after dropping the cb lock. This is racy because the owner of the socket could close it after we drop the cb lock.
>
> This is possible because netlink_dump isn't always called from the context of the process that owns the socket. For instance, if there is contention on rtnl then rtnetlink requests will be processed by the process that owns the rtnl.
>
> The solution is to hold a ref count on the socket before we drop the cb lock.
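
For readers following the thread, the pattern described above (take a reference while still holding the lock, then drop the lock) can be sketched in user-space C. This is only an illustration and not the patch itself: struct demo_sock, demo_sock_hold(), demo_sock_put() and demo_dump() are hypothetical stand-ins, loosely modelled on the kernel's sock_hold()/sock_put() helpers, and the "owner closes the socket" step is simulated inline rather than run in a second thread.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a socket; NOT the kernel's struct sock. */
struct demo_sock {
    pthread_mutex_t cb_lock;  /* plays the role of the cb lock */
    atomic_int refcnt;        /* object is freed when this reaches 0 */
    int *destroyed;           /* set to 1 on release, so callers can observe it */
};

static struct demo_sock *demo_sock_new(int *destroyed)
{
    struct demo_sock *sk = malloc(sizeof(*sk));
    if (!sk)
        return NULL;
    pthread_mutex_init(&sk->cb_lock, NULL);
    atomic_init(&sk->refcnt, 1);  /* the owner's reference */
    sk->destroyed = destroyed;
    *destroyed = 0;
    return sk;
}

static void demo_sock_hold(struct demo_sock *sk)
{
    atomic_fetch_add(&sk->refcnt, 1);
}

static void demo_sock_put(struct demo_sock *sk)
{
    /* atomic_fetch_sub returns the previous value: 1 means we were last. */
    if (atomic_fetch_sub(&sk->refcnt, 1) == 1) {
        *sk->destroyed = 1;
        pthread_mutex_destroy(&sk->cb_lock);
        free(sk);
    }
}

/*
 * Sketch of the fixed ordering: the extra reference is taken BEFORE the
 * lock is dropped, so a close by the owner in the unlocked window cannot
 * free the object under us. Returns 1 if sk was still alive when touched
 * after the unlock.
 */
static int demo_dump(struct demo_sock *sk, int owner_closes)
{
    int alive;

    pthread_mutex_lock(&sk->cb_lock);
    demo_sock_hold(sk);                 /* pin sk while still locked */
    pthread_mutex_unlock(&sk->cb_lock);

    if (owner_closes)
        demo_sock_put(sk);              /* simulated close: owner drops its ref */

    alive = (*sk->destroyed == 0);      /* safe: we still hold our reference */
    demo_sock_put(sk);                  /* drop our ref; may free sk */
    return alive;
}
```

Without the demo_sock_hold() call, the simulated close would free the object and the access after the unlock would be a use-after-free, which is the race described in the quoted text.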
OK. I'm no longer able to trigger this error. And the patch is already in the linux-2.6 repository. Thank you.
Best regards,
Krzysztof Olędzki