On 01/23/2013 04:01 PM, Eric Dumazet wrote:
On Wed, 2013-01-23 at 15:55 -0800, Ben Greear wrote:
On 01/22/2013 06:32 PM, Ben Greear wrote:
So, I'm slowly making some progress. I've verified that the skb
has bogus dst (0xdeadbeef) at the top of the ip_rcv_finish
method. I'm trying to track it backwards and figure out which
device it belongs to, etc....takes a while to reproduce though.
One thing about this stack trace below...the dev_seq_stop() does
an RCU read-unlock. Now, I can't figure out exactly how ip_rcv()
can cause dev_seq_stop() to run, but if this stack is legit,
then maybe by the time we enter the ip_rcv_finish() code we are
running without rcu_read_lock() held?
If so, that would probably explain the bug.
The whole thing is run under rcu_read_lock() done in
__netif_receive_skb()
I was worried that the dev_seq_stop might be called
incorrectly, causing an asymmetric unlock. I have no
idea how that might happen, but several crashes
have that dev_seq_stop method listed, so it got me suspicious.
My suspicion was that we called netif_rx() from macvlan, leaving a
non-refcounted skb dst.
But the patch I sent to you didn't solve the bug, so it's something else.
You could trace at which point the dst was released. (where you set
dst->input/output to deadbeef)
My current code is in some garbage collector timer code, but I can
work on saving the call-site that first pokes the dst into the
garbage collection list...
Thanks,
Ben
--
Ben Greear <greearb@xxxxxxxxxxxxxxx>
Candela Technologies Inc http://www.candelatech.com