Re: Kernel panic: Route cache, RCU, possibly FIB trie.

Dipankar Sarma <dipankar@xxxxxxxxxx> · Tue, 21 Mar 2006 03:39:56 +0530

On Mon, Mar 20, 2006 at 10:44:21PM +0100, Jesper Dangaard Brouer wrote:
> 
> Kernel panic report.
> 
> I have experienced some kernel panics on a production Linux box acting
> as a router for a large number of customers.
> 
> I have tried to track down the problem, and I think I have narrowed it
> down a bit.  My theory is that it is related to the route cache
> (ip_dst_cache) or FIB, which cannot deallocate route cache slab
> elements (maybe RCU related).  (I have seen my route cache increase to
> around 520k entries using rtstat, before dying).
> 
> I'm using the FIB trie system/algorithm (CONFIG_IP_FIB_TRIE). I think
> that the error might be caused by the "fib_trie" code.  See the syslog
> output below.
> 
> Below are some kernel panic outputs from the console and some
> interesting errors found in syslog.
> 
> Kernel panic#1
> --------------
> EIP is at _stext+0x3feffd68/0x49
>        c03f7380
> Call Trace:
>  [<c0103cc7>] show_stack+0x80/0x96
>  [<c0103e60>] show_registers+0x161/0x1c5
>  [<c0104057>] die+0x107/0x186
>  [<c0116c5f>] do_page_fault+0x2c6/0x57d
>  [<c0103997>] error_code+0x4f/0x54
>  [<c012fe7b>] __rcu_process_callbacks+0xaa/0xd3
>  [<c012feff>] rcu_process_callbacks+0x5b/0x65
>  [<c0124578>] tasklet_action+0x77/0xc9
>  [<c01241f1>] __do_softirq+0xc1/0xd6
>  [<c0124251>] do_softirq+0x4b/0x4d
>  [<c012433b>] irq_exit+0x47/0x49
>  [<c010533b>] do_IRQ+0x2b/0x3b
>  [<c010383e>] common_interrupt+0x1a/0x20
> Code:  Bad EIP value.
>  <0>Kernel panic - not syncing: Fatal exception in interrupt

A bad EIP while processing an RCU callback often indicates that the
object that embeds the rcu_head has already been freed. Can you enable
slab debugging and see whether the same problem is detected there in a
different path?
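
To illustrate the failure mode, here is a minimal userspace sketch, not
kernel code. The names mock_call_rcu, mock_free, mock_process_callbacks
and struct entry are hypothetical stand-ins; only the rcu_head layout
idea and the 0x6b free-poison byte are taken from the kernel. It assumes
the object is freed while its callback is still queued, so the callback
pointer read back later is garbage, which is what a bad EIP looks like.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Userspace stand-in for the kernel's struct rcu_head. */
struct rcu_head {
    struct rcu_head *next;
    void (*func)(struct rcu_head *head);
};

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

#define POISON_FREE 0x6b  /* byte a debug slab writes into freed objects */

/* Hypothetical object embedding an rcu_head, like a route cache entry. */
struct entry {
    int key;
    struct rcu_head rcu;
};

static struct entry pool[2];      /* static storage: "freed" memory stays readable */
static struct rcu_head *pending;  /* queued callbacks */
static int reclaimed;             /* callbacks that actually ran */

/* Mock free: poison the object the way slab debugging would. */
static void mock_free(struct entry *e)
{
    memset(e, POISON_FREE, sizeof(*e));
}

/* The callback: the only place the object should be freed. */
static void entry_reclaim(struct rcu_head *head)
{
    mock_free(container_of(head, struct entry, rcu));
    reclaimed++;
}

static void mock_call_rcu(struct rcu_head *head,
                          void (*func)(struct rcu_head *))
{
    head->func = func;
    head->next = pending;
    pending = head;
}

/* Returns 1 if every byte of the stored callback pointer is poison. */
static int callback_is_poisoned(const struct rcu_head *head)
{
    unsigned char bytes[sizeof(head->func)];
    size_t i;

    memcpy(bytes, &head->func, sizeof(bytes));
    for (i = 0; i < sizeof(bytes); i++)
        if (bytes[i] != POISON_FREE)
            return 0;
    return 1;
}

/* Mock end of grace period: run queued callbacks, count poisoned heads. */
static int mock_process_callbacks(void)
{
    int bad = 0;
    struct rcu_head *head = pending;

    pending = NULL;
    while (head) {
        if (callback_is_poisoned(head)) {
            bad++;   /* the kernel would jump to this garbage address */
            break;   /* head->next is poisoned as well, so stop here */
        }
        struct rcu_head *next = head->next;  /* read before func frees it */
        head->func(head);
        head = next;
    }
    return bad;
}

/* Correct pattern: the object is freed only from its own callback. */
static int demo_correct(void)
{
    struct entry *e = &pool[0];

    pending = NULL;
    e->key = 1;
    mock_call_rcu(&e->rcu, entry_reclaim);
    return mock_process_callbacks();
}

/* Buggy pattern: the object is freed while its callback is still queued. */
static int demo_premature_free(void)
{
    struct entry *e = &pool[1];

    pending = NULL;
    e->key = 2;
    mock_call_rcu(&e->rcu, entry_reclaim);
    mock_free(e);  /* the bug */
    return mock_process_callbacks();
}
```

In the buggy case the queued rcu_head reads back as all 0x6b bytes. With
slab debugging enabled (CONFIG_DEBUG_SLAB on kernels of this era) the
real allocator poisons freed objects in a similar way, which is why a
premature free tends to show up more clearly, and possibly in a
different code path.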

Thanks
Dipankar
