Hello, On Thu, 28 Jun 2012, Xiaotian Feng wrote: > We met a kernel panic in 2.6.32.43 kernel: > > [2680191.848044] IPVS: ip_vs_conn_hash(): request for already hashed, called from run_timer_softirq+0x175/0x1d0 > <snip> > [2680311.849009] general protection fault: 0000 [#1] SMP > [2680311.853001] RIP: 0010:[<ffffffff815f155c>] [<ffffffff815f155c>] ip_vs_conn_expire+0xdc/0x2f0 > [2680311.853001] RSP: 0018:ffff880028303e70 EFLAGS: 00010202 > [2680311.853001] RAX: dead000000200200 RBX: ffff8801aad00b80 RCX: 0000000000001d90 > [2680311.853001] RDX: dead000000100100 RSI: 000000004fd59800 RDI: ffff8801aad00c08 > <snip> > [2680311.853001] Call Trace: > [2680311.853001] <IRQ> > [2680311.853001] [<ffffffff815f1480>] ? ip_vs_conn_expire+0x0/0x2f0 > [2680311.853001] [<ffffffff8104e2a5>] run_timer_softirq+0x175/0x1d0 > [2680311.853001] [<ffffffff81021a48>] ? lapic_next_event+0x18/0x20 > [2680311.853001] [<ffffffff81049a13>] __do_softirq+0xb3/0x150 > [2680311.853001] [<ffffffff8100cc5c>] call_softirq+0x1c/0x30 > [2680311.853001] [<ffffffff8100ea9a>] do_softirq+0x4a/0x80 > [2680311.853001] [<ffffffff81049957>] irq_exit+0x77/0x80 > [2680311.853001] [<ffffffff81021f2c>] smp_apic_timer_interrupt+0x6c/0xa0 > [2680311.853001] [<ffffffff8100c633>] apic_timer_interrupt+0x13/0x20 > [2680311.853001] <EOI> > [2680311.853001] [<ffffffff81013b52>] ? mwait_idle+0x52/0x70 > [2680311.853001] [<ffffffff8100a7b0>] ? enter_idle+0x20/0x30 > [2680311.853001] [<ffffffff8100ac62>] ? cpu_idle+0x52/0x80 > [2680311.853001] [<ffffffff816d504d>] ? start_secondary+0x19d/0x280 > > rax and rdx is LIST_POISON1 and LIST_POISON2, so kernel is list_del() on an already deleted > connection and result the general protect fault. > > The "request for already hashed" warning, told us someone might change the connection flags > incorrectly, like described in commit aea9d711, it changes the connection flags, but doesn't > put the connection back to the list. So ip_vs_conn_hash() throw a warning and return. > Later, when ip_vs_conn_expire fire again, ip_vs_conn_unhash() will find the HASHED connection > and list_del() it, then kernel panic happened. > > After code review, the only chance that kernel change connection flag without protection is > in ip_vs_ftp_init_conn(). > > Signed-off-by: Xiaotian Feng <dannyfeng@xxxxxxxxxxx> > Cc: Wensong Zhang <wensong@xxxxxxxxxxxx> > Cc: Simon Horman <horms@xxxxxxxxxxxx> > Cc: Julian Anastasov <ja@xxxxxx> > Cc: Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> > Cc: Patrick McHardy <kaber@xxxxxxxxx> > Cc: "David S. Miller" <davem@xxxxxxxxxxxxx> For the fix below: Acked-by: Julian Anastasov <ja@xxxxxx> Simon, the change looks ok. ip_vs_ftp_init_conn is called from context where cp->lock is not locked (no double lock), so it should be safe for the backup. Only that the comment is not specifying that we fix a problem in the backup server. > --- > net/netfilter/ipvs/ip_vs_ftp.c | 2 ++ > 1 files changed, 2 insertions(+), 0 deletions(-) > > diff --git a/net/netfilter/ipvs/ip_vs_ftp.c b/net/netfilter/ipvs/ip_vs_ftp.c > index b20b29c..c2bc264 100644 > --- a/net/netfilter/ipvs/ip_vs_ftp.c > +++ b/net/netfilter/ipvs/ip_vs_ftp.c > @@ -65,8 +65,10 @@ static int ip_vs_ftp_pasv; > static int > ip_vs_ftp_init_conn(struct ip_vs_app *app, struct ip_vs_conn *cp) > { > + spin_lock(&cp->lock); > /* We use connection tracking for the command connection */ > cp->flags |= IP_VS_CONN_F_NFCT; > + spin_unlock(&cp->lock); > return 0; > } > > -- > 1.7.1 Regards -- Julian Anastasov <ja@xxxxxx> -- To unsubscribe from this list: send the line "unsubscribe netfilter" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html