On Fri, Jun 29, 2012 at 8:17 AM, Julian Anastasov <ja@xxxxxx> wrote: > > Hello, > > On Thu, 28 Jun 2012, Xiaotian Feng wrote: > >> We met a kernel panic in 2.6.32.43 kernel: >> >> [2680191.848044] IPVS: ip_vs_conn_hash(): request for already hashed, called from run_timer_softirq+0x175/0x1d0 >> <snip> >> [2680311.849009] general protection fault: 0000 [#1] SMP >> [2680311.853001] RIP: 0010:[<ffffffff815f155c>] [<ffffffff815f155c>] ip_vs_conn_expire+0xdc/0x2f0 >> [2680311.853001] RSP: 0018:ffff880028303e70 EFLAGS: 00010202 >> [2680311.853001] RAX: dead000000200200 RBX: ffff8801aad00b80 RCX: 0000000000001d90 >> [2680311.853001] RDX: dead000000100100 RSI: 000000004fd59800 RDI: ffff8801aad00c08 >> <snip> >> [2680311.853001] Call Trace: >> [2680311.853001] <IRQ> >> [2680311.853001] [<ffffffff815f1480>] ? ip_vs_conn_expire+0x0/0x2f0 >> [2680311.853001] [<ffffffff8104e2a5>] run_timer_softirq+0x175/0x1d0 >> [2680311.853001] [<ffffffff81021a48>] ? lapic_next_event+0x18/0x20 >> [2680311.853001] [<ffffffff81049a13>] __do_softirq+0xb3/0x150 >> [2680311.853001] [<ffffffff8100cc5c>] call_softirq+0x1c/0x30 >> [2680311.853001] [<ffffffff8100ea9a>] do_softirq+0x4a/0x80 >> [2680311.853001] [<ffffffff81049957>] irq_exit+0x77/0x80 >> [2680311.853001] [<ffffffff81021f2c>] smp_apic_timer_interrupt+0x6c/0xa0 >> [2680311.853001] [<ffffffff8100c633>] apic_timer_interrupt+0x13/0x20 >> [2680311.853001] <EOI> >> [2680311.853001] [<ffffffff81013b52>] ? mwait_idle+0x52/0x70 >> [2680311.853001] [<ffffffff8100a7b0>] ? enter_idle+0x20/0x30 >> [2680311.853001] [<ffffffff8100ac62>] ? cpu_idle+0x52/0x80 >> [2680311.853001] [<ffffffff816d504d>] ? start_secondary+0x19d/0x280 >> >> rax and rdx is LIST_POISON1 and LIST_POISON2, so kernel is list_del() on an already deleted >> connection and result the general protect fault. >> >> The "request for already hashed" warning, told us someone might change the connection flags >> incorrectly, like described in commit aea9d711, it changes the connection flags, but doesn't >> put the connection back to the list. So ip_vs_conn_hash() throw a warning and return. >> Later, when ip_vs_conn_expire fire again, ip_vs_conn_unhash() will find the HASHED connection >> and list_del() it, then kernel panic happened. >> >> After code review, the only chance that kernel change connection flag without protection is >> in ip_vs_ftp_init_conn(). > > Hm, ip_vs_ftp_init_conn is called before 1st hashing, > from ip_vs_bind_app() in ip_vs_conn_new() before > ip_vs_conn_hash(). It should be another problem with > the flags. How different is IPVS in 2.6.32.43 compared to > recent kernels? If commit aea9d711 is present, I'm not > aware of other similar problems. ip_vs_bind_app() is also called by ip_vs_try_bind_dest(), which can be traced to ip_vs_proc_conn(). I've checked the changes in upstream, but nothing helps since aea9d711 has been taken into 2.6.32.28 kernel. > > May be you can post some details for your setup on > lvs-devel@xxxxxxxxxxxxxxx. I assume FTP is used, > what about master-backup sync? Can you confirm that > this fix solves the problem or it happens in rare cases? Yes, ftp and WRR scheduler is used. Unfortunately, the oops is unreproducible :( > >> Signed-off-by: Xiaotian Feng <dannyfeng@xxxxxxxxxxx> >> Cc: Wensong Zhang <wensong@xxxxxxxxxxxx> >> Cc: Simon Horman <horms@xxxxxxxxxxxx> >> Cc: Julian Anastasov <ja@xxxxxx> >> Cc: Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> >> Cc: Patrick McHardy <kaber@xxxxxxxxx> >> Cc: "David S. Miller" <davem@xxxxxxxxxxxxx> >> --- >> net/netfilter/ipvs/ip_vs_ftp.c | 2 ++ >> 1 files changed, 2 insertions(+), 0 deletions(-) >> >> diff --git a/net/netfilter/ipvs/ip_vs_ftp.c b/net/netfilter/ipvs/ip_vs_ftp.c >> index b20b29c..c2bc264 100644 >> --- a/net/netfilter/ipvs/ip_vs_ftp.c >> +++ b/net/netfilter/ipvs/ip_vs_ftp.c >> @@ -65,8 +65,10 @@ static int ip_vs_ftp_pasv; >> static int >> ip_vs_ftp_init_conn(struct ip_vs_app *app, struct ip_vs_conn *cp) >> { >> + spin_lock(&cp->lock); >> /* We use connection tracking for the command connection */ >> cp->flags |= IP_VS_CONN_F_NFCT; >> + spin_unlock(&cp->lock); >> return 0; >> } >> >> -- >> 1.7.1 > > Regards > > -- > Julian Anastasov <ja@xxxxxx> -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html