Re: [PATCH] netfilter: xt_connlimit: fix race in connection counting

On Tue, Nov 27, 2018 at 11:38:47PM +0100, Florian Westphal wrote:
> Alakesh Haloi <alakeshh@xxxxxxxxxx> wrote:
> > > But In case you can't reproduce, its possible your patch is still needed
> > > for stable.
> > 
> > Thanks Florian! I have tested linus's tree and i do not see the issue happening
> > there. I have not been able to test nf.git yet. Do you suggest that I should
> > start working on backporting relevant patches from mainline or it should be
> > possible to apply this patch to stable branches directly?
> 
> The relevant mainline fix is probably
> b36e4523d4d56e2595e28f16f6ccf1cd6a9fc452
> ("netfilter: nf_conncount: fix garbage collection confirm race").
> 
> But
> 1. I don't like this fix (i could not come up with anything better...)
> 2. It will not apply to older stable branches.
> 
> So I think you might want to look at this commit, see if you have a
> better idea, and if not, apply similar strategy to older stable kernel,
> then pass this as a backport to stable maintainers.  I can review the
> patch.

I tried porting the fix to the 4.14 kernel, mostly by bringing in the idea of
saving the CPU number and checking the age of the connection before deleting
it. This improves the situation but does not fix the problem entirely. The
number of connections that exceed the configured limit seems to depend on the
number of sender-side threads issuing connection requests. If I can improve
the backport or fix the problem entirely, I will send the patch out.
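
To make the idea concrete, here is a rough user-space sketch of the eviction
check, not the actual backport. The struct and function names are made up for
illustration, and the two-tick threshold is my reading of the mainline commit,
so treat it as an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a tracked connection entry. */
struct conn_entry {
	int cpu;            /* CPU that added the entry */
	uint32_t jiffies32; /* tick count when the entry was added */
};

/*
 * Decide whether an entry whose conntrack lookup failed may be evicted.
 * An entry added very recently by another CPU may simply be unconfirmed
 * still, so it is kept unless this CPU added it or it is at least two
 * ticks old (assumed threshold).
 */
bool may_evict(const struct conn_entry *e, int this_cpu, uint32_t now)
{
	uint32_t age = now - e->jiffies32; /* unsigned math survives wraparound */

	return e->cpu == this_cpu || age >= 2;
}
```

With this check, an entry added one tick ago by another CPU is left alone,
while the same entry is evicted once it is two ticks old or when the CPU that
added it does the lookup.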

The second issue I wanted to raise: I ran my experiments on the latest Linus
tree, bumped up the number of threads that create connections, and I see a
kernel panic with list_del corruption. The trace is below. So it looks like
the refactoring around xt_connlimit may not be stable yet and needs more work.

[  259.988383] ------------[ cut here ]------------
[  259.989790] list_del corruption, ffff88cd473fbe18->prev is LIST_POISON2 (dead000000000200)
[  259.991999] WARNING: CPU: 3 PID: 0 at lib/list_debug.c:50 __list_del_entry_valid+0x92/0xa0
[  259.994160] Modules linked in: xt_connlimit nf_conncount nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter nfit libnvdimm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd ppdev cryptd glue_helper i2c_piix4 parport_pc i2c_core parport pcspkr ip_tables xfs libcrc32c nvme serio_raw crc32c_intel ena nvme_core
[  260.001621] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.20.0-rc4+ #2
[  260.003343] Hardware name: Amazon EC2 m5d.xlarge/, BIOS 1.0 10/16/2017
[  260.005116] RIP: 0010:__list_del_entry_valid+0x92/0xa0
[  260.006541] Code: 31 c0 c3 48 89 fe 31 c0 48 c7 c7 68 83 8b a4 e8 f4 68 cc ff 0f 0b 31 c0 c3 48 89 fe 31 c0 48 c7 c7 30 83 8b a4 e8 de 68 cc ff <0f> 0b 31 c0 c3 90 90 90 90 90 90 90 90 90 41 55 48 85 d2 49 89 d5
[  260.011337] RSP: 0018:ffff88cd52b83940 EFLAGS: 00010286
[  260.012782] RAX: 0000000000000000 RBX: ffff88cd4f3c9018 RCX: 000000000000083f
[  260.014692] RDX: 0000000000000000 RSI: 00000000000000f6 RDI: 000000000000083f
[  260.019022] RBP: ffff88cd473fbe18 R08: 0000000000000000 R09: 0000000000000225
[  260.023354] R10: 0000000000000000 R11: ffff88cd52b836b0 R12: ffff88cd52b839b7
[  260.027740] R13: ffff88cd4f3c9018 R14: ffff88cd473fbe18 R15: ffffffffc00da501
[  260.032115] FS:  0000000000000000(0000) GS:ffff88cd52b80000(0000) knlGS:0000000000000000
[  260.039137] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  260.043161] CR2: 00007efc39289000 CR3: 000000040dbc4002 CR4: 00000000007606e0
[  260.047503] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  260.051891] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  260.056215] PKRU: 55555554
[  260.059485] Call Trace:
[  260.062712]  <IRQ>
[  260.065815]  conn_free+0x29/0x90 [nf_conncount]
[  260.069558]  find_or_evict+0x5c/0x70 [nf_conncount]
[  260.073357]  nf_conncount_lookup+0xa7/0x350 [nf_conncount]
[  260.077346]  nf_conncount_count+0x28e/0x4ae [nf_conncount]
[  260.081295]  ? __fib_validate_source+0x11a/0x410
[  260.085187]  connlimit_mt+0x95/0x173 [xt_connlimit]
[  260.089014]  ? tcp_in_window+0xf8/0x890 [nf_conntrack]
[  260.092936]  ipt_do_table+0x264/0x650 [ip_tables]
[  260.096710]  nf_hook_slow+0x3d/0xb0
[  260.100178]  ? ip_route_input_noref+0x24/0x40
[  260.103834]  ip_local_deliver+0xcc/0xe0
[  260.107465]  ? ip_sublist_rcv_finish+0x70/0x70
[  260.111229]  ip_rcv+0x52/0xd0
[  260.114626]  ? ip_rcv_finish_core.isra.13+0x370/0x370
[  260.118509]  __netif_receive_skb_one_core+0x52/0x70
[  260.122326]  netif_receive_skb_internal+0x42/0xf0
[  260.126074]  napi_gro_receive+0xbf/0xe0
[  260.129619]  ena_clean_rx_irq+0x2c4/0x7e0 [ena]
[  260.133372]  ? kmsg_dump+0xa1/0xe0
[  260.136817]  ena_io_poll+0x430/0x8b0 [ena]
[  260.140445]  net_rx_action+0x297/0x3c0
[  260.143958]  __do_softirq+0xd6/0x2a9
[  260.147436]  irq_exit+0xdb/0xf0
[  260.150860]  do_IRQ+0x54/0xe0
[  260.154188]  common_interrupt+0xf/0xf
[  260.157718]  </IRQ>
[  260.160839] RIP: 0010:native_safe_halt+0x2/0x10
[  260.164572] Code: e1 5b ff ff ff 7f c3 f3 c3 65 48 8b 04 25 80 5c 01 00 f0 80 48 02 20 48 8b 00 a8 08 74 8a eb c0 90 90 90 90 90 90 90 90 fb f4 <c3> 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90
[  260.176697] RSP: 0018:ffffb77b0190beb0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffd8
[  260.183599] RAX: ffffffffa41f5b60 RBX: ffff88ca4625bc00 RCX: 0000000000000001
[  260.188043] RDX: 0000000000000001 RSI: 0000000000000083 RDI: 0000000000000003
[  260.192457] RBP: 0000000000000003 R08: 00000000cccccccc R09: ffffffffa4624f65
[  260.196868] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  260.201250] R13: 0000000000000000 R14: ffff88ca4625bc00 R15: ffff88ca4625bc00
[  260.205625]  ? mwait_idle+0x1e0/0x1e0
[  260.209149]  default_idle+0x1a/0x140
[  260.212740]  do_idle+0x1a6/0x290
[  260.216174]  cpu_startup_entry+0x19/0x20
[  260.219763]  start_secondary+0x1aa/0x200
[  260.223483]  secondary_startup_64+0xa4/0xb0
[  260.227257] ---[ end trace fc24593acc754b3b ]---
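
For anyone reading the trace: list_del() overwrites the removed entry's
pointers with poison values, and the list debugging code warns when it is
asked to delete an entry that already carries them. That would be consistent
with conn_free() running on an entry that was already removed, possibly by
another CPU, but I have not confirmed this. Below is a small user-space sketch
of that mechanism. The sk_ names and poison constants are illustrative only;
the kernel's x86_64 values carry an extra offset, which is why the log shows
dead000000000200.

```c
#include <stdbool.h>
#include <stdint.h>

struct sk_list_head {
	struct sk_list_head *next, *prev;
};

/* Illustrative poison values, standing in for LIST_POISON1/LIST_POISON2. */
#define SK_POISON1 ((struct sk_list_head *)(uintptr_t)0x100)
#define SK_POISON2 ((struct sk_list_head *)(uintptr_t)0x200)

void sk_list_init(struct sk_list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert n right after head. */
void sk_list_add(struct sk_list_head *n, struct sk_list_head *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/*
 * Unlink e and poison its pointers. Returns false, leaving the list
 * untouched, if e looks already deleted or its neighbours disagree.
 */
bool sk_list_del(struct sk_list_head *e)
{
	if (e->next == SK_POISON1 || e->prev == SK_POISON2)
		return false; /* already deleted once */
	if (e->prev->next != e || e->next->prev != e)
		return false; /* neighbours do not point back at e */

	e->next->prev = e->prev;
	e->prev->next = e->next;
	e->next = SK_POISON1;
	e->prev = SK_POISON2;
	return true;
}
```

Deleting the same entry twice in this sketch makes the second call return
false, which corresponds to the point where the kernel prints the warning.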
