Re: 4.19.x kernels oops in nf_conncount_destroy

This morning I found the thread linked below, which I didn't see last
night. I'm not sure how I missed it, since I knew what I was searching
for. It links to the same patches I mentioned, but with a status filter
in the URL so that the patches are actually listed.

I applied the three patches and tested, and they do NOT fix the problem
for me. They change the behavior somewhat -- I saw several oopses (or
other noise) scroll past before the machine locked up. The output also
ended with something like "eth0: pcnet32 transmit timed out", which I
hadn't seen before.

https://www.spinics.net/lists/netfilter-devel/msg57045.html

https://patchwork.ozlabs.org/project/netfilter-devel/list/?series=73972&state=*



Todd Eigenschink writes:
>EPILOGUE-AS-PREAMBLE:
>
>I had already typed most of this when I thought to search the
>netfilter-devel archive. I found this, which sounds an awful lot like
>my issue:
>
>https://www.spinics.net/lists/netfilter-devel/msg56882.html
>
>However, the patch link in the first followup seems empty, so I can't
>verify that it's the same thing or that the proposed fix works for me.
>
>
>----------------------------------------------------------------------
>
>[1.] One line summary of the problem:
>
>4.19.x kernels oops in nf_conncount_destroy.
>
>
>[2.] Full description of the problem/report:
>
>We have been running 4.18.x kernels, up through 4.18.20, in production
>for a small web/email hosting operation with no issues. Everything
>relevant here is 32-bit Linux on VMware ESXi. Upon the release of
>4.18.20 and knowing that it was EOL, I stepped to then-current 4.19.4.
>
>One of our machines (a mail gateway) hung with an oops within a minute
>or two of boot. I rolled it back to deal with later.
>
>The next morning, another machine (coincidentally another mail
>gateway) crashed as well, and the tail end of the oops--that I could
>see on the 80x25 console--looked similar to what I remembered from the
>first. I rolled it back. If a third one happened, I was going to roll
>them all back. No other machines had issues.
>
>When 4.19.5 was released, I tried that, with the same effect. Since the
>fastest-crashing machine, while in production, would not cause
>user-visible issues, I decided to bisect to hunt down the cause. Every
>other machine, about 30 total, has been fine on 4.19.4 / 4.19.5.
>
>Bisecting led me to this:
>
>
>5c789e131cbb997a528451564ea4613e812fc718 is the first bad commit
>commit 5c789e131cbb997a528451564ea4613e812fc718
>Author: Yi-Hung Wei <yihung.wei@xxxxxxxxx>
>Date:   Mon Jul 2 17:33:44 2018 -0700
>
>    netfilter: nf_conncount: Add list lock and gc worker, and RCU for init tree search
>    
>    This patch is originally from Florian Westphal.
>    
>    This patch does the following 3 main tasks.
>    
>    1) Add list lock to 'struct nf_conncount_list' so that we can
>    alter the lists containing the individual connections without holding the
>    main tree lock.  It would be useful when we only need to add/remove to/from
>    a list without allocate/remove a node in the tree.  With this change, we
>    update nft_connlimit accordingly since we longer need to maintain
>    a list lock in nft_connlimit now.
>    
>    2) Use RCU for the initial tree search to improve tree look up performance.
>    
>    3) Add a garbage collection worker. This worker is schedule when there
>    are excessive tree node that needed to be recycled.
>    
>    Moreover,the rbnode reclaim logic is moved from search tree to insert tree
>    to avoid race condition.
>    
>    Signed-off-by: Yi-Hung Wei <yihung.wei@xxxxxxxxx>
>    Signed-off-by: Florian Westphal <fw@xxxxxxxxx>
>    Signed-off-by: Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx>
>
>:040000 040000 3117a9e5f5d91c55bfcb495ed0cf20aac47beb4c eb16c3c84edfa70268c651490dd5031a6474ca2d M	include
>:040000 040000 f69622ea9603500bc837f6348bc7ffe6e4edefda 8983dc24192abb1ae1925f023a495c39d171021c M	net
>
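For anyone who wants to repeat the bisect, the procedure can be sketched
roughly as below. This is only a sketch under assumptions: the report
does not say which endpoints were used, so the mainline tags v4.18
(good) and v4.19 (bad) are a guess, and the helper names run,
bisect_start and bisect_mark are made up for illustration. With the
default DRY_RUN=1 the script only prints the git commands.

```shell
#!/bin/sh
# Sketch only: bisect between an assumed good and bad mainline tag.
# Assumes the current directory is a Linux kernel git checkout.
DRY_RUN="${DRY_RUN:-1}"   # default: print commands instead of running them

# Hypothetical helper: echo the command in dry-run mode, else execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Start the bisect; "git bisect start <bad> <good>" takes the bad rev first.
bisect_start() {
    run git bisect start v4.19 v4.18
}

# After building and booting each candidate kernel, record the verdict.
# $1 must be "good" (no oops) or "bad" (oops seen).
bisect_mark() {
    case "$1" in
        good|bad) run git bisect "$1" ;;
        *) echo "usage: bisect_mark good|bad" >&2; return 1 ;;
    esac
}
```

Each step still means building, installing and booting the candidate
kernel and then waiting a few minutes to see whether the machine oopses
before calling bisect_mark.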
>
>And it makes perfect sense: Our only two machines that use
>nf_connlimit in their firewall configs are those two mail gateways. I
>imagine that the speed at which they oops has to do with their
>specific connlimit settings and how quickly they accumulate enough
>traffic to hit one of them.
>
>Oops details are below.
>
>
>[3.] Keywords (i.e., modules, networking, kernel):
>
>netfilter, nf_conncount, nf_connlimit
>
>
>[4.] Kernel information
>
>[4.1.] Kernel version (from /proc/version):
>
>[4.2.] Kernel .config file:
>
>grep = .config, net-related stuff only:
>
>
>CONFIG_NET=y
>CONFIG_NET_INGRESS=y
>CONFIG_PACKET=y
>CONFIG_UNIX=y
>CONFIG_XFRM=y
>CONFIG_XFRM_ALGO=y
>CONFIG_XFRM_USER=y
>CONFIG_XFRM_SUB_POLICY=y
>CONFIG_XFRM_IPCOMP=m
>CONFIG_NET_KEY=m
>CONFIG_INET=y
>CONFIG_IP_MULTICAST=y
>CONFIG_IP_ADVANCED_ROUTER=y
>CONFIG_IP_MULTIPLE_TABLES=y
>CONFIG_INET_AH=m
>CONFIG_INET_ESP=m
>CONFIG_INET_IPCOMP=m
>CONFIG_INET_XFRM_TUNNEL=m
>CONFIG_INET_TUNNEL=m
>CONFIG_INET_XFRM_MODE_TRANSPORT=m
>CONFIG_INET_XFRM_MODE_TUNNEL=m
>CONFIG_INET_XFRM_MODE_BEET=m
>CONFIG_TCP_CONG_CUBIC=y
>CONFIG_DEFAULT_TCP_CONG="cubic"
>CONFIG_NET_PTP_CLASSIFY=y
>CONFIG_NETFILTER=y
>CONFIG_NETFILTER_ADVANCED=y
>CONFIG_NETFILTER_INGRESS=y
>CONFIG_NETFILTER_NETLINK=y
>CONFIG_NETFILTER_FAMILY_ARP=y
>CONFIG_NF_CONNTRACK=y
>CONFIG_NF_LOG_COMMON=y
>CONFIG_NETFILTER_CONNCOUNT=y
>CONFIG_NF_CONNTRACK_MARK=y
>CONFIG_NF_CONNTRACK_PROCFS=y
>CONFIG_NF_CONNTRACK_TIMEOUT=y
>CONFIG_NF_CONNTRACK_FTP=y
>CONFIG_NF_CT_NETLINK=y
>CONFIG_NF_CT_NETLINK_TIMEOUT=y
>CONFIG_NF_NAT=y
>CONFIG_NF_NAT_NEEDED=y
>CONFIG_NF_NAT_FTP=y
>CONFIG_NF_NAT_REDIRECT=y
>CONFIG_NF_TABLES=y
>CONFIG_NFT_CT=y
>CONFIG_NFT_CONNLIMIT=y
>CONFIG_NFT_LOG=y
>CONFIG_NFT_LIMIT=y
>CONFIG_NFT_MASQ=y
>CONFIG_NFT_NAT=y
>CONFIG_NFT_REJECT=y
>CONFIG_NF_FLOW_TABLE=m
>CONFIG_NETFILTER_XTABLES=y
>CONFIG_NETFILTER_XT_MARK=y
>CONFIG_NETFILTER_XT_CONNMARK=y
>CONFIG_NETFILTER_XT_TARGET_CONNMARK=y
>CONFIG_NETFILTER_XT_TARGET_LOG=y
>CONFIG_NETFILTER_XT_TARGET_MARK=y
>CONFIG_NETFILTER_XT_NAT=y
>CONFIG_NETFILTER_XT_TARGET_NETMAP=y
>CONFIG_NETFILTER_XT_TARGET_REDIRECT=y
>CONFIG_NETFILTER_XT_TARGET_TPROXY=m
>CONFIG_NETFILTER_XT_MATCH_COMMENT=y
>CONFIG_NETFILTER_XT_MATCH_CONNLIMIT=y
>CONFIG_NETFILTER_XT_MATCH_CONNMARK=y
>CONFIG_NETFILTER_XT_MATCH_CONNTRACK=y
>CONFIG_NETFILTER_XT_MATCH_ESP=m
>CONFIG_NETFILTER_XT_MATCH_HASHLIMIT=m
>CONFIG_NETFILTER_XT_MATCH_HELPER=y
>CONFIG_NETFILTER_XT_MATCH_IPRANGE=m
>CONFIG_NETFILTER_XT_MATCH_LENGTH=y
>CONFIG_NETFILTER_XT_MATCH_LIMIT=y
>CONFIG_NETFILTER_XT_MATCH_MARK=y
>CONFIG_NETFILTER_XT_MATCH_MULTIPORT=m
>CONFIG_NETFILTER_XT_MATCH_POLICY=y
>CONFIG_NETFILTER_XT_MATCH_STATE=y
>CONFIG_NETFILTER_XT_MATCH_STATISTIC=m
>CONFIG_NETFILTER_XT_MATCH_STRING=m
>CONFIG_NETFILTER_XT_MATCH_TCPMSS=m
>CONFIG_NF_DEFRAG_IPV4=y
>CONFIG_NF_CONNTRACK_IPV4=y
>CONFIG_NF_TPROXY_IPV4=m
>CONFIG_NF_TABLES_IPV4=y
>CONFIG_NFT_CHAIN_ROUTE_IPV4=y
>CONFIG_NFT_REJECT_IPV4=y
>CONFIG_NF_TABLES_ARP=y
>CONFIG_NF_FLOW_TABLE_IPV4=m
>CONFIG_NF_LOG_IPV4=y
>CONFIG_NF_REJECT_IPV4=y
>CONFIG_NF_NAT_IPV4=y
>CONFIG_NFT_CHAIN_NAT_IPV4=y
>CONFIG_NF_NAT_MASQUERADE_IPV4=y
>CONFIG_NFT_MASQ_IPV4=y
>CONFIG_IP_NF_IPTABLES=y
>CONFIG_IP_NF_FILTER=y
>CONFIG_IP_NF_TARGET_REJECT=y
>CONFIG_IP_NF_NAT=y
>CONFIG_IP_NF_TARGET_MASQUERADE=y
>CONFIG_IP_NF_TARGET_NETMAP=y
>CONFIG_IP_NF_TARGET_REDIRECT=y
>CONFIG_IP_NF_MANGLE=y
>
>
>[5.] Most recent kernel version which did not have the bug:
>
>4.18.x is fine. 4.19+ all have it.
>
>
>[6.] Output of Oops.. message (if applicable) with symbolic information
>     resolved (see Documentation/admin-guide/bug-hunting.rst)
>
>For most oopses, all I have is the tail 80x25 of the output since I
>can't scroll the console back. A lot of them had call traces that
>included bits like:
>
>EIP: native_safe_halt+0x5/0x7
>[...]
> ? siphash_3u64+[...]
> default_idle+[...]
> arch_cpu_idle+[...]
> [...]
> 
>as well as some IRQ stuff, which really made no sense to me. Then, one
>or two bisect steps from the end, I had one that didn't lock up the
>machine, so I could scroll back:
>
>BUG: unable to handle kernel NULL pointer dereference at 00000000
>*pdpt = 00000000712fd001 *pde = 0000000000000000
>Oops: 0000 [#1] SMP
>CPU: 1 PID: 26422 Comm: iptables Not tainted 4.18.0-rc3-00851-ged07d9a021df #22
>Hardware name: VMware, Inc. VMware Virtual Platform/400BX Desktop Reference Platform, BIOS 6.00 09/30/2014
>EIP: nf_conncount_destroy+0x4d/0xa5
>Code: ed 4c ff ff 89 f8 83 c7 04 05 04 04 00 00 89 45 ec 89 f8 [...]
>[...]
>Call Trace:
> connlimit_mt_destroy+0x14/0x16
> cleanup_match+0x34/0x52
> cleanup_entry+0x2e/0x8b
> do_ipt_set_ctl+0x412/0x48e
> ? do_ipt_get_ctl+0x39e/0x39e
> nf_setsockopt+0x37/0x57
> ip_setsockopt+0x4b/0x5a
> [and so on back to entry_SYSENTER_32]
>
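The EIP value in the trace above can in principle be mapped to a source
line with scripts/faddr2line from the kernel tree. The sketch below
assumes it is run from the top of the source tree that built the
crashing kernel and that vmlinux was built with debug info
(CONFIG_DEBUG_INFO=y); neither is stated in the report. The helper names
run and resolve_oops_addr are made up for illustration, and with the
default DRY_RUN=1 the command is only printed.

```shell
#!/bin/sh
# Sketch only: resolve an oops "func+offset/size" string to a source line.
# Assumes the matching vmlinux (with debug info) is in the current tree.
DRY_RUN="${DRY_RUN:-1}"          # default: print the command, do not run it
VMLINUX="${VMLINUX:-vmlinux}"    # path to the unstripped kernel image

# Hypothetical helper: echo the command in dry-run mode, else execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# $1 is the symbol+offset/size string exactly as printed after "EIP:".
resolve_oops_addr() {
    run scripts/faddr2line "$VMLINUX" "$1"
}
```

For the oops above, the call would be
resolve_oops_addr 'nf_conncount_destroy+0x4d/0xa5'.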
>
>Complete screen shots are available if they'll be of any use.
>
>
>
>[7.] A small shell script or example program which triggers the
>     problem (if possible)
>
>When I saw "conncount", and knowing that it was our two mail gateways,
>my thoughts (above) jumped to our connlimit settings.
>
>HOWEVER. The oops says it was triggered by iptables. Both machines
>also use sshguard, which uses iptables to add DROP rules to a chain.
>(Nearly all our machines use sshguard, but the two mail gateways are
>the only two that give it more than occasional activity, and this one
>in particular gives it a decent workout.)
>
>For what it's worth, here is our connlimit setup anyway:
>
>------------------------------------------------------------
>The server has two rules that use connlimit:
>
>iptables -A <chain> -j REJECT -p tcp -s 0.0.0.0/0 -m connlimit --connlimit-above 3 --connlimit-mask 24
>iptables -A <chain> -j REJECT -p tcp -s 0.0.0.0/0 -m connlimit --connlimit-above 2 --connlimit-mask 32
>
>The other server has a few more such rules, but much less traffic that
>is likely to run afoul of them -- that would explain why it took much
>longer to crash.
>------------------------------------------------------------
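The call trace runs from do_ipt_set_ctl through cleanup_entry to
connlimit_mt_destroy, which suggests the oops fires when an iptables
invocation replaces a table that holds connlimit rules, as sshguard's
rule changes would. A possible reproducer along those lines is sketched
below. It has NOT been verified to trigger the oops. The chain names,
the 192.0.2.1 test address and the helper names are all made up for
illustration, and the connlimit options are copied from the two rules
above. With the default DRY_RUN=1 it only prints the iptables commands;
actually running it needs root and a disposable machine.

```shell
#!/bin/sh
# Sketch only: connlimit rules plus sshguard-like rule churn.
DRY_RUN="${DRY_RUN:-1}"                   # default: print, do not execute
LIMIT_CHAIN="${LIMIT_CHAIN:-climit-test}" # hypothetical chain for connlimit
CHURN_CHAIN="${CHURN_CHAIN:-churn-test}"  # hypothetical chain for DROP churn

# Hypothetical helper: echo the command in dry-run mode, else execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Create both chains and install the two connlimit rules from the report.
setup_connlimit() {
    run iptables -N "$LIMIT_CHAIN"
    run iptables -N "$CHURN_CHAIN"
    run iptables -A "$LIMIT_CHAIN" -p tcp -m connlimit --connlimit-above 3 --connlimit-mask 24 -j REJECT
    run iptables -A "$LIMIT_CHAIN" -p tcp -m connlimit --connlimit-above 2 --connlimit-mask 32 -j REJECT
    run iptables -I INPUT -p tcp -j "$LIMIT_CHAIN"
}

# Add and delete a DROP rule $1 times, roughly what sshguard does when it
# blocks and later unblocks an address.
churn_rules() {
    i=0
    while [ "$i" -lt "$1" ]; do
        run iptables -A "$CHURN_CHAIN" -s 192.0.2.1 -j DROP
        run iptables -D "$CHURN_CHAIN" -s 192.0.2.1 -j DROP
        i=$((i + 1))
    done
}
```

Calling setup_connlimit once and then churn_rules 1000 while TCP
connections are arriving would be the obvious thing to try.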
>
>
>
>[8.] Environment
>
>[8.1.] Software (add the output of the ver_linux script here)
>
>GNU C               	8.2.0
>GNU Make            	4.2.1
>Binutils            	2.31.1
>Util-linux          	2.31.1
>Mount               	2.31.1
>Module-init-tools   	25
>Linux C Library     	2.28
>Dynamic linker (ldd)	2.28
>Sh-utils            	8.30
>
>
>[8.2.] Processor information (from /proc/cpuinfo):
>
>2-core VM, here's one:
>
>processor	: 0
>vendor_id	: GenuineIntel
>cpu family	: 6
>model		: 26
>model name	: Intel(R) Xeon(R) CPU           X5570  @ 2.93GHz
>stepping	: 5
>microcode	: 0x19
>cpu MHz		: 2926.000
>cache size	: 8192 KB
>physical id	: 0
>siblings	: 2
>core id		: 0
>cpu cores	: 2
>apicid		: 0
>initial apicid	: 0
>fdiv_bug	: no
>f00f_bug	: no
>coma_bug	: no
>fpu		: yes
>fpu_exception	: yes
>cpuid level	: 11
>wp		: yes
>flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss ht nx rdtscp lm constant_tsc arch_perfmon pebs bts xtopology tsc_reliable nonstop_tsc cpuid aperfmperf pni ssse3 cx16 sse4_1 sse4_2 popcnt hypervisor lahf_lm dtherm ida
>bugs		: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass
>bogomips	: 5852.00
>clflush size	: 64
>cache_alignment	: 64
>address sizes	: 40 bits physical, 48 bits virtual
>power management:
>
>
>[8.3.] Module information (from /proc/modules):
>
>[8.4.] Loaded driver and hardware information (/proc/ioports, /proc/iomem)
>
>[8.5.] PCI information ('lspci -vvv' as root)
>
>[8.6.] SCSI information (from /proc/scsi/scsi)
>
>[8.7.] Other information that might be relevant to the problem
>       (please look in /proc and include all information that you
>       think to be relevant):
>
>
>Since this machine will crash so reliably (usually within 2-5 minutes
>of boot on an affected kernel) and since it's not user-visible, I can
>test easily.
>
>
>
>Todd
>-- 
>Todd Eigenschink                Ferguson Advertising
>todd@xxxxxxxx                   http://www.fai2.com/
>Non ex transverso sed deorsum   260-407-1584
>
