This morning I found this thread, which I didn't see last night. I'm
not sure how I missed it, since I knew what I was searching for. It
includes a link to the same patches as I mentioned, but with a status
filter in the URL such that I can see the patches.

I applied the three patches and tested and it does NOT fix the problem
for me. It changes the behavior somewhat -- I saw several oopses (or
other noise) scroll past before it locked up. It also ended with
something like "eth0: pcnet32 transmit timed out", which I hadn't seen
before.

https://www.spinics.net/lists/netfilter-devel/msg57045.html
https://patchwork.ozlabs.org/project/netfilter-devel/list/?series=73972&state=*

Todd Eigenschink writes:
>EPILOGUE-AS-PREAMBLE:
>
>I had already typed most of this when I thought to search the
>netfilter-devel archive. I found this, which sounds an awful lot like
>my issue:
>
>https://www.spinics.net/lists/netfilter-devel/msg56882.html
>
>However, the patch link in the first followup seems empty, so I can't
>verify that it's the same thing or that the proposed fix works for me.
>
>
>----------------------------------------------------------------------
>
>[1.] One line summary of the problem:
>
>4.19.x kernels oops in nf_conncount_destroy.
>
>
>[2.] Full description of the problem/report:
>
>We have been running 4.18.x kernels, up through 4.18.20, in production
>for a small web/email hosting operation with no issues. Everything
>relevant here is 32-bit Linux on VMware ESXi. Upon the release of
>4.18.20 and knowing that it was EOL, I stepped to then-current 4.19.4.
>
>One of our machines (a mail gateway) hung with an oops within a minute
>or two of boot. I rolled it back to deal with later.
>
>The next morning, another machine (coincidentally another mail
>gateway) crashed as well, and the tail end of the oops--that I could
>see on the 80x25 console--looked similar to what I remembered from the
>first. I rolled it back. If a third one happened, I was going to roll
>them all back. No other machines had issues.
>
>When 4.19.5 was released, I tried that, with the same effect, so I
>decided that since the fastest-crashing machine was, while production,
>not going to cause user-visible issues, I'd bisect to try to hunt down
>the cause. Every other machine, about 30 total, has been fine on
>4.19.4 / 4.19.5.
>
>Bisecting led me to this.
>
>
>5c789e131cbb997a528451564ea4613e812fc718 is the first bad commit
>commit 5c789e131cbb997a528451564ea4613e812fc718
>Author: Yi-Hung Wei <yihung.wei@xxxxxxxxx>
>Date:   Mon Jul 2 17:33:44 2018 -0700
>
>    netfilter: nf_conncount: Add list lock and gc worker, and RCU for init tree search
>
>    This patch is originally from Florian Westphal.
>
>    This patch does the following 3 main tasks.
>
>    1) Add list lock to 'struct nf_conncount_list' so that we can
>    alter the lists containing the individual connections without holding the
>    main tree lock. It would be useful when we only need to add/remove to/from
>    a list without allocate/remove a node in the tree. With this change, we
>    update nft_connlimit accordingly since we longer need to maintain
>    a list lock in nft_connlimit now.
>
>    2) Use RCU for the initial tree search to improve tree look up performance.
>
>    3) Add a garbage collection worker. This worker is schedule when there
>    are excessive tree node that needed to be recycled.
>
>    Moreover,the rbnode reclaim logic is moved from search tree to insert tree
>    to avoid race condition.
>
>    Signed-off-by: Yi-Hung Wei <yihung.wei@xxxxxxxxx>
>    Signed-off-by: Florian Westphal <fw@xxxxxxxxx>
>    Signed-off-by: Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx>
>
>:040000 040000 3117a9e5f5d91c55bfcb495ed0cf20aac47beb4c eb16c3c84edfa70268c651490dd5031a6474ca2d M include
>:040000 040000 f69622ea9603500bc837f6348bc7ffe6e4edefda 8983dc24192abb1ae1925f023a495c39d171021c M net
>
>
>And it makes perfect sense: Our only two machines that use
>nf_connlimit in their firewall configs are those two mail gateways. I
>imagine that the speed at which they oops has to do with their
>specific connlimit settings and how quickly they accumulate enough
>traffic to hit one of them.
>
>Oops details are below.
>
>
>[3.] Keywords (i.e., modules, networking, kernel):
>
>netfilter, nf_conncount, nf_connlimit
>
>
>[4.] Kernel information
>
>[4.1.] Kernel version (from /proc/version):
>
>[4.2.] Kernel .config file:
>
>grep = .config, net-related stuff only:
>
>
>CONFIG_NET=y
>CONFIG_NET_INGRESS=y
>CONFIG_PACKET=y
>CONFIG_UNIX=y
>CONFIG_XFRM=y
>CONFIG_XFRM_ALGO=y
>CONFIG_XFRM_USER=y
>CONFIG_XFRM_SUB_POLICY=y
>CONFIG_XFRM_IPCOMP=m
>CONFIG_NET_KEY=m
>CONFIG_INET=y
>CONFIG_IP_MULTICAST=y
>CONFIG_IP_ADVANCED_ROUTER=y
>CONFIG_IP_MULTIPLE_TABLES=y
>CONFIG_INET_AH=m
>CONFIG_INET_ESP=m
>CONFIG_INET_IPCOMP=m
>CONFIG_INET_XFRM_TUNNEL=m
>CONFIG_INET_TUNNEL=m
>CONFIG_INET_XFRM_MODE_TRANSPORT=m
>CONFIG_INET_XFRM_MODE_TUNNEL=m
>CONFIG_INET_XFRM_MODE_BEET=m
>CONFIG_TCP_CONG_CUBIC=y
>CONFIG_DEFAULT_TCP_CONG="cubic"
>CONFIG_NET_PTP_CLASSIFY=y
>CONFIG_NETFILTER=y
>CONFIG_NETFILTER_ADVANCED=y
>CONFIG_NETFILTER_INGRESS=y
>CONFIG_NETFILTER_NETLINK=y
>CONFIG_NETFILTER_FAMILY_ARP=y
>CONFIG_NF_CONNTRACK=y
>CONFIG_NF_LOG_COMMON=y
>CONFIG_NETFILTER_CONNCOUNT=y
>CONFIG_NF_CONNTRACK_MARK=y
>CONFIG_NF_CONNTRACK_PROCFS=y
>CONFIG_NF_CONNTRACK_TIMEOUT=y
>CONFIG_NF_CONNTRACK_FTP=y
>CONFIG_NF_CT_NETLINK=y
>CONFIG_NF_CT_NETLINK_TIMEOUT=y
>CONFIG_NF_NAT=y
>CONFIG_NF_NAT_NEEDED=y
>CONFIG_NF_NAT_FTP=y
>CONFIG_NF_NAT_REDIRECT=y
>CONFIG_NF_TABLES=y
>CONFIG_NFT_CT=y
>CONFIG_NFT_CONNLIMIT=y
>CONFIG_NFT_LOG=y
>CONFIG_NFT_LIMIT=y
>CONFIG_NFT_MASQ=y
>CONFIG_NFT_NAT=y
>CONFIG_NFT_REJECT=y
>CONFIG_NF_FLOW_TABLE=m
>CONFIG_NETFILTER_XTABLES=y
>CONFIG_NETFILTER_XT_MARK=y
>CONFIG_NETFILTER_XT_CONNMARK=y
>CONFIG_NETFILTER_XT_TARGET_CONNMARK=y
>CONFIG_NETFILTER_XT_TARGET_LOG=y
>CONFIG_NETFILTER_XT_TARGET_MARK=y
>CONFIG_NETFILTER_XT_NAT=y
>CONFIG_NETFILTER_XT_TARGET_NETMAP=y
>CONFIG_NETFILTER_XT_TARGET_REDIRECT=y
>CONFIG_NETFILTER_XT_TARGET_TPROXY=m
>CONFIG_NETFILTER_XT_MATCH_COMMENT=y
>CONFIG_NETFILTER_XT_MATCH_CONNLIMIT=y
>CONFIG_NETFILTER_XT_MATCH_CONNMARK=y
>CONFIG_NETFILTER_XT_MATCH_CONNTRACK=y
>CONFIG_NETFILTER_XT_MATCH_ESP=m
>CONFIG_NETFILTER_XT_MATCH_HASHLIMIT=m
>CONFIG_NETFILTER_XT_MATCH_HELPER=y
>CONFIG_NETFILTER_XT_MATCH_IPRANGE=m
>CONFIG_NETFILTER_XT_MATCH_LENGTH=y
>CONFIG_NETFILTER_XT_MATCH_LIMIT=y
>CONFIG_NETFILTER_XT_MATCH_MARK=y
>CONFIG_NETFILTER_XT_MATCH_MULTIPORT=m
>CONFIG_NETFILTER_XT_MATCH_POLICY=y
>CONFIG_NETFILTER_XT_MATCH_STATE=y
>CONFIG_NETFILTER_XT_MATCH_STATISTIC=m
>CONFIG_NETFILTER_XT_MATCH_STRING=m
>CONFIG_NETFILTER_XT_MATCH_TCPMSS=m
>CONFIG_NF_DEFRAG_IPV4=y
>CONFIG_NF_CONNTRACK_IPV4=y
>CONFIG_NF_TPROXY_IPV4=m
>CONFIG_NF_TABLES_IPV4=y
>CONFIG_NFT_CHAIN_ROUTE_IPV4=y
>CONFIG_NFT_REJECT_IPV4=y
>CONFIG_NF_TABLES_ARP=y
>CONFIG_NF_FLOW_TABLE_IPV4=m
>CONFIG_NF_LOG_IPV4=y
>CONFIG_NF_REJECT_IPV4=y
>CONFIG_NF_NAT_IPV4=y
>CONFIG_NFT_CHAIN_NAT_IPV4=y
>CONFIG_NF_NAT_MASQUERADE_IPV4=y
>CONFIG_NFT_MASQ_IPV4=y
>CONFIG_IP_NF_IPTABLES=y
>CONFIG_IP_NF_FILTER=y
>CONFIG_IP_NF_TARGET_REJECT=y
>CONFIG_IP_NF_NAT=y
>CONFIG_IP_NF_TARGET_MASQUERADE=y
>CONFIG_IP_NF_TARGET_NETMAP=y
>CONFIG_IP_NF_TARGET_REDIRECT=y
>CONFIG_IP_NF_MANGLE=y
>
>
>[5.] Most recent kernel version which did not have the bug:
>
>4.18.x is fine. 4.19+ all have it.
>
>
>[6.] Output of Oops.. message (if applicable) with symbolic information
>     resolved (see Documentation/admin-guide/bug-hunting.rst)
>
>For most oopses, all I have is the tail 80x25 of the output since I
>can't scroll the console back. A lot of them had call traces that
>included bits like:
>
>EIP: native_safe_halt+0x5/0x7
>[...]
> ? siphash_3u64+[...]
> default_idle+[...]
> arch_cpu_idle+[...]
> [...]
>
>as well as some IRQ stuff, which really made no sense to me. Then, one
>or two bisect steps from the end, I had one that didn't lock up the
>machine, so I could scroll back:
>
>BUG: unable to handle kernel NULL pointer dereference at 00000000
>*pdpt = 00000000712fd001 *pde = 0000000000000000
>Oops: 0000 [#1] SMP
>CPU: 1 PID 26422 Comm: iptables Not tainted 4.18.0-rc3-00851-ged07d9a021df #22
>Hardware name: VMware, Inc. VMware Virtual Platform/400BX Desktop Reference Platform, BIOS 6.00 09/30/2014
>EIP: nf_conncount_destroy+0x4d/0xa5
>Code: ed 4c ff ff 89 f8 83 c7 04 05 04 04 00 00 89 45 ec 89 f8 [...]
>[...]
>Call Trace:
> connlimit_mt_destroy+0x14/0x16
> cleanup_match+0x34/0x52
> cleanup_entry+0x2e/0x8b
> do_ipt_set_ctl+0x412/0x48e
> ? do_ipt_get_ctl+0x39e/0x39e
> nf_setsockopt+0x37/0x57
> ip_setsockopt+0x4b/0x5a
> [and so on back to entry_SYSENTER_32]
>
>
>Complete screen shots are available if they'll be of any use.
>
>
>
>[7.] A small shell script or example program which triggers the
>     problem (if possible)
>
>When I saw "conncount", and knowing that it was our two mail gateways,
>my thoughts (above) jumped to our connlimit settings.
>
>HOWEVER. The oops says it was triggered by iptables. Both machines
>also use sshguard, which uses iptables to add DROP rules to a chain.
>(Nearly all our machines use sshguard, but the two mail gateways are
>the only two that give it more than occasional activity, and this one
>in particular gives it a decent workout.)
>
>For what it's worth, here is our connlimit setup anyway:
>
>------------------------------------------------------------
>The server has two rules that use connlimit:
>
>iptables -A <chain> -j REJECT -p tcp -s 0.0.0.0/0 -m connlimit --connlimit-above 3 --connlimit-mask 24
>iptables -A <chain> -j REJECT -p tcp -s 0.0.0.0/0 -m connlimit --connlimit-above 2 --connlimit-mask 32
>
>The other server has a few more such rules, but much less traffic that
>is likely to run afoul of them -- that would explain why it took much
>longer to crash.
>------------------------------------------------------------
>
>
>
>[8.] Environment
>
>[8.1.] Software (add the output of the ver_linux script here)
>
>GNU C 8.2.0
>GNU Make 4.2.1
>Binutils 2.31.1
>Util-linux 2.31.1
>Mount 2.31.1
>Module-init-tools 25
>Linux C Library 2.28
>Dynamic linker (ldd) 2.28
>Sh-utils 8.30
>
>
>[8.2.] Processor information (from /proc/cpuinfo):
>
>2-core VM, here's one:
>
>processor : 0
>vendor_id : GenuineIntel
>cpu family : 6
>model : 26
>model name : Intel(R) Xeon(R) CPU X5570 @ 2.93GHz
>stepping : 5
>microcode : 0x19
>cpu MHz : 2926.000
>cache size : 8192 KB
>physical id : 0
>siblings : 2
>core id : 0
>cpu cores : 2
>apicid : 0
>initial apicid : 0
>fdiv_bug : no
>f00f_bug : no
>coma_bug : no
>fpu : yes
>fpu_exception : yes
>cpuid level : 11
>wp : yes
>flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss ht nx rdtscp lm constant_tsc arch_perfmon pebs bts xtopology tsc_reliable nonstop_tsc cpuid aperfmperf pni ssse3 cx16 sse4_1 sse4_2 popcnt hypervisor lahf_lm dtherm ida
>bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass
>bogomips : 5852.00
>clflush size : 64
>cache_alignment : 64
>address sizes : 40 bits physical, 48 bits virtual
>power management:
>
>
>[8.3.] Module information (from /proc/modules):
>
>[8.4.] Loaded driver and hardware information (/proc/ioports, /proc/iomem)
>
>[8.5.] PCI information ('lspci -vvv' as root)
>
>[8.6.] SCSI information (from /proc/scsi/scsi)
>
>[8.7.] Other information that might be relevant to the problem
>       (please look in /proc and include all information that you
>       think to be relevant):
>
>
>Since this machine will crash so reliably (usually within 2-5 minutes
>of boot on an affected kernel) and since it's not user-visible, I can
>test easily.
>
>
>
>Todd
>--
>Todd Eigenschink Ferguson Advertising
>todd@xxxxxxxx http://www.fai2.com/
>Non ex transverso sed deorsum 260-407-1584
>