On Tue, Nov 14, 2017 at 12:05:46PM +0100, Florian Westphal wrote: > Sebastian Gottschall <s.gottschall@xxxxxxxxxx> wrote: > > > > this should belong to you. with the revert the kernel crashes for me > > > > [ 24.838120] Unable to handle kernel NULL pointer dereference at virtual > > address 00000055 > > [ 24.846193] pgd = c0004000 > > [ 24.848893] [00000055] *pgd=00000000 > > [ 24.852472] Internal error: Oops - BUG: 817 [#1] PREEMPT SMP ARM > > [ 24.858463] Modules linked in: xhci_plat_hcd xhci_pci xhci_hcd ohci_hcd > > ehci_pci ehci_platform ehci_hcd usbcore usb_common nls_base qca_ssdk > > gpio_pca953x mii_gpio wil6210 ath10k_pci ath10k_core ath9k ath9k_common > > ath9k_hw ath mac80211 cfg80211 compat > > [ 24.880852] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.9.62-rc1 #90 > > [ 24.887189] Hardware name: AnnapurnaLabs Alpine (Device Tree) > > [ 24.892921] task: ef029ac0 task.stack: ef05a000 > > [ 24.897444] PC is at nf_nat_cleanup_conntrack+0x4c/0x74 > > [ 24.902657] LR is at nf_nat_cleanup_conntrack+0x38/0x74 > > [ 24.907869] pc : [<c04858c8>] lr : [<c04858b4>] psr: 60000153 > > [ 24.907869] sp : ef05bb58 ip : ef05bb58 fp : ef05bb6c > > [ 24.919317] r10: ed230cc0 r9 : ed230c00 r8 : edf45800 > > [ 24.924529] r7 : ebcd2f00 r6 : ec33739e r5 : c0892294 r4 : ebcd2f00 > > [ 24.931040] r3 : 00000000 r2 : 00000055 r1 : 00000000 r0 : c0892718 > > [ 24.937551] Flags: nZCv IRQs on FIQs off Mode SVC_32 ISA ARM Segment > > user > > [ 24.944755] Control: 10c5387d Table: 2bd1006a DAC: 00000055 > > [ 24.950486] Process swapper/2 (pid: 0, stack limit = 0xef05a210) > > [ 24.956477] Stack: (0xef05bb58 to 0xef05c000) > > Guillaume, can you check what I missed? > IIRC you said you had a revert for your 4.9 tree, correct? > Actually I had forgotten some details when I told you about it. What I did is simpler than a revert. I just cherry-picked the following commits: 124dffea9e8e ("netfilter: nat: use atomic bit op to clear the _SRC_NAT_DONE_BIT") 6e699867f84c ("netfilter: nat: avoid use of nf_conn_nat extension") e1bf1687740c ("netfilter: nat: Revert "netfilter: nat: convert nat bysrc hash to rhashtable"") Then I just had to fix one remaining call: @@ -842,7 +842,7 @@ static int __init nf_nat_init(void) return 0; cleanup_extend: - rhltable_destroy(&nf_nat_bysource_table); + nf_ct_free_hashtable(nf_nat_bysource, nf_nat_htable_size); nf_ct_extend_unregister(&nat_extend); return ret; } My original approach was to revert commit 870190a9ec90 ("netfilter: nat: convert nat bysrc hash to rhashtable") and fix the conflicts by hand. But I had crashes when testing the result. After a bit of debugging, I decided to simply try cherry-picking commit e1bf1687740c and its dependencies. Since I got better results with this approach, I dropped my original revert code (so, unfortunately, I can't compare with yours and I don't remember if the crash was similar to Sebastian's). We have the cherry-picked patches running on several machines since September. No problem found so far. Now, looking at Sebastian's report, I suspect that using nf_ct_ext_find(NF_CT_EXT_NAT) in nf_nat_cleanup_conntrack() might not be appropriate anymore. My approach uses the IPS_SRC_NAT_DONE bit instead, which is now set atomically. Theses changes weren't intentional, but introduced by the dependency on commits 124dffea9e8e and 6e699867f84c. Does that make sense to you? I can look more deeply into this tomorrow.