2017-06-26 15:16 GMT+02:00 Florian Westphal <fw@xxxxxxxxx>:
> Bjørnar Ness <bjornar.ness@xxxxxxxxx> wrote:
>> When trying to narrow down the problem, I removed the NAT rules, and
>> in particular the
>>
>> chain postrouting {
>>     type nat hook postrouting priority 100
>> }
>>
>> And the problem disappears. Commenting in the above block again
>> causes the following to happen:
>>
>> kworker/0:0 starts to use more and more cpu, and in less than a minute
>> renders the machine useless. If network cable is unplugged, it takes
>> around 30 seconds for the machine to get into a useful state again.
>
> The kworker is most likely the conntrack gc worker, but the gc worker is nat
> agnostic, so I don't see how this makes a difference wrt. nat
> postrouting hook presence.

It might of course be just the straw that broke the camel's back, but this
behavior is reproducible, and is present in 4.9.0-rc6, 4.11.0 and
4.12.0-rc6. We do not have problems with 4.8.6 (but as mentioned, it has
other conntrack problems reported earlier).

> perf top might help pinpoint the source.
>
> What kernel is this, exactly?
>
> 4.10 (and 4.9.14 and later) has a change to make gc worker use less
> cycles.

We see the behavior in 4.11.0 and 4.12.0-rc6 as well.

> But I don't see the NAT connection.

Not sure what you mean here. We do not need to have rules in the
postrouting chain for it to tear down the server.

Monitoring conntrack entries also shows me that the count is stable
at around 120k.

-- 
Bjørnar