Re: crash on >= 4.9.0 kernel seems nf related

Bjørnar Ness <bjornar.ness@xxxxxxxxx> · Tue, 27 Jun 2017 13:23:09 +0200

2017-06-26 15:16 GMT+02:00 Florian Westphal <fw@xxxxxxxxx>:
> Bjørnar Ness <bjornar.ness@xxxxxxxxx> wrote:
>> When trying to narrow down the problem, I removed the NAT rules, and
>> in particular the
>>
>> chain postrouting {
>>   type nat hook postrouting priority 100
>> }
>>
>> And the problem disappears. Commenting in the above block again,
>> causes the following to happen:
>>
>> kworker/0:0 starts to use more and more CPU, and in less than a minute
>> renders the machine useless. If the network cable is unplugged, it takes
>> around 30 seconds for the machine to get into a useful state again.
>
> The kworker is most likely the conntrack gc worker, but the gc worker is nat
> agnostic, so I don't see how this makes a difference wrt. nat
> postrouting hook presence.

It might of course be just the straw that broke the camel's back, but
this behavior is reproducible, and is present in 4.9.0-rc6, 4.11.0 and
4.12.0-rc6. We do not have problems with 4.8.6 (but, as mentioned, it
has other conntrack problems reported earlier).

> perf top might help pinpoint the source.
>
> What kernel is this, exactly?
>
> 4.10 (and 4.9.14 and later) has a change to make gc worker use less
> cycles.

We see the behavior in 4.11.0 and 4.12.0-rc6 as well.

> But I don't see the NAT connection.

Not sure what you mean here. We do not need to have any rules in the
postrouting chain for it to bring down the server. Monitoring conntrack
entries also shows me that the count is stable at around 120k.
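
For anyone wanting to repeat the conntrack measurement, here is a minimal
sketch of one way to sample the entry count and check whether it stays
flat. It assumes the count is exposed at
/proc/sys/net/netfilter/nf_conntrack_count; the helper names and the 5%
tolerance are made up for illustration and are not how the numbers above
were collected.

```python
import time
from pathlib import Path

# Assumed location of the conntrack entry counter (sysctl
# net.netfilter.nf_conntrack_count); adjust if your system differs.
COUNT_PATH = Path("/proc/sys/net/netfilter/nf_conntrack_count")


def read_count(path=COUNT_PATH):
    """Return the current number of conntrack entries as an int."""
    return int(Path(path).read_text().strip())


def sample_counts(n=10, interval=1.0, path=COUNT_PATH):
    """Take n samples of the conntrack count, interval seconds apart."""
    samples = []
    for i in range(n):
        samples.append(read_count(path))
        if i < n - 1:
            time.sleep(interval)
    return samples


def is_stable(samples, tolerance=0.05):
    """True if every sample lies within tolerance (a fraction) of the mean.

    The 5% default is an arbitrary illustrative threshold.
    """
    if not samples:
        return False
    mean = sum(samples) / len(samples)
    if mean == 0:
        return all(s == 0 for s in samples)
    return all(abs(s - mean) <= tolerance * mean for s in samples)
```

Calling is_stable(sample_counts(n=30)) while kworker CPU usage climbs
would show whether the table size moves at all during the stall.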

-- 
Bj(/)rnar