On Thu, Apr 13, 2023 at 11:04:24AM +0200, Pablo Neira Ayuso wrote: > On Thu, Apr 13, 2023 at 11:23:11AM +0800, Tzung-Bi Shih wrote: > > nf_ct_expires() got called by userspace program. And the return > > value (which means the remaining timeout) will be the parameter for > > the next ctnetlink_change_timeout(). > > Unconfirmed conntrack is owned by the packet that refers to it, it is > not yet in the hashes. I don't see how concurrent access to the > timeout might occur. > > Or are you referring to a different scenario that triggers the partial > state? I think it isn't a concurrent access. We observed that userspace program reads the remaining timeout and sets it back for unconfirmed conntrack. Conceptually, it looks like: timeout = nf_ct_expires(...); (1) ctnetlink_change_timeout(...timeout); At (1), `nfct_time_stamp` is wrongly involved in the calculation[4] because the conntrack is unconfirmed. The `ct->timeout` is an internal but not a timestamp. As a result, ctnetlink_change_timeout() sets the wrong value to `ct->timeout`. [4]: https://elixir.bootlin.com/linux/v6.3-rc6/source/include/net/netfilter/nf_conntrack.h#L296 > > As you can see in [4], if this happens on an unconfirmed conntrack, the > > `nfct_time_stamp` would be wrongly invoved in the calculation again. > > That's why we take care of all `ct->timeout` accesses in v1. Again, that's why v1 separates all `ct->timeout` accesses. If the conntrack is confirmed: - `ct->timeout` is a timestamp. - `nfct_time_stamp` should be involved when getting/setting `ct->timeout`. If the conntrack is unconfirmed: - `ct->timeout` is an interval. - `nfct_time_stamp` shouldn't be involved. Only separate `ct->timeout` access in __nf_ct_set_timeout() is obviously insufficient. I would suggest either take care of all `ct->timeout` accesses as v1 or at least change both __nf_ct_set_timeout() and nf_ct_expires(). In the latter case, it looks fine in our environment. However, I'm not sure if any hidden cases we haven't seen.