Re: [PATCH nf] netfilter: nf_tables: rework ct timeout set support

Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> · Thu, 23 Aug 2018 15:14:16 +0200

On Thu, Aug 23, 2018 at 11:58:34AM +0200, Florian Westphal wrote:
> Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> wrote:
> > > This patch reworks template policy to instead work with existing conntrack.
> > > 
> > > As long as such conntrack has not yet been placed into the hash table
> > > (unconfirmed) we can still add the timeout extension.
> > > 
> > > The only caveat is that we now need to update/correct ct->timeout to
> > > reflect the initial/new state, otherwise the conntrack entry retains the
> > > default 'new' timeout.
> > > 
> > > Side effect of this change is that setting the policy must
> > > now occur from chains that are evaluated *after* the conntrack lookup
> > > has taken place.
> > > 
> > > No released kernel contains the timeout policy feature yet, so this change
> > > should be ok.
> > > 
> > > Signed-off-by: Florian Westphal <fw@xxxxxxxxx>
> > > ---
> > >  nft_ct_timeout_update() is ugly, but i found no better
> > >  solution.
> > 
> > See below.
> >
> > >  The alternative is to allow use of templates,
> > >  but that requires nasty kmemdup() games to clone the
> > >  template, else we'd modify some percpu/readonly template.
> > 
> > percpu template would allow us to combine both, I mean, to use the
> > template as a scratchpad area. The template is only used from the same
> > hook point to pass information between hook callbacks.
> 
> I found no way to do this.
> 
> Consider this:
> 
> tcp dport 22 ct timeout set "baz" accept
> meta iifname ct zone set 23 ct timeout set "foo" accept
> ct zone set 42 ct timeout set "bar" accept
> 
> (ruleset is not very meaningful, only to illustrate problem)
> 
> If a pcpu template is used, we must reset that pcpu scatchpad area
> back to a pristine state whenever a new skb comes in.
> 
> Otherwise, if first rule does match, pcpu template might still contain
> 'zone 23' or 'zone 42'.

Hm, indeed. We can probably set an internal dirty flag in the
tmpl->status bits from nf_conntrack_in() so nft_ct knows the percpu
tmpl needs a cleanup (and such cleanup would be done from
nft_ct/xt_CT). We could preallocate helper and timeout, so they are
just NULL in case now helper/timeout is attached.

> And we don't know when the 'next' skb comes along.
> nfqueue complicates this further (see the ugly refcnt check in the
> nft_ct.c parts that deal with 'zone set' support).

>From nfqueue, the template is tricky indeed.

> > Probably we need to support both approaches? I mean, the legacy one
> > that allows users to set the timeout from the raw table and the new
> > one which doesn't need template at all. Same thing with ct helpers in
> > nftables, which don't templates either.
> 
> The only "solution" that I see to support all cases for 'timeout'
> is this:
> 
> if skb->nfct is a template, do a 'full' clone of the template
> (need new helper function for this to deal with nf_conn->ext copy),
> then mangle the copy, discard old template
> 
> if skb->nfct is NULL, allocate a new template from the packet path
> and initialise it.

Hm, could we only follow this approach your describe for the nfqueue
reinject case? I mean, clone the template as you describe only from
the nfqueue escape path. Then, use the percpu approach I described
above for the usual/no nfqueue path.

Otherwise, we could store the zone u16 value from the nfqueue escape
path to simplify things? For ct helper, we would need to store the
helper name from the escape path and do a lookup to reset it from the
nf_reinject(). Hm, for cttimeout, it's get more tricky... as we need a
way to reattach the timeout for the template.

Isn't the template idea itself is very much broken even for xt_CT with
nfqueue?

> > > +	case IPPROTO_TCP:
> > > +		ct->timeout = values[1];
> > > +		break;
> > > +	default:
> > > +		ct->timeout = values[0];
> > 
> > I think we can avoid the assymetry by making UDP and generic trackers
> > point to 1 for the default timeout, so we don't need this. It will be
> > just 4 bytes more per the global timeout array and we don't need this
> > function, it would need a preparation patch for this.
> 
> We could also make TCP/SCTP/DCCP use 0, as that is unused.
> I can make this change.

Yes, that's also fine, note that for TCP we expose through UAPI the
array order I think, so we need to deal with using 0 from the sysctl
and the ctnetlink path.

> > We can probably use nf_ct_refresh() instead here?
> 
> Oh, right, we can do that in fact.

OK.

> > > -	if (ct ||
> > > -	    priv->l4proto != pkt->tprot)
> > > +	if (priv->l4proto != pkt->tprot)
> > >  		return;
> > >  
> > > -	nf_ct_set(skb, priv->tmpl, IP_CT_NEW);
> > > +	if (!ct || nf_ct_is_template(ct))
> > > +		return;
> > > +
> > > +	timeout = nf_ct_timeout_find(ct);
> > > +	if (timeout) {
> > 
> > I think we should turn this into noop for unconfirmed.
> 
> OK, that makes it slightly simpler indeed.

OK.