On Wed, Nov 03, 2021 at 02:13:58AM IST, Florian Westphal wrote:
> Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
> > > I tried to find a use case but I could not.
> > > Entry will time out soon once packets stop appearing, so it can't be
> > > used for stack bypass. Is it for something else? If so, what?
> >
> > I think Maxim's use case was to implement a SYN proxy in XDP, where the
> > XDP program just needs to answer the question "do I have state for this
> > flow already". For TCP flows terminating on the local box this can be
> > done via a socket lookup, but for a middlebox, a conntrack lookup is
> > useful. Maxim, please correct me if I got your use case wrong.
>
> Looked at
> https://netdevconf.info/0x15/slides/30/Netdev%200x15%20Accelerating%20synproxy%20with%20XDP.pdf
>
> seems that's right, it's only a "does it exist".
>

FYI, there's also an example in the original series (grep for bpf_ct_lookup_tcp):
https://lore.kernel.org/bpf/20211019144655.3483197-11-maximmi@xxxxxxxxxx

> > > For UDP it will work to let a packet pass through classic forward
> > > path once in a while, but this will not work for tcp, depending
> > > on conntrack settings (loose mode, liberal pickup etc. pp).
> >
> > The idea is certainly to follow up with some kind of 'update' helper. At
> > a minimum a "keep this entry alive" update, but potentially more
> > complicated stuff as well. Details TBD, input welcome :)
>
> Depends on use case. For bypass infra I'd target the flowtable
> infra rather than conntrack because it gets rid of the "early time out"
> problem, plus you get the output interface/dst entry.
>
> Not trivial for xdp because existing code assumes sk_buff.
> But I think it can be refactored to allow raw buffers, similar
> to flow dissector.
>
> > >> + hash = nf_conntrack_find_get(net, &nf_ct_zone_dflt, &tuple);
> > >
> > > Ok, so default zone. Depending on meaning of "unstable helper" this
> > > is ok and can be changed in incompatible way later.
> >
> > I'm not sure about the meaning of "unstable" either, TBH, but in either
> > case I'd rather avoid changing things if we don't have to, so I think
> > adding the zone as an argument from the get-go may be better...
>
> Another thing I just noted:
> The above gives a nf_conn with incremented reference count.
>
> For Maxim's use case, that's unnecessary overhead. Existence can be
> determined without reference increment. The caveat is that the pointer
> cannot be used after last rcu_read_unlock().

From my reading, dereferencing the nf_conn without going through
nf_conntrack_find_get() is safe but not correct. Conntrack objects are
allocated from a SLAB_TYPESAFE_BY_RCU cache (formerly SLAB_DESTROY_BY_RCU),
so the memory remains a valid nf_conn for the whole RCU read-side section,
but without a reference the nf_conn can be freed and reused for a different
tuple within that same section. So what the example currently does (checking
ct->status & IPS_CONFIRMED_BIT) is not safe without raising the reference
count, even though the XDP program runs under RCU protection.

Returning a PTR_TO_BTF_ID for the nf_conn wouldn't really work without
taking a reference on it either, since the object can be recycled.

--
Kartikeya