Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> wrote:
> > We cannot use percpu data for nf_bridge_info; at least it's not trivial
> > to do (I tried).
> >
> > Main issues are:
> > - nfqueue (we have to save/restore info)
> > - local delivery (skb is enqueued in backlog)
> > - skb is dropped (we need to reset scratch data)
> > - DNAT mac hack (skb can be queued in qdisc).

The DNAT mac hack is no longer an issue, it can be handled via skb->cb.

> What about something like this?
>
> 1) We keep a nf_bridge table that contains the nf_bridge structures
>    that are currently in use, so you can look them up every time we
>    need them. This can be implemented as a per-cpu list of nf_bridge
>    structures.

I am not sure it can be percpu. In particular, when we pass the frame up
the stack the skb can be enqueued in the backlog, and we might also
encounter the ingress scheduler, so I am not sure the skb cannot move to
another cpu.

> 2) We have another per-cpu cache to hold a pointer to the current
>    nf_bridge.
>
> 3) We only need one bit from sk_buff to indicate that this sk_buff has
>    nf_bridge info attached.

Yes, the "put it in skb->cb" approach also does this: it puts a 2-bit
"state" into the skb and then removes the pointer.

> Then, we can use the sk_buff pointer as an unique way to identify
> the owner of the nf_bridge structure.
>
> struct nf_bridge {
>         struct list_head        head;
>         const struct sk_buff    *owner;
>         ...
> };
>
> so if we need the scratchpad area, we first have to look it up:
>
> struct nf_bridge *nf_bridge_find(struct sk_buff *skb)
> {
>         ...
>
>         /* First, check if we have it in the per-cpu cache. */
>         nf_bridge = per_cpu_ptr(nf_bridge_pcpu);
>         if (nf_bridge->owner == skb)
>                 return nf_bridge;
>
>         /* Otherwise, slow path: find this scratchpad area in the list. */
>         ...
>         nf_bridge_list = per_cpu_ptr(nf_bridge_pcpu_list);
>         list_for_each_entry(n, nf_bridge_list, head) {
>                 if (n->owner == skb) {
>                         /* Update the per-cpu cache. */
>                         rcu_assign_pointer(nf_bridge, n);
>                         return n;
>                 }
>         }
>
>         return NULL;
> }

Ok, I see. This would work, but I'd prefer to just use the skb control
buffer.

> I'm proposing this because I have concerns with the approach to place
> nf_bridge in skb->cb. This br_netfilter thing makes the skb go back
> and forth between different layers while expecting to find the right
> data in it, which seems fragile to me.

Mhh, for the *usual* case of "skb is forwarded by bridge", skb->cb
should be safe since we only move the skb between bridge and inet(6).
The only problematic case is when the skb is DNAT'd; then things get
hairy (e.g. dnat-to-host-on-other-device).

> What do you think?

I think that at the moment it doesn't matter which solution we pick in
the end; I'd first like to send all my refactoring patches. With those
patches, the nf_bridge_info content we need is reduced to the physin
and physout devices, plus a two-bit state in the skb.

Then, if you still think that ->cb is too fragile, I'd be happy to add
the lookup table method. For the normal case of bridge forwarding it
shouldn't be needed, though; perhaps we could also use a hybrid
approach: a cb-based fast path for forwarding and a lookup-table slow
path for the dnat and local delivery cases.

One other solution would be to use skb->cb but store the ifindexes
instead of the bridge device pointers (we currently don't hold any
device refcounts, so the current method isn't 100% safe either since
such a device can be removed ...).