On Tue, Nov 27, 2018 at 09:23:49AM +0100, Pablo Neira Ayuso wrote:
> On Tue, Nov 27, 2018 at 03:20:45AM +0100, Christian Brauner wrote:
> > On Tue, Nov 27, 2018 at 01:20:47AM +0100, Pablo Neira Ayuso wrote:
> > > Hi,
> > >
> > > On Wed, Nov 07, 2018 at 02:48:58PM +0100, Christian Brauner wrote:
> > > [...]
> > > > diff --git a/include/net/netns/netfilter.h b/include/net/netns/netfilter.h
> > > > index ca043342c0eb..eedbd1ac940e 100644
> > > > --- a/include/net/netns/netfilter.h
> > > > +++ b/include/net/netns/netfilter.h
> > > > @@ -35,4 +35,20 @@ struct netns_nf {
> > > >  	bool defrag_ipv6;
> > > >  #endif
> > > >  };
> > > > +
> > > > +struct netns_brnf {
> > > > +#ifdef CONFIG_SYSCTL
> > > > +	struct ctl_table_header *ctl_hdr;
> > > > +#endif
> > > > +
> > > > +	/* default value is 1 */
> > > > +	int call_iptables;
> > > > +	int call_ip6tables;
> > > > +	int call_arptables;
> > > > +
> > > > +	/* default value is 0 */
> > > > +	int filter_vlan_tagged;
> > > > +	int filter_pppoe_tagged;
> > > > +	int pass_vlan_indev;
> > > > +};
> > >
> > > I have spun on this several times, wondering if there's a way to avoid
> > > scratching these many bytes per netns to expose these sysctl entries
> > > that are plain on/off toggles... You said this:
> > >
> > > >Currently, the /proc/sys/net/bridge folder is only created in the
> > > >initial network namespace
> > >
> > > I think we can add one single sysctl to expose these as flags from net
> > > namespaces. Idea is to keep the existing (legacy) sysctl entries for
> > > init_net only, and add a new single new one that exposes these as flags
> > > (should be also available for consistency in init_net I'd suggest).
> > > Flags could be map in this way, eg.
> > >
> > >         0x1 call_iptables
> > >         0x2 call_ip6tables
> > >         0x4 call_arptables
> > >         0x8 filter_vlan_tagged
> > >         ...
> > >
> > > Also documentation would be good to have for this.
> > >
> > > Would this idea fly for you? Thanks.
> >
> > My suggestion is to keep these files per network namespace but have a
> > single flag argument in struct netns_brnf:
> > +struct netns_brnf {
> > +#ifdef CONFIG_SYSCTL
> > +	struct ctl_table_header *ctl_hdr;
> > +#endif
> > +
> > +	/* default value is 1 */
> > +	unsigned int filter_flags;
> > +};
> >
> > #define BRNF_CALL_IPTABLES 0x1
> > #define BRNF_CALL_IP6TABLES 0x2
> > #define BRNF_CALL_ARPTABLES 0x4
> > #define BRNF_CALL_VLAN_TAGGED 0x8
> >
> > a write to the corresponding file would then cause the flag to be set or
> > unset in filter_flags.
> > This way we are a) space-efficient internally not bloating struct net
> > while b) not breaking running tools in non-initial network namespaces
> > that expect the files to be there. b) is really the important bit here. :)
>
> OK, please, go explore this space-efficient approach. Thanks.

Sorry for the wait. Other patches came up. :)
So, I looked into this approach and it is annoying to do:
- The sysctl proc parsing infrastructure is not equipped to deal with
  flags at all, and extending it to do so would take a lot of code.
- We would need either an atomic type or locking for filter_flags in the
  netns_brnf struct in case multiple proc sysctl handlers try to raise or
  lower bits in filter_flags via different files at the same time.

So I feel that this is not a feasible solution. If we care about space, we
could make netns_brnf a pointer in struct net and allocate it when a new
network namespace is created, but then we take the performance hit of
k*alloc().
As I stressed before: for userspace it is important that we don't change
the semantics of how br netfilter is configured in a non-initial network
namespace, so that existing tools in such environments don't break.

Christian
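
For illustration only, here is a rough userspace sketch of the flag variant
discussed in this thread. It is not kernel code: C11 atomics stand in for
whatever atomic type or locking the kernel side would need, and struct
brnf_sketch plus the brnf_sketch_*() helpers are hypothetical names made up
for this example. Only the BRNF_* values and the defaults are taken from the
mails above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Flag values as proposed in the quoted mail. */
#define BRNF_CALL_IPTABLES    0x1u
#define BRNF_CALL_IP6TABLES   0x2u
#define BRNF_CALL_ARPTABLES   0x4u
#define BRNF_CALL_VLAN_TAGGED 0x8u

/* Hypothetical userspace stand-in for struct netns_brnf. */
struct brnf_sketch {
	atomic_uint filter_flags;
};

/* The call_* toggles default to 1, the filter_* toggles to 0. */
static void brnf_sketch_init(struct brnf_sketch *b)
{
	atomic_init(&b->filter_flags,
		    BRNF_CALL_IPTABLES | BRNF_CALL_IP6TABLES |
		    BRNF_CALL_ARPTABLES);
}

/*
 * What a per-file write handler would have to do: raise or lower exactly
 * one bit. The read-modify-write must be atomic, otherwise two handlers
 * writing different files at the same time could lose an update.
 */
static void brnf_sketch_write(struct brnf_sketch *b, unsigned int flag, bool on)
{
	assert(flag != 0);
	if (on)
		atomic_fetch_or(&b->filter_flags, flag);
	else
		atomic_fetch_and(&b->filter_flags, ~flag);
}

/* What a per-file read handler would report: 1 if the bit is set. */
static bool brnf_sketch_read(struct brnf_sketch *b, unsigned int flag)
{
	return (atomic_load(&b->filter_flags) & flag) != 0;
}
```

Each existing file would map to one flag, so userspace would still see the
same on/off files while only one word is stored per namespace. The sketch
leaves out the sysctl parsing side entirely, which is the part described
above as needing a lot of new code.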