Re: [PATCH v6 net-next 1/3] net/ipv4: multipath routing: configurable seed

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Apr 29, 2021 at 06:29:20PM +0300, Ido Schimmel wrote:
> On Wed, Apr 28, 2021 at 03:31:33PM +0300, Pavel Balaev wrote:
> > Ability for a user to assign seed value to multipath route hashes.
> > Now kernel uses random seed value to prevent hash-flooding DoS attacks;
> > however, it disables some use cases, f.e:
> > 
> > +-------+        +------+        +--------+
> > |       |-eth0---| FW0  |---eth0-|        |
> > |       |        +------+        |        |
> > |  GW0  |ECMP                ECMP|  GW1   |
> > |       |        +------+        |        |
> > |       |-eth1---| FW1  |---eth1-|        |
> > +-------+        +------+        +--------+
> > 
> > In this use case, two ECMP routers balance traffic between two firewalls.
> > If some flow transmits a response over a different channel than request,
> > such flow will be dropped, because keep-state rules are created on
> > the other firewall.
> > 
> > This patch adds sysctl variable: net.ipv4.fib_multipath_hash_seed.
> > User can set the same seed value on GW0 and GW1 for traffic to be
> > mirror-balanced. By default, random value is used.
> > 
> > Signed-off-by: Pavel Balaev <balaevpa@xxxxxxxxxxx>
> > ---
> >  Documentation/networking/ip-sysctl.rst | 14 ++++
> >  include/net/flow_dissector.h           |  2 +
> >  include/net/netns/ipv4.h               |  2 +
> >  net/core/flow_dissector.c              |  7 ++
> >  net/ipv4/route.c                       | 10 ++-
> >  net/ipv4/sysctl_net_ipv4.c             | 97 ++++++++++++++++++++++++++
> >  6 files changed, 131 insertions(+), 1 deletion(-)
> > 
> > diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
> > index 9701906f6..d1a67e6fe 100644
> > --- a/Documentation/networking/ip-sysctl.rst
> > +++ b/Documentation/networking/ip-sysctl.rst
> > @@ -100,6 +100,20 @@ fib_multipath_hash_policy - INTEGER
> >  	- 1 - Layer 4
> >  	- 2 - Layer 3 or inner Layer 3 if present
> >  
> > +fib_multipath_hash_seed - STRING
> > +	Controls seed value for multipath route hashes. By default
> > +	random value is used. Only valid for kernels built with
> > +	CONFIG_IP_ROUTE_MULTIPATH enabled.
> > +
> > +	Valid format: two hex values set off with comma or "random"
> > +	keyword.
> > +
> > +	Example to generate the seed value::
> > +
> > +		RAND=$(openssl rand -hex 16) && echo "${RAND:0:16},${RAND:16:16}"
> > +
> > +	Default: "random"
> > +
> >  fib_sync_mem - UNSIGNED INTEGER
> >  	Amount of dirty memory from fib entries that can be backlogged before
> >  	synchronize_rcu is forced.
> > diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h
> > index ffd386ea0..d104c013a 100644
> > --- a/include/net/flow_dissector.h
> > +++ b/include/net/flow_dissector.h
> > @@ -348,6 +348,8 @@ static inline bool flow_keys_have_l4(const struct flow_keys *keys)
> >  }
> >  
> >  u32 flow_hash_from_keys(struct flow_keys *keys);
> > +u32 flow_multipath_hash_from_keys(struct flow_keys *keys,
> > +			   const siphash_key_t *seed);
> >  void skb_flow_get_icmp_tci(const struct sk_buff *skb,
> >  			   struct flow_dissector_key_icmp *key_icmp,
> >  			   const void *data, int thoff, int hlen);
> > diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
> > index 87e161249..cb2830432 100644
> > --- a/include/net/netns/ipv4.h
> > +++ b/include/net/netns/ipv4.h
> > @@ -222,6 +222,8 @@ struct netns_ipv4 {
> >  #ifdef CONFIG_IP_ROUTE_MULTIPATH
> >  	u8 sysctl_fib_multipath_use_neigh;
> >  	u8 sysctl_fib_multipath_hash_policy;
> > +	int sysctl_fib_multipath_hash_seed;
> 
> Why 'int'?
>
will fix it, thanks
> > +	siphash_key_t __rcu *fib_multipath_hash_seed_ctx;
> >  #endif
> >  
> >  	struct fib_notifier_ops	*notifier_ops;
> > diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
> > index 5985029e4..febd1094c 100644
> > --- a/net/core/flow_dissector.c
> > +++ b/net/core/flow_dissector.c
> > @@ -1560,6 +1560,13 @@ u32 flow_hash_from_keys(struct flow_keys *keys)
> >  }
> >  EXPORT_SYMBOL(flow_hash_from_keys);
> >  
> > +u32 flow_multipath_hash_from_keys(struct flow_keys *keys,
> > +				  const siphash_key_t *seed)
> > +{
> > +	return __flow_hash_from_keys(keys, seed);
> > +}
> > +EXPORT_SYMBOL(flow_multipath_hash_from_keys);
> > +
> >  static inline u32 ___skb_get_hash(const struct sk_buff *skb,
> >  				  struct flow_keys *keys,
> >  				  const siphash_key_t *keyval)
> > diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> > index f6787c55f..79866b429 100644
> > --- a/net/ipv4/route.c
> > +++ b/net/ipv4/route.c
> > @@ -1912,6 +1912,7 @@ int fib_multipath_hash(const struct net *net, const struct flowi4 *fl4,
> >  {
> >  	u32 multipath_hash = fl4 ? fl4->flowi4_multipath_hash : 0;
> >  	struct flow_keys hash_keys;
> > +	siphash_key_t *seed_ctx;
> >  	u32 mhash;
> >  
> >  	switch (net->ipv4.sysctl_fib_multipath_hash_policy) {
> > @@ -1989,7 +1990,14 @@ int fib_multipath_hash(const struct net *net, const struct flowi4 *fl4,
> >  		}
> >  		break;
> >  	}
> > -	mhash = flow_hash_from_keys(&hash_keys);
> > +
> > +	rcu_read_lock();
> > +	seed_ctx = rcu_dereference(net->ipv4.fib_multipath_hash_seed_ctx);
> > +	if (seed_ctx)
> > +		mhash = flow_multipath_hash_from_keys(&hash_keys, seed_ctx);
> > +	else
> > +		mhash = flow_hash_from_keys(&hash_keys);
> > +	rcu_read_unlock();
> 
> During netns initialization the per-netns seed can be initialized to a
> system global seed. When the sysctl is used this seed will be
> overridden. You can then remove this check and always call
> flow_multipath_hash_from_keys() with the per-netns seed.
> 
> I'm not suggesting to initialize the seed of each netns differently as
> some users might be inadvertently relying on the fact that it is
> currently the same for all namespaces.
>
Good idea, thanks, will fix it.
> >  
> >  	if (multipath_hash)
> >  		mhash = jhash_2words(mhash, multipath_hash, 0);
> > diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
> > index a09e466ce..5dff59733 100644
> > --- a/net/ipv4/sysctl_net_ipv4.c
> > +++ b/net/ipv4/sysctl_net_ipv4.c
> > @@ -447,6 +447,8 @@ static int proc_tcp_available_ulp(struct ctl_table *ctl,
> >  }
> >  
> >  #ifdef CONFIG_IP_ROUTE_MULTIPATH
> > +#define FIB_MULTIPATH_SEED_KEY_LENGTH sizeof(siphash_key_t)
> > +#define FIB_MULTIPATH_SEED_RANDOM "random"
> >  static int proc_fib_multipath_hash_policy(struct ctl_table *table, int write,
> >  					  void *buffer, size_t *lenp,
> >  					  loff_t *ppos)
> > @@ -461,6 +463,93 @@ static int proc_fib_multipath_hash_policy(struct ctl_table *table, int write,
> >  
> >  	return ret;
> >  }
> > +
> > +static int proc_fib_multipath_hash_seed(struct ctl_table *table, int write,
> > +					void *buffer, size_t *lenp,
> > +					loff_t *ppos)
> > +{
> > +	struct net *net = container_of(table->data, struct net,
> > +	    ipv4.sysctl_fib_multipath_hash_seed);
> > +	/* maxlen to print the keys in hex (*2) and a comma in between keys. */
> > +	struct ctl_table tbl = {
> > +		.maxlen = ((FIB_MULTIPATH_SEED_KEY_LENGTH * 2) + 2)
> > +	};
> > +	siphash_key_t user_key, *ctx;
> > +	__le64 key[2];
> > +	int ret;
> > +
> > +	tbl.data = kmalloc(tbl.maxlen, GFP_KERNEL);
> > +
> > +	if (!tbl.data)
> > +		return -ENOMEM;
> > +
> > +	rcu_read_lock();
> > +	ctx = rcu_dereference(net->ipv4.fib_multipath_hash_seed_ctx);
> > +	if (ctx) {
> > +		put_unaligned_le64(ctx->key[0], &key[0]);
> > +		put_unaligned_le64(ctx->key[1], &key[1]);
> > +		user_key.key[0] = le64_to_cpu(key[0]);
> > +		user_key.key[1] = le64_to_cpu(key[1]);
> > +
> > +		snprintf(tbl.data, tbl.maxlen, "%016llx,%016llx",
> > +			 user_key.key[0], user_key.key[1]);
> > +	} else {
> > +		snprintf(tbl.data, tbl.maxlen, "%s", FIB_MULTIPATH_SEED_RANDOM);
> > +	}
> > +	rcu_read_unlock();
> > +
> > +	ret = proc_dostring(&tbl, write, buffer, lenp, ppos);
> > +
> > +	if (write && ret == 0) {
> 
> You can reduce nesting by using early return.

OK, got it!
> > +		siphash_key_t *new_ctx, *old_ctx;
> > +
> > +		if (!strcmp(tbl.data, FIB_MULTIPATH_SEED_RANDOM)) {
> > +			rtnl_lock();
> > +			old_ctx = rtnl_dereference(net->ipv4.fib_multipath_hash_seed_ctx);
> > +			RCU_INIT_POINTER(net->ipv4.fib_multipath_hash_seed_ctx, NULL);
> > +			rtnl_unlock();
> > +			if (old_ctx) {
> > +				synchronize_net();
> > +				kfree_sensitive(old_ctx);
> > +			}
> > +
> > +			pr_debug("multipath hash seed set to random value\n");
> > +			goto out;
> > +		}
> > +
> > +		if (sscanf(tbl.data, "%llx,%llx", user_key.key, user_key.key + 1) != 2) {
> > +			ret = -EINVAL;
> > +			goto out;
> > +		}
> > +
> > +		key[0] = cpu_to_le64(user_key.key[0]);
> > +		key[1] = cpu_to_le64(user_key.key[1]);
> > +		pr_debug("multipath hash seed set to 0x%llx,0x%llx\n",
> > +			 user_key.key[0], user_key.key[1]);
> 
> This leaks the seed... I understand how these prints can be useful
> during development, but I believe they should be removed prior to
> submission.

Thanks, forgot to remove this.
> > +
> > +		new_ctx = kmalloc(sizeof(*new_ctx), GFP_KERNEL);
> > +		if (!new_ctx) {
> > +			ret = -ENOMEM;
> > +			goto out;
> > +		}
> > +
> > +		new_ctx->key[0] = get_unaligned_le64(&key[0]);
> > +		new_ctx->key[1] = get_unaligned_le64(&key[1]);
> > +
> > +		rtnl_lock();
> > +		old_ctx = rtnl_dereference(net->ipv4.fib_multipath_hash_seed_ctx);
> > +		rcu_assign_pointer(net->ipv4.fib_multipath_hash_seed_ctx, new_ctx);
> > +		rtnl_unlock();
> > +		if (old_ctx) {
> > +			synchronize_net();
> > +			kfree_sensitive(old_ctx);
> > +		}
> > +	}
> 
> This looks overly complex to me and I believe a lot of users will ask
> themselves why they need to specify a seed using two hex numbers
> separated by a comma. Looking at other implementations that already
> allow specifying the seed, it is specified as a single integer.
> 
> 32-bit in Cumulus:
> https://docs.nvidia.com/networking-ethernet-software/cumulus-linux-43/Layer-3/Routing/Equal-Cost-Multipath-Load-Sharing-Hardware-ECMP/#configure-a-hash-seed-to-avoid-hash-polarization
> 
> Up to 16-bit in Arista:
> https://eos.arista.com/hashing-for-l2-port-channels-and-l3-ecmp/
> 
> I believe you chose this interface because of the structure of the
> SipHash key that is used for the multipath hash calculation. This is an
> internal implementation detail and should not determine the user
> interface.
> 
> Looking at the history of the code, the flow dissector was migrated to
> SipHash in commit 55667441c84f ("net/flow_dissector: switch to
> siphash"). The motivating use case was flow label generation since these
> are sent on the wire together with the fields from which they were
> computed, not multipath hash calculation that also happens to rely on
> the flow dissector.
> 
> Given the above, do you see a problem with having the user specify a
> 32-bit number for the multipath hash seed? Note that SipHash is still
> used and that the number can be used to fill the entire 128-bit space.

Do you mean take 32-bit number from user and multiply it like this:
u32 key = val;
u64 key64;
memset(&key64, val, sizeof(u32));
memset(&key64 + sizeof(u32), val, sizeof(u32));
memset(seed.key[0], &key64, sizeof(u64));
memset(seed.key[1], &key64, sizeof(u64));
?
> The special value of "0" can be used to revert back to the random seed
> (needs to be documented, obviously).
> 
Got it!
> > +
> > +out:
> > +	kfree(tbl.data);
> > +	return ret;
> > +}
> >  #endif
> >  
> >  static struct ctl_table ipv4_table[] = {
> > @@ -1052,6 +1141,14 @@ static struct ctl_table ipv4_net_table[] = {
> >  		.extra1		= SYSCTL_ZERO,
> >  		.extra2		= &two,
> >  	},
> > +	{
> > +		.procname	= "fib_multipath_hash_seed",
> > +		.data		= &init_net.ipv4.sysctl_fib_multipath_hash_seed,
> > +		/* maxlen to print the keys in hex (*2) and a comma in between keys. */
> > +		.maxlen		= (FIB_MULTIPATH_SEED_KEY_LENGTH * 2) + 2,
> > +		.mode		= 0600,
> > +		.proc_handler	= proc_fib_multipath_hash_seed,
> > +	},
> >  #endif
> >  	{
> >  		.procname	= "ip_unprivileged_port_start",
> > -- 
> > 2.31.1
> > 



[Index of Archives]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite Forum]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]     [Linux Resources]

  Powered by Linux