Re: [PATCH] ipvs: drop templates for never established TCP connections

	Hello,

On Mon, 21 May 2018, Michal Koutný wrote:

> IPVS includes protection against filling the ip_vs_conn_tab by dropping 1/32 of
> feasible entries every second. The template entries (for persistent services)
> are never directly deleted by this mechanism but when a picked TCP connection
> entry is being dropped (1), the respective template entry is dropped too
> (realized by expiring 60 seconds after the connection entry being dropped).

	We do try to drop the template in ip_vs_random_dropentry(),
but I guess kernel/time/timer.c:enqueue_timer() queues the two timers
in reverse order for expiration because it uses hlist_add_head().
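
	To illustrate the ordering guess, here is a minimal userspace
sketch. It only models head insertion into a singly linked bucket (the
behaviour of hlist_add_head()) and assumes both timers land in the same
bucket. The demo_* names are hypothetical and not part of the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a timer queued in one bucket. */
struct demo_timer {
	int id;
	struct demo_timer *next;
};

/* Insert at the head of the bucket, as hlist_add_head() does. */
static void demo_add_head(struct demo_timer **head, struct demo_timer *t)
{
	assert(head != NULL && t != NULL);
	t->next = *head;
	*head = t;
}

/* Copy the ids in traversal order into out[]; return how many were copied. */
static size_t demo_walk(const struct demo_timer *head, int *out, size_t max)
{
	size_t n = 0;

	for (; head && n < max; head = head->next)
		out[n++] = head->id;
	return n;
}

/*
 * Queue a connection timer (id 1) and then its template timer (id 2).
 * Returns the id that a front-to-back walk of the bucket visits first.
 */
static int demo_first_visited(void)
{
	struct demo_timer conn = { 1, NULL };
	struct demo_timer tmpl = { 2, NULL };
	struct demo_timer *bucket = NULL;
	int order[2] = { 0, 0 };

	demo_add_head(&bucket, &conn);
	demo_add_head(&bucket, &tmpl);
	demo_walk(bucket, order, 2);
	return order[0];
}
```

	In this model the entry added last is visited first, so the walk
sees the entries in the reverse of their insertion order.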

> There is another mechanism that removes connection entries when they
> time out (2), in this case the associated template entry is not deleted.
> Under SYN flood, template entries would accumulate (due to their
> longer timeout).

	There is also ip_vs_todrop(), called from tcp_conn_schedule().
It just drops a portion of the SYNs under memory pressure.

> The accumulation takes place also with drop_entry being enabled. Roughly
> 15% ((31/32)^60) of SYN_RECV connections survive the dropping mechanism
> (1) and are removed by the timeout mechanism (2)(defaults to 60 seconds
> for SYN_RECV), thus template entries would still accumulate.
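
	The 15% figure quoted above can be checked with a few lines of C.
This is only an arithmetic sketch: it assumes each one-second round
independently drops 1/32 of the entries, and survival_fraction() is a
hypothetical helper, not an IPVS function.

```c
#include <assert.h>

/*
 * Fraction of entries still present after `rounds` drop rounds, when each
 * round removes `drop_fraction` of the remaining entries.
 */
static double survival_fraction(double drop_fraction, int rounds)
{
	double remaining = 1.0;
	int i;

	assert(drop_fraction >= 0.0 && drop_fraction <= 1.0);
	assert(rounds >= 0);

	for (i = 0; i < rounds; i++)
		remaining *= 1.0 - drop_fraction;
	return remaining;
}
```

	Calling survival_fraction(1.0 / 32.0, 60) evaluates (31/32)^60,
which is a little under 0.15.
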
> 
> The patch ensures that when a connection entry times out, we also remove the
> template entry from the table. To prevent breaking persistent services (since
> the connection may time out in already established state) we add a new entry
> flag to protect templates that spawned at least one established TCP connection.
> 
> Cc: Michal Kubeček <mkubecek@xxxxxxxx>
> Signed-off-by: Michal Koutný <mkoutny@xxxxxxxx>
> ---
>  include/uapi/linux/ip_vs.h           | 33 +++++++++++++++++----------------
>  net/netfilter/ipvs/ip_vs_conn.c      | 10 +++++++++-
>  net/netfilter/ipvs/ip_vs_core.c      | 15 ++++++++++++++-
>  net/netfilter/ipvs/ip_vs_proto_tcp.c |  6 ++++++
>  4 files changed, 46 insertions(+), 18 deletions(-)
> 
> diff --git a/include/uapi/linux/ip_vs.h b/include/uapi/linux/ip_vs.h
> index 1c916b2f89dc..ef3bbc001fcd 100644
> --- a/include/uapi/linux/ip_vs.h
> +++ b/include/uapi/linux/ip_vs.h
> @@ -79,22 +79,23 @@
>   *      IPVS Connection Flags
>   *      Only flags 0..15 are sent to backup server
>   */
> -#define IP_VS_CONN_F_FWD_MASK	0x0007		/* mask for the fwd methods */
> -#define IP_VS_CONN_F_MASQ	0x0000		/* masquerading/NAT */
> -#define IP_VS_CONN_F_LOCALNODE	0x0001		/* local node */
> -#define IP_VS_CONN_F_TUNNEL	0x0002		/* tunneling */
> -#define IP_VS_CONN_F_DROUTE	0x0003		/* direct routing */
> -#define IP_VS_CONN_F_BYPASS	0x0004		/* cache bypass */
> -#define IP_VS_CONN_F_SYNC	0x0020		/* entry created by sync */
> -#define IP_VS_CONN_F_HASHED	0x0040		/* hashed entry */
> -#define IP_VS_CONN_F_NOOUTPUT	0x0080		/* no output packets */
> -#define IP_VS_CONN_F_INACTIVE	0x0100		/* not established */
> -#define IP_VS_CONN_F_OUT_SEQ	0x0200		/* must do output seq adjust */
> -#define IP_VS_CONN_F_IN_SEQ	0x0400		/* must do input seq adjust */
> -#define IP_VS_CONN_F_SEQ_MASK	0x0600		/* in/out sequence mask */
> -#define IP_VS_CONN_F_NO_CPORT	0x0800		/* no client port set yet */
> -#define IP_VS_CONN_F_TEMPLATE	0x1000		/* template, not connection */
> -#define IP_VS_CONN_F_ONE_PACKET	0x2000		/* forward only one packet */
> +#define IP_VS_CONN_F_FWD_MASK		0x0007		/* mask for the fwd methods */
> +#define IP_VS_CONN_F_MASQ		0x0000		/* masquerading/NAT */
> +#define IP_VS_CONN_F_LOCALNODE		0x0001		/* local node */
> +#define IP_VS_CONN_F_TUNNEL		0x0002		/* tunneling */
> +#define IP_VS_CONN_F_DROUTE		0x0003		/* direct routing */
> +#define IP_VS_CONN_F_BYPASS		0x0004		/* cache bypass */
> +#define IP_VS_CONN_F_SYNC		0x0020		/* entry created by sync */
> +#define IP_VS_CONN_F_HASHED		0x0040		/* hashed entry */
> +#define IP_VS_CONN_F_NOOUTPUT		0x0080		/* no output packets */
> +#define IP_VS_CONN_F_INACTIVE		0x0100		/* not established */
> +#define IP_VS_CONN_F_OUT_SEQ		0x0200		/* must do output seq adjust */
> +#define IP_VS_CONN_F_IN_SEQ		0x0400		/* must do input seq adjust */
> +#define IP_VS_CONN_F_SEQ_MASK		0x0600		/* in/out sequence mask */
> +#define IP_VS_CONN_F_NO_CPORT		0x0800		/* no client port set yet */
> +#define IP_VS_CONN_F_TEMPLATE		0x1000		/* template, not connection */
> +#define IP_VS_CONN_F_ONE_PACKET		0x2000		/* forward only one packet */
> +#define IP_VS_CONN_F_TMPL_PERSISTED	0x4000		/* template, confirmed persistent */
>  
>  /* Initial bits allowed in backup server */
>  #define IP_VS_CONN_F_BACKUP_MASK (IP_VS_CONN_F_FWD_MASK | \
> diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c
> index 370abbf6f421..6afc606a388c 100644
> --- a/net/netfilter/ipvs/ip_vs_conn.c
> +++ b/net/netfilter/ipvs/ip_vs_conn.c
> @@ -820,6 +820,7 @@ static void ip_vs_conn_rcu_free(struct rcu_head *head)
>  static void ip_vs_conn_expire(struct timer_list *t)
>  {
>  	struct ip_vs_conn *cp = from_timer(cp, t, timer);
> +	struct ip_vs_conn *cp_c;
>  	struct netns_ipvs *ipvs = cp->ipvs;
>  
>  	/*
> @@ -834,8 +835,15 @@ static void ip_vs_conn_expire(struct timer_list *t)
>  		del_timer(&cp->timer);
>  
>  		/* does anybody control me? */
> -		if (cp->control)
> +		cp_c = cp->control;
> +		if (cp_c) {
>  			ip_vs_control_del(cp);
> +			if (cp_c->flags & IP_VS_CONN_F_TEMPLATE &&
> +			    !(cp_c->flags & IP_VS_CONN_F_TMPL_PERSISTED)) {
> +				IP_VS_DBG(4, "del conn template\n");
> +				ip_vs_conn_expire_now(cp_c);

	So the current conn has expired after 60 seconds in
IP_VS_TCP_S_SYN_RECV state, and there may be other conns in the
same state that have not expired yet.

	Another option is just to use something like:

	if (cp_c) {
		ip_vs_control_del(cp);
		/* Restart cp_c timer only for last conn */
		if (!atomic_read(&cp_c->n_control) &&
		    (cp_c->flags & IP_VS_CONN_F_TEMPLATE) &&
		    /* Some func to decide when to drop cp_c:
		     * - it can be for SYN state
		     * - it can be when cp was dropped on load
		     */
		    cp->state == IP_VS_TCP_S_SYN_RECV) {
			IP_VS_DBG(4, "del conn template\n");
			ip_vs_conn_expire_now(cp_c);
		}
	}

	It is not perfect, i.e. it does not know if there was
some conn that was established in the past:

- CONN1: SYN, SYN+ACK, ESTABLISH, FIN, FIN+ACK, expire
- CONN2: expire in SYN state, drop tpl before persistent timeout

	But it should work in the general case.
Anyway, give me a few days to think more about this issue.
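
	The suggested check and its CONN1/CONN2 limitation can be played
through in a small userspace model. This is only a sketch: struct
demo_conn, the DEMO_* states and the demo_* functions are hypothetical
stand-ins for struct ip_vs_conn, the TCP states and the expire path, and
they keep just the fields the condition looks at.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum demo_state { DEMO_SYN_RECV, DEMO_ESTABLISHED, DEMO_CLOSED };

/* Hypothetical, reduced model of a connection or template entry. */
struct demo_conn {
	bool is_template;           /* models IP_VS_CONN_F_TEMPLATE */
	int n_control;              /* conns still controlled by this entry */
	enum demo_state state;      /* state of the conn when it expires */
	struct demo_conn *control;  /* controlling template, if any */
	bool expired_now;           /* set where ip_vs_conn_expire_now() would run */
};

/*
 * Model of the suggested check in the expire path.
 * Returns true if the controlling template was expired as well.
 */
static bool demo_conn_expire(struct demo_conn *cp)
{
	struct demo_conn *cp_c = cp->control;

	if (!cp_c)
		return false;

	/* Models ip_vs_control_del(): detach cp from its template. */
	assert(cp_c->n_control > 0);
	cp->control = NULL;
	cp_c->n_control--;

	/* Expire the template only for the last conn, and only in SYN state. */
	if (cp_c->n_control == 0 && cp_c->is_template &&
	    cp->state == DEMO_SYN_RECV) {
		cp_c->expired_now = true;
		return true;
	}
	return false;
}

/*
 * CONN1 was established and closed, CONN2 expires in SYN state.
 * Returns true if the template is dropped only when CONN2 expires.
 */
static bool demo_scenario_conn1_conn2(void)
{
	struct demo_conn tpl = { .is_template = true, .n_control = 2 };
	struct demo_conn conn1 = { .state = DEMO_CLOSED, .control = &tpl };
	struct demo_conn conn2 = { .state = DEMO_SYN_RECV, .control = &tpl };
	bool dropped_by_conn1 = demo_conn_expire(&conn1);
	bool dropped_by_conn2 = demo_conn_expire(&conn2);

	return !dropped_by_conn1 && dropped_by_conn2 && tpl.expired_now;
}

/* A last conn that expires outside SYN state leaves the template alone. */
static bool demo_scenario_established_only(void)
{
	struct demo_conn tpl = { .is_template = true, .n_control = 1 };
	struct demo_conn conn = { .state = DEMO_ESTABLISHED, .control = &tpl };

	return !demo_conn_expire(&conn) && !tpl.expired_now;
}
```

	In the first scenario the template goes away with CONN2 even
though CONN1 had been established earlier, because the check looks only
at the state of the conn that is expiring.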

Regards

--
Julian Anastasov <ja@xxxxxx>
