Re: [PATCH RFC] ipvs: reschedule new connections if previous was on FIN_WAIT or TIME_WAIT

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

On 11-12-2014 20:39, Julian Anastasov wrote:

	Hello,

On Thu, 11 Dec 2014, Marcelo Ricardo Leitner wrote:

On 10-12-2014 19:27, Julian Anastasov wrote:

  int conn_reuse_mode;

  conn_reuse_mode = sysctl_conn_reuse_mode(ipvs);
  if (conn_reuse_mode && !iph.fragoffs &&
      is_new_conn(skb, &iph) && cp &&
      ...
      unlikely(is_new_conn_expected(cp, conn_reuse_mode)))) {

Looks great! Thanks for all the directions.

One question: at this if(), shouldn't we also check for &cp->n_control or you
considered it in the '...'? Because as you said, if this connection controls
another one we won't expire it and AFAICT we may end up with two duplicate
entries for the given src host/port/service.

Currently I'm with:
         if (conn_reuse_mode && !iph.fragoffs &&
             is_new_conn(skb, &iph) && cp &&
             !atomic_read(&cp->n_control) &&
             ((unlikely(sysctl_expire_nodest_conn(ipvs)) && cp->dest &&
              unlikely(!atomic_read(&cp->dest->weight))) ||
              unlikely(is_new_conn_expected(cp, conn_reuse_mode)))) {

	I think, the cp->n_control check is not needed,
we can live with many connections... in master server.

Okay, good.

	But it looks like the new mechanism opens the door
for problems with the sync protocol. The master will
send sync messages for the new connection that we
create but backup server will hit the old connection
that can be to another real server (daddr). Backup will update
the cp->flags and cp->state but cp->dest can not be changed.
Our old (expired) connection will try to send sync messages
but as its state does not change ip_vs_sync_conn_needed()
will prevent the sending for FW/TW. Even if the old connection
does not cause problems, the new connection is not properly synced,
ip_vs_proc_conn() will call ip_vs_conn_in_get() and
we do not notice that daddr/dport differs.

	May be it can be solved by adding daddr+dport
comparison after the ip_vs_conn_in_get() call in
ip_vs_proc_conn(), so that we can support multiple
connections that differ in daddr+dport. Or even
daddr+dport check in ip_vs_conn_in_get just for the backup.
But even in this case there is a risk because the sync messages
can come in random order, for example, a short connection
is expired in TW state, we create new connection but
the backup server gets sync messages in reverse order,
i.e. first for the new connection.

	If above is not implemented, the
'!(ipvs->sync_state & IP_VS_STATE_MASTER) &&' check
should prevent the expiration. We will create new connection
due to weight=0 but backup can remain with wrong daddr
from the old connection, that is the price we pay on
rescheduling to different real server.

	And may be we have to add also a cp->control check
in is_new_conn_expected(), the reason: persistence
should have priority. 'cp->control' check should be in
is_new_conn_expected(). IIRC, people use expire_nodest_conn
even for setups with persistence, so cp->control should
not prevent rescheduling in this case. cp->control also covers
the case with FTP DATA, such conns should use the same
real server.

	What about:

static inline bool can_expire_connection()
{
	/* Controlled (FTP DATA or persistence)? */
	if (cp->control)
		return false;
	---> Below is the previous version, unchanged:
         return (cp->state == IP_VS_TCP_S_TIME_WAIT) ||
                ((conn_reuse_mode & 2) &&
                 cp->state == IP_VS_TCP_S_FIN_WAIT &&
                 (cp->flags & IP_VS_CONN_F_NOOUTPUT));
}

	New version:

         conn_reuse_mode = sysctl_conn_reuse_mode(ipvs);
         if (conn_reuse_mode && !iph.fragoffs &&
             is_new_conn(skb, &iph) && cp &&
	    ((unlikely(sysctl_expire_nodest_conn(ipvs)) && cp->dest &&
	      unlikely(!atomic_read(&cp->dest->weight))) ||
	     (!(ipvs->sync_state & IP_VS_STATE_MASTER) &&
	      unlikely(can_expire_connection(cp, conn_reuse_mode))))) {

	As result, actions by priority are as follows:

- reschedule for expire_nodest_conn=1 when weight=0, even
if daddr/dport can become different in backup, even on persistence

- sync protocol is active: reuse to prevent conns with different
real server in master and backup

- reuse on persistence

- reschedule on FW/TW: when no sync and no persistence

- reuse, by default

TBH now I have to read more on the sync mechanism, it's been a while. I'll try going with the first option, a way to get this properly synced to backup server instead of disabling it when with sync enabled, because this last would put a severe limitation on this feature. I haven't seen many single node setups...

I'll get back to you by Tue I think.

Thanks,
Marcelo

--
To unsubscribe from this list: send the line "unsubscribe lvs-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Filesystem Devel]     [Linux NFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux SCSI]     [X.Org]

  Powered by Linux