Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 12 Aug 2008, Thomas Jarosch wrote:

> On Monday, 11. August 2008 23:44:21 David Miller wrote:
> > Trying to come up with a signature for this bogus stuff is both time
> > consuming and having a risk of false positives.  And I really question
> > whether this thing is worth it.
> >
> > The sane thing to do in this case is to declare the box inoperative
> > and that it needs to be fixed to avoid this behavior.
> >
> > Any reasonable congestion control scheme is going to run into problems
> > trying to react to the packet patterns this thing creates.  It is
> > therefore not really limited to FRTO so it really shouldn't be treated
> > like an FRTO problem even though it shows up more pronounced when
> > FRTO is enabled.
> 
> David, I agree with you, though I'm not sure about the end user experience:
> 
> The kernel is an early adopter of FRTO and will be bitten by bugs of other
> TCP implementations like we've experienced. I guess most affected users
> just see stalled or slow connections and won't have the time or knowledge
> to debug this.

This is hardly a big problem. Much bigger problem seems to be that some 
distros base to 2.6.24 and did not take TCP fixes that were put to 
2.6.25.7 but not to 2.6.24.y series because it wasn't updated anymore. 
There are hardly any other reports but for 2.6.24 (and the ones which we 
have have gone through @ netdev to fix the bugs / problems) in the ones 
I've seen.

> A proper warning could help them and the kernel
> developers to get this issue solved as quickly as possible.
> 
> We called the hotline of the ISP several times and they always claimed
> sending big mails with Outlook/Windows works, so it must be linux's fault.
> That view of things is totally biased, but it's something I want to make sure
> people can't get away with easily :-)

I should probably one day check how vista's frto is behaving
itself to know better... ...but I guess they'll be running to
some problems with big mails pretty soon... ;-)

In the meantime, can you check the attached patches. Besides the kernel 
patch, you need to build your own patched iproute2 as well to configure 
the features (ip tool among them is enough in case the build of some other 
part of the toolset fails like it did for me). I somewhat tested them, and 
the result seemed to be what I'd expect (I just forced RTOs with some 
netem heavy dropping and quickly glanced over the resulting packet 
patterns near RTO).

-- 
 i.
From b4d1efcf1d4384296d6d6b4f8378f8c408cefc98 Mon Sep 17 00:00:00 2001
From: =?ISO-8859-1?q?Ilpo=20J=E4rvinen?= <ilpo.jarvinen@xxxxxxxxxxx>
Date: Tue, 19 Aug 2008 08:20:16 +0300
Subject: [PATCH] tcp/frto: make frto per route configurable
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit

Needs iproute2 support since it isn't able to set RTAX_FEATURES
currently (ie., also the other TCP variant related RTAX_FEATUREs
won't work, they've been unused since the addition in 2003 or
so).

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@xxxxxxxxxxx>
---
 include/linux/rtnetlink.h |    1 +
 net/ipv4/tcp_input.c      |    4 ++++
 2 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index f4d386c..e628062 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -373,6 +373,7 @@ enum
 #define RTAX_FEATURE_SACK	0x00000002
 #define RTAX_FEATURE_TIMESTAMP	0x00000004
 #define RTAX_FEATURE_ALLFRAG	0x00000008
+#define RTAX_FEATURE_FRTO	0x00000010
 
 struct rta_session
 {
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 1f5e604..4f1cc0e 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1709,11 +1709,15 @@ int tcp_use_frto(struct sock *sk)
 {
 	const struct tcp_sock *tp = tcp_sk(sk);
 	const struct inet_connection_sock *icsk = inet_csk(sk);
+	struct dst_entry *dst = __sk_dst_get(sk);
 	struct sk_buff *skb;
 
 	if (!sysctl_tcp_frto)
 		return 0;
 
+	if (dst && (dst_metric(dst, RTAX_FEATURES) & RTAX_FEATURE_FRTO))
+		return 0;
+
 	/* MTU probe and F-RTO won't really play nicely along currently */
 	if (icsk->icsk_mtup.probe_size)
 		return 0;
-- 
1.5.2.2

From 59d7878c04eb9571c58baf78bfd07b169d3e5c0d Mon Sep 17 00:00:00 2001
From: =?ISO-8859-1?q?Ilpo=20J=E4rvinen?= <ilpo.jarvinen@xxxxxxxxxxx>
Date: Fri, 22 Aug 2008 14:49:00 +0300
Subject: [PATCH] iproute2: enable setting of per route features
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit

The kernel has had an entry for per route RTAX_FEATURES which
was added as unused back in 2003. Allow setting them now.

It seems that it's much more sensible to have the meaning
negated because otherwise the meaning of zero is very ambiguous,
ie., does it mean that feature is turned off or not given.
Besides, this matches what one would expect in the intented
use-case, where we have global settings from sysctl and want
to work-around something per route (ie., disable an otherwise
enabled feature).

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@xxxxxxxxxxx>
---
 include/linux/rtnetlink.h |    1 +
 ip/iproute.c              |   58 +++++++++++++++++++++++++++++++++++++++++----
 2 files changed, 54 insertions(+), 5 deletions(-)

diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index c1f2d50..354a6f1 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -373,6 +373,7 @@ enum
 #define RTAX_FEATURE_SACK	0x00000002
 #define RTAX_FEATURE_TIMESTAMP	0x00000004
 #define RTAX_FEATURE_ALLFRAG	0x00000008
+#define RTAX_FEATURE_FRTO	0x00000010
 
 struct rta_session
 {
diff --git a/ip/iproute.c b/ip/iproute.c
index 2a8f3f8..d4a90fc 100644
--- a/ip/iproute.c
+++ b/ip/iproute.c
@@ -52,6 +52,20 @@ static const char *mx_names[RTAX_MAX+1] = {
 	[RTAX_FEATURES] = "features",
 	[RTAX_RTO_MIN]	= "rto_min",
 };
+
+struct valname {
+	unsigned int	val;
+	const char	*name;
+};
+
+static const struct valname features[] = {
+	{ RTAX_FEATURE_ECN, "ecn" },
+	{ RTAX_FEATURE_SACK, "sack" },
+	{ RTAX_FEATURE_TIMESTAMP, "timestamps" },
+	{ RTAX_FEATURE_TIMESTAMP, "ts" },
+	{ RTAX_FEATURE_FRTO, "frto"},
+};
+
 static void usage(void) __attribute__((noreturn));
 
 static void usage(void)
@@ -73,7 +87,7 @@ static void usage(void)
 	fprintf(stderr, "           [ rtt TIME ] [ rttvar TIME ]\n");
 	fprintf(stderr, "           [ window NUMBER] [ cwnd NUMBER ] [ initcwnd NUMBER ]\n");
 	fprintf(stderr, "           [ ssthresh NUMBER ] [ realms REALM ] [ src ADDRESS ]\n");
-	fprintf(stderr, "           [ rto_min TIME ]\n");
+	fprintf(stderr, "           [ rto_min TIME ] [ features DISABLED_FEATURES ]\n");
 	fprintf(stderr, "TYPE := [ unicast | local | broadcast | multicast | throw |\n");
 	fprintf(stderr, "          unreachable | prohibit | blackhole | nat ]\n");
 	fprintf(stderr, "TABLE_ID := [ local | main | default | all | NUMBER ]\n");
@@ -83,6 +97,8 @@ static void usage(void)
 	fprintf(stderr, "NHFLAGS := [ onlink | pervasive ]\n");
 	fprintf(stderr, "RTPROTO := [ kernel | boot | static | NUMBER ]\n");
 	fprintf(stderr, "TIME := NUMBER[s|ms|us|ns|j]\n");
+	fprintf(stderr, "DISABLED_FEATURES := sack | timestamps | ts | ecn | frto |\n");
+	fprintf(stderr, "                     [ DISABLED_FEATURES ]\n");
 	exit(-1);
 }
 
@@ -505,10 +521,8 @@ int print_route(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
 			if (mxlock & (1<<i))
 				fprintf(fp, " lock");
 
-			if (i != RTAX_RTT && i != RTAX_RTTVAR &&
-			    i != RTAX_RTO_MIN)
-				fprintf(fp, " %u", *(unsigned*)RTA_DATA(mxrta[i]));
-			else {
+			if (i == RTAX_RTT || i == RTAX_RTTVAR ||
+			    i == RTAX_RTO_MIN) {
 				unsigned long long val = *(unsigned*)RTA_DATA(mxrta[i]);
 
 				val *= 1000;
@@ -520,6 +534,16 @@ int print_route(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
 					fprintf(fp, " %llums", val/hz);
 				else
 					fprintf(fp, " %.2fms", (float)val/hz);
+			} else if (i == RTAX_FEATURES) {
+				int j;
+				unsigned int f = *(unsigned*)RTA_DATA(mxrta[i]);
+				for (j = 0; j < ARRAY_SIZE(features); j++)
+					if (f & features[j].val) {
+						fprintf(fp, " %s", features[j].name);
+						f &= ~features[j].val;
+					}
+			} else {
+				fprintf(fp, " %u", *(unsigned*)RTA_DATA(mxrta[i]));
 			}
 		}
 	}
@@ -851,6 +875,30 @@ int iproute_modify(int cmd, unsigned flags, int argc, char **argv)
 			if (get_unsigned(&win, *argv, 0))
 				invarg("\"ssthresh\" value is invalid\n", *argv);
 			rta_addattr32(mxrta, sizeof(mxbuf), RTAX_SSTHRESH, win);
+		} else if (matches(*argv, "features") == 0) {
+			int j;
+			unsigned int f = 0;
+			NEXT_ARG();
+			while (1) {
+				for (j = 0; j < ARRAY_SIZE(features); j++) {
+					if (strcmp(*argv, features[j].name) == 0) {
+						f |= features[j].val;
+						if (!NEXT_ARG_OK())
+							goto feat_out;
+						NEXT_ARG();
+						break;
+					}
+				}
+				if (j == ARRAY_SIZE(features)) {
+					if (f)
+						PREV_ARG();
+					break;
+				}
+			}
+feat_out:
+			if (!f)
+				invarg("\"features\" list is invalid\n", *argv);
+			rta_addattr32(mxrta, sizeof(mxbuf), RTAX_FEATURES, f);
 		} else if (matches(*argv, "realms") == 0) {
 			__u32 realm;
 			NEXT_ARG();
-- 
1.5.2.2


[Index of Archives]     [Netfitler Users]     [LARTC]     [Bugtraq]     [Yosemite Forum]

  Powered by Linux