Re: Route cache performance under stress

Florian Weimer <fw@deneb.enyo.de> · Mon, 26 May 2003 11:34:37 +0200

"David S. Miller" <davem@redhat.com> writes:

> Let their loss be our gain :-)  No, I am serious, their solution to
> misbehaving flows seems to be just using slow path always and
> continually optimize the slow path.

Exactly, as a result you get stateless IP forwarding whose performance
is mostly independent of the traffic characteristics.

> Now, how about some real explanation about what they are actually
> doing?  Are they replicating the routing table all over the place?

They do this for dCEF.  In this case the CEF data structure are
replicated on every linecard that supports autonomous routing
decisions.  (This is essential for GSRs because the internal bus is
too narrow for almost any communications.  You are lucky if the
routing tables updates do not saturate it.)

> That's one possibility, and would match up to their saying that more
> router memory is required when using CEF.

CEF is essentially yet another copy of the routing table and therefore
requires memory (they do not aggregate prefixes, so some memory is
required for a full Internet routing table.)

> The other possibility is that it's a faster-trie thing generated
> from the normal routing tables.

Yeah, it's some kind of a trie according to a few Cisco documents
(which are a bit self-contradictory, though).

> Since CEF aparently works with QoS and other features, the key must
> be many bits wide.  Probably similar in size to our flowi's.

I don't think IOS QoS is based on (d)CEF.  It's true that on some
Cisco routers, QoS requires (d)CEF-enabled linecards, but I believe
this is just a software design issue and not inherently tied to the
CEF data structures.

So far I've only seen CEF tables with IPv4 addresses as indices. 8-)

> So some bit branching trie based upon flow parameters.  There are
> hundreds of patented such schemes :-)

Just ignore them. 8-)

> Anyways, you keep saying that flow hashing is stupid, can you propose
> an alternative?

Only for pure IPv4 CIDR routing (based on prefixes and destination
addresses).  I'd try the following scheme: split the destination
address into two parts, and use the more significant one as an index
into a table of (function pointer, closure data pointer) pairs.  These
functions return a pointer to the adjacency information.  They can be
implemented in various ways, depending on the structure of the less
significant part (e.g. if only one subnet is routed differently from
the others, a few comparisons are sufficent to identify it).  As a
result, the routing decision could made with one or two indirect calls
and a couple of memory accesses.

For hosts, if the routing table contains less than (say) ten routes,
order it by decreasing prefix length and scan it sequentially for a
match.

In all cases, L2 addresses should be stored indexed by the least
significant bits of the corresponding IP addresses (no hashing
required).

Of course, this will result in vastly decreased functionality (no
arbitary netmasks, no policy-based routing, code will be fine-tuned
for typical Internet routing tables), so this proposal definitely
comes at a price.

(In the meantime, it might be beneficial to use more buckets in the
routing cache and rely less on collision chaining.)
-
: send the line "unsubscribe linux-net" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html