Took linux-kernel off the cc list.

On Mon, 19 May 2003, Ralph Doncaster wrote:

> When I looked at the route-cache code, efficient wasn't the word that
> came to mind. Whether the problem is in the route-cache or not, getting
> 100kpps out of a linux router with <= 1GHz of CPU is not at all an easy
> task. I've tried 2.2 and 2.4 (up to 2.4.20) with 3c905CX cards, with and
> without NAPI, on a 750MHz AMD. I've never reached 100kpps without
> userland (zebra) getting starved. I've even tried the e1000 with 2.4.20,
> and it still doesn't cut it (about 50% better performance than the 3Com).
> This is always with a full routing table (~110K routes).

I just tested a small userland app which does some pseudo routing in
userland. With NAPI I am able to do 148Kpps; without it, on the same
hardware, about 32Kpps. I can't test beyond 148Kpps because that's the
maximum pps a 100Mbps card can do. The point I am making is that I don't
see the user-space starvation. Granted, this is not the same thing you
are testing.

> If I actually had the time to do the code, I'd try dumping the route-cache
> altogether and keep the forwarding table as an r-tree (probably 2 levels
> of 2048 entries since average prefix size is /22). Frequently-used routes
> would look up faster due to CPU cache hits. I'd have all the crap for
> source-based routing ifdef'd out when firewalling is not compiled in.

I think there's definite benefit to the flow/dst cache as is. Modern
routing really should not be just about destination-address lookup;
that's what's practical today (as opposed to the 80s). I agree that we
should be flexible enough not to force everybody into the complexity of
looking up via 5 tuples and maintaining flows at that level - if the
cache lookup is the bottleneck. There's a recent patch that made it into
2.5.69 which resolves (or so it seems - I haven't tried it myself) the
cache bucket distribution. This was a major problem before. The
second-level issue is how fast you can look up on a cache miss. So far
we are saying "fast enough". Someone needs to prove it is not.
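Just to make that part of the discussion concrete, the kind of two-level
table you are describing would look roughly like the sketch below. This
is untested, and the structure and names are invented here purely for
illustration - it is not anything that exists in the tree:

/*
 * Rough sketch of a two-level table with an 11-bit stride per level
 * (2048 entries each), covering the top 22 bits of the destination.
 * Not kernel code; names and layout are made up.
 */
#include <stdint.h>
#include <stddef.h>

struct nexthop {
	uint32_t gw;		/* gateway address, host byte order */
	int      ifindex;	/* output interface */
};

struct l2_table {
	struct nexthop *slot[2048];	/* indexed by bits 20..10 of dst */
};

struct fib {
	struct l2_table *l1[2048];	/* indexed by bits 31..21 of dst */
};

/* Lookup, ignoring prefixes longer than /22 for brevity. */
static struct nexthop *fib_lookup(const struct fib *fib, uint32_t dst)
{
	struct l2_table *l2 = fib->l1[dst >> 21];	/* top 11 bits */

	if (!l2)
		return NULL;
	return l2->slot[(dst >> 10) & 0x7ff];		/* next 11 bits */
}

Note that insertion has to replicate a short prefix's next hop across
every slot it covers, and anything longer than /22 needs a third level
or a small per-slot list; whether that memory and update cost actually
beats the dst cache is exactly the question somebody needs to answer
with numbers.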
> My next try will be with FreeBSD, using device polling and the e1000 cards
> (since it seems there are no polling patches for the 3c905CX under
> FreeBSD). From the description of how polling under FreeBSD works
> http://info.iet.unipi.it/~luigi/polling/
> vs NAPI under linux, polling sounds better due to the ability to configure
> the polling cycle and CPU load triggers. From the testing and reading
> I've done so far, NAPI doesn't seem to kick in until after 75-80% CPU
> load. With less than 25kpps coming into the box, zebra seems to take
> almost 10x longer to bring up a session with full routes than it does
> with no packet load. Since CPU load before zebra becomes active is
> 70-75%, it would seem a lot of cycles are being wasted on context
> switching when zebra gets busy.

Not interested in BSD. When they can beat Linux's numbers I'll be
interested.

> If there is a way to get the routing performance I'm looking for in Linux,
> I'd really like to know. I've been searching and asking for over a year
> now. When I initially talked to Jamal about it, he told me NAPI was the
> answer. It does help, but from my experience it's not the answer. I get
> the impression nobody involved in the code has tested under real-world
> conditions. If that is, in fact, the problem then I can provide an ebgp
> multihop full feed and a synflood utility for stress testing. If the
> linux routing and ethernet driver code is improved so I can handle 50kpps
> of inbound regular traffic, a 50kpps random-source DOS, and still have
> 50% CPU left for Zebra, then Cisco might have something to worry about...

I think we could do 50Kpps in a DOS environment. We live in the same
city; I may be able to spare half a weekend day and meet up with you for
some testing.

cheers,
jamal
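PS: by "pseudo routing in userland" above I mean something in the spirit
of the loop below. This is not the actual test program - just a
stripped-down illustration using a packet socket; the interface name is
made up, there is no real lookup or MAC rewrite, error handling is
minimal, and it needs root to run:

/*
 * Minimal userland "pseudo router" sketch: read frames from a packet
 * socket bound to one interface, pretend to make a forwarding decision,
 * and write the frame back out.  Illustration only.
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <net/if.h>
#include <arpa/inet.h>
#include <linux/if_ether.h>
#include <linux/if_packet.h>

int main(void)
{
	int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_IP));
	struct sockaddr_ll sll;
	unsigned char buf[2048];

	if (fd < 0) {
		perror("socket");
		return 1;
	}

	/* bind to the inbound interface ("eth0" is just an example) */
	memset(&sll, 0, sizeof(sll));
	sll.sll_family = AF_PACKET;
	sll.sll_protocol = htons(ETH_P_IP);
	sll.sll_ifindex = if_nametoindex("eth0");
	if (bind(fd, (struct sockaddr *)&sll, sizeof(sll)) < 0) {
		perror("bind");
		return 1;
	}

	for (;;) {
		ssize_t len = recv(fd, buf, sizeof(buf), 0);

		if (len <= 0)
			continue;
		/*
		 * A real test would look up the destination here and
		 * rewrite the ethernet header for the chosen next hop;
		 * this sketch just pushes the frame straight back out
		 * the same interface.
		 */
		send(fd, buf, len, 0);
	}

	close(fd);
	return 0;
}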