On Thu, 23 Mar 2006, Jesper Dangaard Brouer wrote:
On Tue, 21 Mar 2006, David S. Miller wrote:
From: Jesper Dangaard Brouer <hawk@xxxxxxx>
Date: Tue, 21 Mar 2006 15:51:34 +0100 (CET)
You guessed right... I did enable IP_ROUTE_MULTIPATH_CACHED, I have
now disabled it and equal multi path routing in general
(CONFIG_IP_ROUTE_MULTIPATH).
It is almost certainly the cause of your crashes, that code
is still extremely raw and that's why it is listed as "EXPERIMENTAL".
It seems your are right :-) (and I'll take more care of using experimental
code on production again). The machine, has now been running for 34 hours
without crashing. The strange thing is that I'm running the same kernel on
30 other (similar) machines, which have not crashed. (I do suspect the
specific traffic load pattern to influence this)
Argh!! -- nemesis!!! The machine, just died again...
The machine did not crash it just ran out of memory, and killed too many
important processes. I had to power recycle it... :-((( Could ping it...
I can see that, the traffic pattern have changed and the route cache is
growing rapitly...
BUT, I do think I have noticed another problem in the garbage collection code
(route.c), that causes the garbage collector (almost) never to garbage
collect.
This is caused by the value "ip_rt_max_size"
(/proc/sys/net/ipv4/route/max_size)
being set too large. It is set to 16 times the gc_thresh value (this size
dependend on the memory size). In the garbage collection function
(rt_garbage_collect) garbage collecting entries are ignored (gc_ignored) if
the number of entries are below "ip_rt_max_size".
With 1Gb memory, gc_thresh=65536 times 16 is 1048576. Which means that we
only start to garbage collect when there is more than 1 million entries. This
seems wrong... (the reason it does not grow this large is the 600 second
periodic flushes).
Hilsen
Jesper Brouer
--
-------------------------------------------------------------------
Cand. scient datalog
Dept. of Computer Science, University of Copenhagen
-------------------------------------------------------------------
grep . /proc/sys/net/ipv4/route/*
/proc/sys/net/ipv4/route/error_burst:5000
/proc/sys/net/ipv4/route/error_cost:1000
grep: /proc/sys/net/ipv4/route/flush: Operation not permitted
/proc/sys/net/ipv4/route/gc_elasticity:8
/proc/sys/net/ipv4/route/gc_interval:60
/proc/sys/net/ipv4/route/gc_min_interval:0
/proc/sys/net/ipv4/route/gc_min_interval_ms:500
/proc/sys/net/ipv4/route/gc_thresh:65536
/proc/sys/net/ipv4/route/gc_timeout:300
/proc/sys/net/ipv4/route/max_delay:10
/proc/sys/net/ipv4/route/max_size:1048576
/proc/sys/net/ipv4/route/min_adv_mss:256
/proc/sys/net/ipv4/route/min_delay:2
/proc/sys/net/ipv4/route/min_pmtu:552
/proc/sys/net/ipv4/route/mtu_expires:600
/proc/sys/net/ipv4/route/redirect_load:20
/proc/sys/net/ipv4/route/redirect_number:9
/proc/sys/net/ipv4/route/redirect_silence:20480
/proc/sys/net/ipv4/route/secret_interval:600
Hilsen
Jesper Brouer
--
-------------------------------------------------------------------
Cand. scient datalog
Dept. of Computer Science, University of Copenhagen
-------------------------------------------------------------------
-
: send the line "unsubscribe linux-net" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html