On Sat, Jun 12, 2004 at 08:45:43AM +0200, Henrik Nordstrom wrote:
> On Sat, 12 Jun 2004, Peter Boyle wrote:
>
> > We're round robin sending to > 1024 IP's/MAC's on a subnet,
> > kernel version (2.4.20-31.9).
>
> Then you need to increase the neighbor cache size for this interface. See
> /proc/sys/net/ipv4/neigh/ and the neigh_alloc() function.

One of my clients is now seeing the same problem on a heavily loaded
system. However, increasing gc_thresh2/3, even to ridiculously large
values (or decreasing them so that gc happens more often), doesn't help
at all.

A couple of times per day, with _no_ higher PPS rate and no more than
about 800 neighbour cache entries (ip neigh show | wc -l), there are
still printk messages saying "Neighbour table overflow." for several
minutes at a time. However, the table is certainly not full.

Closer investigation showed that the following happens:

1) arp_bind_neighbour fails for some reason

2) the 'emergency' rt_garbage_collect with zero min_interval and
   elasticity 1 takes place. I guess the assumption is that cleaning
   the routing cache will drop references to the neighbour table and
   thus clean it somehow.

3) still the next arp_bind_neighbour fails.

So the question is, why does it fail? It calls __neigh_lookup_errno(),
and further on either neigh_lookup() or neigh_create() fails. In
neigh_create especially there are several error cases
(n->parms->neigh_setup() or tbl->constructor() failing) that might make
it return an error.

So the blind assumption (and error message) that the neighbour table
might be full seems a bit broad.

I have not yet had a chance to run a kernel with more printk's for
debugging. Expect more news later this week.

> gc_thresh1 does not seem to be used..

yes, I noted that, too ;)

-- 
- Harald Welte <laforge@gnumonks.org>               http://www.gnumonks.org/
============================================================================
Programming is like sex: One mistake and you have to support it your lifetime
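
For anyone who wants to repeat the comparison mentioned above (the `ip neigh show | wc -l` count against the gc_thresh values), here is a rough sketch. It is not part of the investigation itself. It assumes the thresholds are readable under /proc/sys/net/ipv4/neigh/default/, and the helper names (read_thresholds, count_entries, report) are made up for illustration.

```python
from pathlib import Path

# Assumption: the table-wide thresholds live in the "default" subdirectory.
NEIGH_DIR = Path("/proc/sys/net/ipv4/neigh/default")


def read_thresholds(base=NEIGH_DIR):
    """Return {name: value} for each gc_thresh1..3 file found under base."""
    found = {}
    for name in ("gc_thresh1", "gc_thresh2", "gc_thresh3"):
        path = Path(base) / name
        if path.exists():
            found[name] = int(path.read_text().strip())
    return found


def count_entries(ip_neigh_output):
    """Count non-empty lines, roughly what `ip neigh show | wc -l` reports."""
    return sum(1 for line in ip_neigh_output.splitlines() if line.strip())


def report(entries, thresholds):
    """Build one summary line per threshold, sorted by threshold name."""
    lines = [f"neighbour entries: {entries}"]
    for name, limit in sorted(thresholds.items()):
        state = "over" if entries > limit else "under"
        lines.append(f"{name}={limit}: {state}")
    return lines
```

Feed count_entries() the captured output of `ip neigh show`, then print the lines returned by report(count, read_thresholds()). If the count stays well under gc_thresh3 while the overflow message is being logged, that supports the suspicion that the failure comes from somewhere other than a full table.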