Chad Reese wrote:
I assume this is happening under high network load. Some early Octeon ethernet drivers had a problem where they could starve the rest of the system while processing incoming packets. The message you are getting is the kernel warning you that userspace hasn't been given any processing time. What is probably happening is that the cavium_ethernet driver is spending all its time in a receive tasklet getting packets and then dropping them. This has been fixed in a later Cavium SDK, but it looks like you are running an ancient kernel. A newer kernel is available on the Cavium support site.
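To make the starvation pattern described above concrete, here is a minimal userspace sketch in C. It is not the cavium_ethernet code and not the fix shipped in the later SDK; the names rx_queue, simulate and NO_BUDGET are hypothetical, and the one-packet-per-step timing model is an assumption made only for illustration. It compares a receive handler that keeps running while packets are pending with one that yields after a fixed per-call budget, which is the general idea behind budgeted polling.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for a hardware receive queue: only a packet count. */
struct rx_queue {
    unsigned pending;
};

/* Assumption: "no budget" is modelled as a budget too large to ever reach. */
#define NO_BUDGET UINT_MAX

/*
 * Simulate `steps` time steps on one core. One packet arrives every
 * `arrive_every` steps. In each step the receive handler either handles
 * one packet, or yields so that other work (e.g. userspace) gets the step.
 * The handler yields when the queue is empty or when it has handled
 * `budget` packets since it last yielded.
 * Returns the number of steps that were left for other work.
 */
unsigned simulate(unsigned steps, unsigned arrive_every, unsigned budget)
{
    struct rx_queue q = { 0 };
    unsigned other_work = 0;
    unsigned in_batch = 0; /* packets handled since the last yield */

    assert(arrive_every > 0);

    for (unsigned t = 0; t < steps; t++) {
        if (t % arrive_every == 0)
            q.pending++;

        if (q.pending > 0 && in_batch < budget) {
            q.pending--;   /* the handler consumes this step */
            in_batch++;
        } else {
            other_work++;  /* the handler yielded; something else runs */
            in_batch = 0;
        }
    }
    return other_work;
}
```

In this model, simulate(1700, 1, NO_BUDGET) returns 0: with a packet arriving every step, the unbudgeted handler never yields and nothing else runs. simulate(1700, 1, 16) returns 100, because the handler gives up one step after every 16 packets. The backlog still grows in the budgeted case (a real driver would drop those packets), but the rest of the system keeps getting time.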
It seems unlikely that it's due to CPU load. It's only receiving around 30,000 packets/sec total, and before the problem shows up "top" shows some cores at around 25% and others totally idle. The only symptom of the problem is that the system simply stops responding.
You wouldn't happen to know which versions fixed the problem you described above, would you? Upgrading the kernel isn't an option, but it wouldn't be the first time we've had to backport things.
The cavium ethernet driver doesn't use the standard NAPI interface since it doesn't support multicore receive for a single port.
Ah...I was wondering about that.

Chris