On Tue, 1 Apr 2008, Matt Garman wrote:

> We're using multicast basically for some inter-process
> communication.
>
> We timestamp (and log, in a separate thread) all of our sends and
> receives, and do analysis on the logs.
>
> We're finding occasional (once or twice a day) "blips" where the
> receipt of multicast messages is delayed anywhere from 200
> milliseconds to three or four whole seconds.
>
> In one case, we have only one server in the network, and are still
> seeing this.  In this scenario, do the multicast messages actually
> use the physical network?
>
> I'm running sar on these machines (collecting data every five
> seconds); any delay >600 ms seems to coincide with extremely high
> iowait (but the load on any CPU during these times is always below
> 1.0).
>
> We have the sysctl net.core.rmem_max parameter set to 33554432.
>
> Our code uses setsockopt() to set the receiving buffer to the
> maximum size allowed by the kernel (i.e. 33554432 in our case).
>
> The servers are generally lightly loaded: typically they have a load
> of <1.0, and rarely does the load exceed 3.0 -- yet the servers have
> eight physical cores.
>
> This is with kernel 2.6.9-42.ELsmp, i.e. the default for CentOS 4.4.
>
> This doesn't appear to be a CPU problem.  I wrote a simple multicast
> testing program.  It sends a constant stream of messages and, in a
> separate thread, logs the time of each send.  I wrote a
> corresponding receive program (which logs receive times in a separate
> thread).  Running eight instances of cpuburn, I can't generate any
> significant delays.  However, if I run something like
>
>     dd bs=1024000 if=/dev/zero of=zeros.dat count=12288
>
> I can create multicast delays of over one second.  This will also
> generate high iowait in the sar log.  However, in actual production
> use, no process should ever push the disk as hard as that "dd" test.
> (In other words, while I can duplicate the problem, I'm not sure
> it's a fair test.)
>
> Any ideas or suggestions would be much appreciated.  I don't really
> know enough about the kernel's network architecture to devise any
> more tests or know how else I might be able to pinpoint the cause of
> this problem.

Hi Matt,

One thing you could try is to set the CPU affinity of your client/server
and the NIC interrupts to one CPU, and the disk interrupts to a different
CPU.

On my network test systems, I actually set the CPU affinity of all the
normal system processes to CPU 1 by adding the following at the beginning
of the /etc/rc.sysinit script (this is tailored for my dual-CPU servers,
so the "2" CPU mask reflects my particular CPU configuration):

    taskset -p 2 1
    taskset -p 2 $$

Then at the end of the /etc/rc.local script, I add:

    taskset -p 1 `ps ax | grep xinetd | grep -v grep | awk '{ print $1 }'`

which causes xinetd and any servers it spawns to run on CPU 0.

I also have the following in the /etc/rc.local script:

    echo 1 >> /proc/irq/`grep eth2 /proc/interrupts | awk '{ print $1 }' | sed 's/://'`/smp_affinity

This forces the NIC interrupts for the 10-GigE NIC (eth2) to be handled
by CPU 0.  There are no other active network interfaces on these servers,
or I would move their interrupts to CPU 1.  And you might want to do
likewise for the disk interrupts (I may wind up doing this myself).

Finally, run the client/server command on the same CPU as the NIC
interrupts; e.g., in the above scenario you could run the client with:

    taskset 1 client [arguments]

Note that the taskset command has a very non-intuitive command structure
(at least to me), so consult the man page.
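If it helps, here is a rough, untested sketch of how the IRQ and process
affinity pieces could be combined in one script on your servers.  The
interface name (eth0) and the disk-controller pattern (aacraid) are just
placeholders -- substitute whatever actually shows up in /proc/interrupts
on your boxes:

    #!/bin/sh
    # Sketch only: pin NIC interrupts and the multicast receiver to CPU 0,
    # and push disk-controller interrupts over to CPU 1.
    # "eth0" and "aacraid" are placeholders for your actual NIC and disk
    # controller as listed in /proc/interrupts.

    NIC_IRQ=`grep eth0 /proc/interrupts | awk '{ print $1 }' | sed 's/://'`
    DISK_IRQ=`grep aacraid /proc/interrupts | awk '{ print $1 }' | sed 's/://'`

    # smp_affinity takes a hex CPU bitmask: 1 = CPU 0, 2 = CPU 1
    echo 1 > /proc/irq/$NIC_IRQ/smp_affinity
    echo 2 > /proc/irq/$DISK_IRQ/smp_affinity

    # Run the multicast receiver on the same CPU as the NIC interrupts
    taskset 1 ./receiver [arguments]

Also note that if the irqbalance daemon is running, it may rewrite those
smp_affinity settings behind your back, so you might need to stop it while
testing.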
-Bill