On Tue, 1 Apr 2008, Matt Garman wrote:

> We're using multicast basically for some inter-process
> communication.
>
> We timestamp (and log, in a separate thread) all of our sends and
> receives, and do analysis on the logs.
>
> We're finding occasional (once or twice a day) "blips" where the
> receipt of multicast messages is delayed anywhere from 200
> milliseconds to three or four whole seconds.
>
> In one case, we have only one server in the network, and are still
> seeing this.  In this scenario, do the multicast messages actually
> use the physical network?
>
> I'm running sar on these machines (collecting data every five
> seconds); any delay >600 ms seems to coincide with extremely high
> iowait (but the load on any CPU during these times is always below
> 1.0).
>
> We have the sysctl net.core.rmem_max parameter set to 33554432.
>
> Our code uses setsockopt() to set the receiving buffer to the
> maximum size allowed by the kernel (i.e. 33554432 in our case).
>
> The servers are generally lightly loaded: typically they have a load
> of <1.0, and rarely does the load exceed 3.0 -- yet the servers have
> eight physical cores.
>
> This is with kernel 2.6.9-42.ELsmp, i.e. the default for CentOS 4.4.
>
> This doesn't appear to be a CPU problem.  I wrote a simple multicast
> testing program.  It sends a constant stream of messages and, in a
> separate thread, logs the time of each send.  I wrote a
> corresponding receive program (which logs receive times in a separate
> thread).  Running eight instances of cpuburn, I can't generate any
> significant delays.  However, if I run something like
>
>     dd bs=1024000 if=/dev/zero of=zeros.dat count=12288
>
> I can create multicast delays of over one second.  This will also
> generate high iowait in the sar log.  However, in actual production
> use, no process should ever push the disk as hard as that "dd" test.
> (In other words, while I can duplicate the problem, I'm not sure
> it's a fair test.)
>
> Any ideas or suggestions would be much appreciated.  I don't really
> know enough about the kernel's network architecture to devise any
> more tests or know how else I might be able to pinpoint the cause of
> this problem.

Hi Matt,

One thing you could try is to set the CPU affinity of your client/server
and the NIC interrupts to one CPU, and the disk interrupts to a different
CPU.

On my network test systems, I actually set the CPU affinity of all the
normal system processes to CPU 1 by adding the following at the beginning
of the /etc/rc.sysinit script (this is tailored for my dual-CPU servers,
so the "2" CPU mask reflects my particular CPU configuration):

    taskset -p 2 1
    taskset -p 2 $$

Then at the end of the /etc/rc.local script, I add:

    taskset -p 1 `ps ax | grep xinetd | grep -v grep | awk '{ print $1 }'`

which causes xinetd and any servers it spawns to run on CPU 0.

I also have the following in the /etc/rc.local script:

    echo 1 >> /proc/irq/`grep eth2 /proc/interrupts | awk '{ print $1 }' | sed 's/://'`/smp_affinity

This forces the NIC interrupts for the 10-GigE NIC (eth2) to be handled
by CPU 0.  There are no other active network interfaces on these servers,
or I would move their interrupts to CPU 1.  And you might want to do
likewise for the disk interrupts (I may wind up doing this myself).

Finally, run the client/server command on the same CPU as the NIC
interrupts; e.g., in the above scenario you could run the client with:

    taskset 1 client [arguments]

Note that the taskset command has a very non-intuitive command structure
(at least to me), so consult the man page.
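If it helps, here is a rough, untested sketch of how the IRQ and process
affinity pieces could be combined in one script on your servers.  The
interface name (eth0) and the disk-controller pattern (aacraid) are just
placeholders -- substitute whatever actually shows up in /proc/interrupts
on your boxes:

    #!/bin/sh
    # Sketch only: pin NIC interrupts and the multicast receiver to CPU 0,
    # and push disk-controller interrupts over to CPU 1.
    # "eth0" and "aacraid" are placeholders for your actual NIC and disk
    # controller as listed in /proc/interrupts.

    NIC_IRQ=`grep eth0 /proc/interrupts | awk '{ print $1 }' | sed 's/://'`
    DISK_IRQ=`grep aacraid /proc/interrupts | awk '{ print $1 }' | sed 's/://'`

    # smp_affinity takes a hex CPU bitmask: 1 = CPU 0, 2 = CPU 1
    echo 1 > /proc/irq/$NIC_IRQ/smp_affinity
    echo 2 > /proc/irq/$DISK_IRQ/smp_affinity

    # Run the multicast receiver on the same CPU as the NIC interrupts
    taskset 1 ./receiver [arguments]

Also note that if the irqbalance daemon is running, it may rewrite those
smp_affinity settings behind your back, so you might need to stop it while
testing.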
-Bill