Multicast delays and high iowait

We're using multicast for inter-process communication.

We timestamp (and log, in a separate thread) all of our sends and
receives, and do analysis on the logs.

We're finding occasional (once or twice a day) "blips" where the
receipt of multicast messages is delayed by anywhere from 200
milliseconds to three or four full seconds.
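
To make the kind of log analysis described above concrete, here is a
minimal sketch of flagging such blips. It is not our actual analysis
code: the record layout (sequence number, send time, receive time, in
seconds) and the name find_blips are assumptions for illustration, and
it presumes the send and receive clocks are comparable (same host or
synchronized).

```python
def find_blips(records, threshold_s=0.2):
    """Return (seq, latency) for every message delivered late.

    records: iterable of (seq, send_ts, recv_ts) tuples, timestamps in
             seconds (hypothetical layout, not the real log format).
    threshold_s: latency at or above which a message counts as a blip;
                 0.2 s matches the smallest delay mentioned above.
    """
    blips = []
    for seq, send_ts, recv_ts in records:
        latency = recv_ts - send_ts
        if latency >= threshold_s:
            blips.append((seq, latency))
    return blips
```

Calling find_blips() on the merged send/receive log yields only the
delayed messages, which can then be lined up against the sar samples
by timestamp.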

In one case, we have only one server in the network, and are still
seeing this.  In this scenario, do the multicast messages actually
use the physical network?

I'm running sar on these machines (collecting data every five
seconds); any delay >600 ms seems to coincide with extremely high
iowait (but the load on any CPU during these times is always below
1.0).

We have the sysctl net.core.rmem_max parameter set to 33554432.

Our code uses setsockopt() to set the receive buffer to the
maximum size allowed by the kernel (i.e. 33554432 in our case).
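
For illustration, a minimal sketch of that buffer setup follows. It is
written in Python rather than taken from our code, and the helper name
set_rcvbuf is made up for the example. It requests a receive buffer
with SO_RCVBUF and then reads the option back, because the kernel
silently caps the request at net.core.rmem_max; on Linux the value
read back is generally double the granted size, to account for
bookkeeping overhead.

```python
import socket

def set_rcvbuf(sock, requested=33554432):
    """Request a receive buffer size and return what the kernel reports.

    The request is capped at net.core.rmem_max without raising an
    error, so reading the option back is the only way to see what was
    actually granted.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
reported = set_rcvbuf(sock)
print("kernel reports SO_RCVBUF =", reported)
sock.close()
```

If the reported value is far below the request, the rmem_max sysctl is
not in effect for that process.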

The servers are generally lightly loaded: typically they have a load
of <1.0, and the load rarely exceeds 3.0, even though the servers
have eight physical cores.

This is with kernel 2.6.9-42.ELsmp, i.e. the default for CentOS 4.4.

This doesn't appear to be a CPU problem.  I wrote a simple multicast
testing program.  It sends a constant stream of messages and, in a
separate thread, logs the time of each send.  I also wrote a
corresponding receive program that logs receive times in a separate
thread.  Running eight instances of cpuburn, I can't generate any
significant delays.  However, if I run something like

    dd bs=1024000 if=/dev/zero of=zeros.dat count=12288

I can create multicast delays over one second.  This will also
generate high iowait in the sar log.  However, in actual production
use, no process should ever push the disk as hard as that "dd" test.
(In other words, while I can duplicate the problem, I'm not sure
it's a fair test).
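
For anyone who wants to reproduce this, the sending side of the test
can be sketched roughly as below. This is a simplified Python stand-in,
not the actual test program: the group address, port, message format,
and function names are all placeholders. The point it shows is the
structure, with the send loop only queueing (seq, timestamp) pairs and
a separate thread doing the log writes.

```python
import queue
import socket
import threading
import time

GROUP = "239.1.1.1"  # placeholder multicast group, not our real one
PORT = 5007          # placeholder port

def log_worker(q, out):
    """Drain (seq, timestamp) pairs from q and write them to out.

    Runs in its own thread so that slow log writes do not stall the
    send loop. A None item tells the worker to stop.
    """
    while True:
        item = q.get()
        if item is None:
            break
        out.write("%d %.6f\n" % item)

def send_stream(count, interval_s, q):
    """Send count numbered datagrams to the group, queueing each send time."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
    for seq in range(count):
        sock.sendto(str(seq).encode(), (GROUP, PORT))
        q.put((seq, time.time()))
        time.sleep(interval_s)
    sock.close()

# Example wiring (left as comments so nothing is sent on import):
#   q = queue.Queue()
#   with open("send.log", "w") as f:
#       t = threading.Thread(target=log_worker, args=(q, f))
#       t.start()
#       send_stream(1000, 0.01, q)
#       q.put(None)
#       t.join()
```

The receive side has the same shape, with recvfrom() in place of
sendto() and the socket joined to the group.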

Any ideas or suggestions would be much appreciated.  I don't really
know enough about the kernel's network architecture to devise any
more tests or know how else I might be able to pinpoint the cause of
this problem.

Thank you,
Matt
