Re: multicast performance


 



- Multicast performance decreases with the number of senders.
I.e. if a single process can send data at 800Mbit/s, then two
senders  (most likely) won't be able to send data at 400Mbit/s
each.  (just saying that since you're using "many" servers)

That's good to know.  Can you be more specific?  Do you know what
kind of loss factor we can expect with each additional process?

Also, right now, we are more concerned with latency than throughput.
Our total traffic is less than what the hardware can handle from a
throughput perspective, but we are more sensitive to latency issues.


I think this depends on the switch you're using.

We tried the following, with two simple programs[1]:
-sender: sends multicast messages at a fixed rate (e.g. 1000 msg/second); the message size is also relevant.
-receiver: measures the rate at which messages are received

You then start a single sender and some number of receivers on different machines. You should be able to increase the data rate close to the nominal limit of the switch with the receivers still getting all packets (setting a large default kernel receive buffer also helps[2]).
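The sender side really is only a few lines; a minimal sketch is below, assuming plain UDP multicast with a made-up group, port, message size and rate, and with error handling omitted:

#include <arpa/inet.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    struct sockaddr_in grp;
    memset(&grp, 0, sizeof(grp));
    grp.sin_family = AF_INET;
    grp.sin_port = htons(12345);                  /* placeholder port  */
    grp.sin_addr.s_addr = inet_addr("239.1.1.1"); /* placeholder group */

    char buf[1024];                               /* message size matters too */
    memset(buf, 'x', sizeof(buf));

    for (;;) {
        /* one datagram, then sleep to hold an (approximately) fixed rate */
        sendto(fd, buf, sizeof(buf), 0,
               (struct sockaddr *)&grp, sizeof(grp));
        usleep(1000);                             /* ~1000 msg/second */
    }
}

The receiver is the mirror image: join the group, loop on recv(), and count messages per second.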

If you then start two senders, each at half the previous best rate, you would expect the receivers to get the same total number of messages. With our switch this was not the case: the receivers were losing lots of messages. I can't give you a number, but it probably wouldn't apply to your switch anyway.

Even if you're concerned with latency this may be interesting.
In our case we learned that an unregulated send rate may generate multicast storms that literally kill the switch for a few seconds.

Marco

[1] You can probably find some tool that does the job, netperf maybe.
[2] look into net.core.rmem_max
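    net.core.rmem_max only raises the ceiling; the application still has to ask for a bigger buffer. A sketch of the application side, with the 8 MB figure purely as an example:

    #include <sys/socket.h>

    static void bump_rcvbuf(int fd)
    {
        /* The kernel clamps this to net.core.rmem_max, so raise that
         * sysctl first (e.g. sysctl -w net.core.rmem_max=8388608). */
        int bytes = 8 * 1024 * 1024;
        setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes));
    }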



- Timestamping after receive() can be very imprecise.  You should
probably use in-kernel timestamping for that purpose.

Also good to know.  I'm not familiar with kernel timestamping, but
will certainly look into it.
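For reference, the usual mechanism on Linux is the SO_TIMESTAMP socket option together with recvmsg(): the kernel stamps each packet on arrival and hands the timestamp back as ancillary data. A minimal sketch, assuming fd is a UDP socket already bound and joined to the group, with error handling trimmed:

#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>

static void read_with_kernel_timestamp(int fd)
{
    char data[2048];
    char ctrl[CMSG_SPACE(sizeof(struct timeval))];
    struct iovec iov = { .iov_base = data, .iov_len = sizeof(data) };
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = ctrl, .msg_controllen = sizeof(ctrl),
    };

    /* ask the kernel to timestamp incoming packets */
    int on = 1;
    setsockopt(fd, SOL_SOCKET, SO_TIMESTAMP, &on, sizeof(on));

    ssize_t n = recvmsg(fd, &msg, 0);
    if (n < 0)
        return;

    /* the arrival time comes back as a SCM_TIMESTAMP control message */
    for (struct cmsghdr *c = CMSG_FIRSTHDR(&msg); c; c = CMSG_NXTHDR(&msg, c)) {
        if (c->cmsg_level == SOL_SOCKET && c->cmsg_type == SCM_TIMESTAMP) {
            struct timeval tv;
            memcpy(&tv, CMSG_DATA(c), sizeof(tv));
            printf("packet arrived at %ld.%06ld\n",
                   (long)tv.tv_sec, (long)tv.tv_usec);
        }
    }
}

Comparing that kernel timestamp with a gettimeofday() taken right after recvmsg() returns would show how much of the delay is queueing in the local stack versus elsewhere.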

On the other hand, timestamping after the recv() call is still a
useful metric, since we can't use the data until we call recv().
However, more precise timestamping would allow us to further
pinpoint the exact location of the delay(s).

Thank you,
Matt



