18.02.2013 10:50, Andrew Beekhof wrote:
> This sounds like it might be relevant:
> https://bugzilla.redhat.com/show_bug.cgi?id=880035

Yes, at least partially. I cannot comment much on the original issue reported there, except that libvirt is not a suspect - it has nothing to do with the multicast traffic itself, that is entirely between the kernel and qemu. I originally saw (with inter-node qemu mcast tunnels) behaviour very similar to what David describes in Comment 1: processes running on different hosts and joined to the same IGMP group lose each other.

From what I understand, to prevent false IGMP membership expiration you need at least one device in the layer-2 broadcast segment which originates IGMP query requests on all ports known to have joined an IGMP group in the past. Other IGMP-snooping devices should propagate those queries downstream, so the whole membership tree remains consistent (this is the relevant case when you run corosync UDPM in VMs connected to linux-bridge ports, where the host side of that bridge has ports connected to a switch). If you have a bunch of VMs all running on one host and connected via an internal bridge, you would enable the IGMP querier on that bridge (see the sketch at the end of this message). If you have inter-node groups, you'd better enable it on the switch.

From the cisco docs (I do not know whether this also applies to linux-bridge), the device needs an IP address configured for the querier to work, so you cannot enable it on a pure bridge where the host-side interface has no IP address. If you have a multicast router in the segment for all groups, you probably do not need to care, because the router should send the queries - although you may need to mark its port as 'mrouter'.

In my case I have several hosts in one (hardware) broadcast segment and IGMP snooping enabled on the switch. The IGMP joins are originated by the qemu processes running on the hosts themselves. Although I use bridges on the hosts, not all of them have host-side IPs, but I do have the switch as a "central" device, and, since one querier per segment is enough, I delegated that function to it. That helped. As the models used by corosync and qemu are very similar, I expect it to help with corosync UDPM as well.

>
> On Mon, Feb 18, 2013 at 5:46 PM, Vladislav Bogdanov
> <bubble@xxxxxxxxxxxxx> wrote:
>> Hi all,
>>
>> can anyone please confirm that enabling the IGMP querier on a switch
>> (stack), instead of disabling IGMP snooping (thus making the switch
>> broadcast all multicast packets), helps to solve the node loss issue?
>>
>> I enabled that feature in order to solve packet loss over qemu mcast
>> tunnels, and it helped dramatically. Those tunnels operate very much
>> like corosync: all relevant nodes first join an IGMP group, and then
>> all of them send multicast packets to that group. So in both cases
>> there is no designated 'sender' or designated 'router port' from which
>> all multicast traffic in the layer-2 broadcast segment originates.
>>
>> So I think it may help to stabilize corosync multicast mode as well.
>>
>> Maybe somebody has a hardware-based testing setup with IGMP-snooping
>> enabled switch(es) and the IGMP querier feature available (in cisco
>> terms; different vendors may call it differently) and can test whether
>> this actually helps?
>>
>> Vladislav
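
P.S. For the linux-bridge case above, here is a minimal sketch of how the host-side querier can be checked and switched on, assuming a hypothetical bridge named br0 and a kernel that exposes the multicast_snooping / multicast_querier knobs under /sys/class/net/<bridge>/bridge/ (untested as posted, adjust to your setup):

#!/usr/bin/env python3
# Minimal sketch: check and enable the IGMP querier on a Linux bridge.
# Assumes a hypothetical bridge named "br0" and a kernel that exposes
# the bridge multicast knobs under /sys/class/net/<bridge>/bridge/.
# Writing the knobs requires root.
from pathlib import Path

BRIDGE = "br0"  # adjust to your bridge name
BASE = Path("/sys/class/net") / BRIDGE / "bridge"

def read_knob(name):
    """Return the current value of a bridge multicast knob as a string."""
    return (BASE / name).read_text().strip()

def set_knob(name, value):
    """Write a new value to a bridge multicast knob (requires root)."""
    (BASE / name).write_text(f"{value}\n")

if __name__ == "__main__":
    # Keep snooping on, but make the bridge itself originate IGMP
    # queries so group memberships do not silently expire.
    print("multicast_snooping =", read_knob("multicast_snooping"))
    print("multicast_querier  =", read_knob("multicast_querier"))
    set_knob("multicast_querier", 1)
    print("multicast_querier  =", read_knob("multicast_querier"))

The same can of course be done with a couple of echo commands from a shell; the sketch is only meant to show which knobs are involved.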