18.02.2013 10:50, Andrew Beekhof wrote:
> This sounds like it might be relevant:
> https://bugzilla.redhat.com/show_bug.cgi?id=880035

Yes, at least partially. I cannot comment much on the original issue reported there, except that libvirt is not a suspect - it has nothing to do with the multicast traffic itself, that is entirely between the kernel and qemu. I originally saw (with inter-node qemu mcast tunnels) behaviour very similar to what David describes in Comment 1: processes running on different hosts and joined to the same IGMP group lose each other.

From what I understand, to prevent false IGMP membership expiration you need at least one device in the layer-2 broadcast segment which originates IGMP query requests on all ports known to have joined an IGMP group in the past. Other IGMP-snooping devices should propagate those queries downstream, so the whole membership tree remains consistent (this is the relevant case when you run corosync UDPM in VMs connected to linux-bridge ports, where the host side of that bridge has ports connected to a switch). If you have a bunch of VMs all running on one host and connected via an internal bridge, you would enable the IGMP querier on that bridge (see the sketch at the end of this message). If you have inter-node groups, you'd better enable it on the switch.

From the cisco docs (I do not know whether this also applies to linux-bridge), the device needs an IP address configured for the querier to work, so you cannot enable it on a pure bridge where the host-side interface has no IP address. If you have a multicast router in the segment for all groups, you probably do not need to care, because the router should send the queries - although you may need to mark its port as 'mrouter'.

In my case I have several hosts in one (hardware) broadcast segment and IGMP snooping enabled on the switch. The IGMP joins are originated by the qemu processes running on the hosts themselves. Although I use bridges on the hosts, not all of them have host-side IPs, but I do have the switch as a "central" device, and, since one querier per segment is enough, I delegated that function to it. That helped. As the models used by corosync and qemu are very similar, I expect it to help with corosync UDPM as well.

>
> On Mon, Feb 18, 2013 at 5:46 PM, Vladislav Bogdanov
> <bubble@xxxxxxxxxxxxx> wrote:
>> Hi all,
>>
>> can anyone please confirm that enabling the IGMP querier on a switch
>> (stack), instead of disabling IGMP snooping (thus making the switch
>> broadcast all multicast packets), helps to solve the node loss issue?
>>
>> I enabled that feature in order to solve packet loss over qemu mcast
>> tunnels, and it helped dramatically. Those tunnels operate very much
>> like corosync: all relevant nodes first join an IGMP group, and then
>> all of them send multicast packets to that group. So in both cases
>> there is no designated 'sender' or designated 'router port' from which
>> all multicast traffic in the layer-2 broadcast segment originates.
>>
>> So I think it may help to stabilize corosync multicast mode as well.
>>
>> Maybe somebody has a hardware-based testing setup with IGMP-snooping
>> enabled switch(es) and the IGMP querier feature available (in cisco
>> terms; different vendors may call it differently) and can test whether
>> this actually helps?
>>
>> Vladislav
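
P.S. For the linux-bridge case above, here is a minimal sketch of how the host-side querier can be checked and switched on, assuming a hypothetical bridge named br0 and a kernel that exposes the multicast_snooping / multicast_querier knobs under /sys/class/net/<bridge>/bridge/ (untested as posted, adjust to your setup):

#!/usr/bin/env python3
# Minimal sketch: check and enable the IGMP querier on a Linux bridge.
# Assumes a hypothetical bridge named "br0" and a kernel that exposes
# the bridge multicast knobs under /sys/class/net/<bridge>/bridge/.
# Writing the knobs requires root.
from pathlib import Path

BRIDGE = "br0"  # adjust to your bridge name
BASE = Path("/sys/class/net") / BRIDGE / "bridge"

def read_knob(name):
    """Return the current value of a bridge multicast knob as a string."""
    return (BASE / name).read_text().strip()

def set_knob(name, value):
    """Write a new value to a bridge multicast knob (requires root)."""
    (BASE / name).write_text(f"{value}\n")

if __name__ == "__main__":
    # Keep snooping on, but make the bridge itself originate IGMP
    # queries so group memberships do not silently expire.
    print("multicast_snooping =", read_knob("multicast_snooping"))
    print("multicast_querier  =", read_knob("multicast_querier"))
    set_knob("multicast_querier", 1)
    print("multicast_querier  =", read_knob("multicast_querier"))

The same can of course be done with a couple of echo commands from a shell; the sketch is only meant to show which knobs are involved.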