Jon,
this is a new bug. Can you please send me your config and describe exactly
how you are restarting corosync (script, kill -INT pid, ...)?

Also, if you have coredumps, it would be interesting to see not only
srp_addr->addr (because only [0] is checked), but also
memb_join->proc_list_entries and instance->my_proc_list_entries (these
should be the same), the full content of the proc_list array (it must have
memb_join->proc_list_entries entries) and the full content of
instance->my_proc_list (it should have instance->my_proc_list_entries
entries). The first entries in proc_list and instance->my_proc_list look
the same.

Regards,
  Honza

Burgess, Jon wrote:
> I have been doing some tests which involve breaking the connection between
> two nodes by restarting Corosync, and occasionally I see the code failing
> the assert(addrlen) in totemip_equal(). I have hit this a couple of times
> now but I'm not sure exactly how reproducible it is. This is with
> Corosync-2.0.1.
>
> (gdb) bt
> #0  0x00007f3681ad9da5 in raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
> #1  0x00007f3681adb2c3 in abort () at abort.c:88
> #2  0x00007f3681ad2d99 in __assert_fail (assertion=0x7f36838c7278 "addrlen", file=0x7f36838c726e "totemip.c", line=106,
>     function=0x7f36838c7290 "totemip_equal") at assert.c:78
> #3  0x00007f36838b5229 in totemip_equal (addr1=<value optimized out>, addr2=<value optimized out>) at totemip.c:106
> #4  0x00007f36838c2e35 in srp_addr_equal (instance=0x7f3683ea8010, memb_join=0x743e78) at totemsrp.c:1114
> #5  memb_set_equal (instance=0x7f3683ea8010, memb_join=0x743e78) at totemsrp.c:1291
> #6  memb_join_process (instance=0x7f3683ea8010, memb_join=0x743e78) at totemsrp.c:4047
> #7  0x00007f36838c35bc in message_handler_memb_join (instance=0x7f3683ea8010, msg=<value optimized out>, msg_len=<value optimized out>,
>     endian_conversion_needed=<value optimized out>) at totemsrp.c:4304
> #8  0x00007f36838b9748 in rrp_deliver_fn (context=0x7028c0, msg=0x743e78, msg_len=159) at totemrrp.c:1792
> #9  0x00007f36838b6c61 in
>     net_deliver_fn (fd=<value optimized out>, revents=<value optimized out>, data=<value optimized out>) at totemudp.c:465
> #10 0x00007f3683457c1f in ?? ()
>
> Dumping out the lists being compared shows that they both have two
> entries. The first entry in both cases is the local node. The second
> entry is zeroed.
>
> (gdb) p instance->my_proc_list->addr[0]
> $16 = {nodeid = 1, family = 2, addr = "\251\376\000\001", '\000' <repeats 11 times>}
> (gdb) p instance->my_proc_list->addr[1]
> $17 = {nodeid = 0, family = 0, addr = '\000' <repeats 15 times>}
> (gdb) p instance->my_proc_list_entries
> $18 = 2
>
> (gdb) p ((struct srp_addr *)memb_join->end_of_memb_join)->addr[0]
> $23 = {nodeid = 1, family = 2, addr = "\251\376\000\001", '\000' <repeats 11 times>}
> (gdb) p ((struct srp_addr *)memb_join->end_of_memb_join)->addr[1]
> $24 = {nodeid = 0, family = 0, addr = '\000' <repeats 15 times>}
> (gdb) p memb_join->proc_list_entries
> $10 = 2
>
> The log messages show the local node forming and breaking associations
> with the peer node, but nothing that looks obviously wrong:
>
> [QUORUM] This node is within the non-primary component and will NOT provide any services.
> [QUORUM] Members[1]: 1
> [QUORUM] Members[1]: 1
> [TOTEM ] A processor joined or left the membership and a new membership (169.254.0.1:316) was formed.
> [MAIN ] Completed service synchronization, ready to provide service.
> [QUORUM] Members[2]: 1 3
> [TOTEM ] A processor joined or left the membership and a new membership (169.254.0.1:324) was formed.
> [QUORUM] This node is within the primary component and will provide service.
> [QUORUM] Members[2]: 1 3
> [MAIN ] Completed service synchronization, ready to provide service.
> [QUORUM] This node is within the non-primary component and will NOT provide any services.
> [QUORUM] Members[1]: 1
> [QUORUM] Members[1]: 1
> [TOTEM ] A processor joined or left the membership and a new membership (169.254.0.1:328) was formed.
> [MAIN ] Completed service synchronization, ready to provide service.
>
> Does this seem familiar to anyone? Might it be fixed already in a newer release?
>
> Jon
>
> _______________________________________________
> discuss mailing list
> discuss@xxxxxxxxxxxx
> http://lists.corosync.org/mailman/listinfo/discuss
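
To illustrate why the full proc_list content is requested rather than only addr[0] and addr[1]: in C, an expression such as `instance->my_proc_list->addr[1]` reads the second address slot of the first list entry, not the second list entry, so a zeroed result there may simply be an unused slot. The following is a minimal sketch, not the Corosync source. The struct layout, the two-slot assumption and every `sketch_` name are hypothetical, modelled only on the fields visible in the gdb output quoted above, and it assumes the address length is derived from the `family` field.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>   /* AF_INET, AF_INET6 */

/* Assumption: each list entry carries two address slots. */
#define SKETCH_INTERFACE_MAX 2

/* Hypothetical stand-in with the fields shown by gdb. */
struct sketch_ip {
    unsigned int nodeid;
    unsigned short family;
    unsigned char addr[16];
};

/* Hypothetical list entry: one address slot per interface. */
struct sketch_srp_addr {
    struct sketch_ip addr[SKETCH_INTERFACE_MAX];
};

/* Address length for a family; 0 means the slot was never filled in. */
int sketch_addr_len(unsigned short family)
{
    if (family == AF_INET)
        return 4;
    if (family == AF_INET6)
        return 16;
    return 0;
}

/* Compare two addresses. Aborts on an unset family when both sides
 * have family 0, which is the kind of failure an assert(addrlen)
 * would report. */
int sketch_ip_equal(const struct sketch_ip *a, const struct sketch_ip *b)
{
    int addrlen;

    if (a->family != b->family)
        return 0;
    addrlen = sketch_addr_len(a->family);
    assert(addrlen);
    return memcmp(a->addr, b->addr, (size_t)addrlen) == 0;
}

/* Walk the list itself (list[i]), checking slot 0 of every entry.
 * Returns the index of the first entry with an unset family, or -1. */
int sketch_first_unset_entry(const struct sketch_srp_addr *list, int entries)
{
    int i;

    for (i = 0; i < entries; i++) {
        if (sketch_addr_len(list[i].addr[0].family) == 0)
            return i;
    }
    return -1;
}
```

With a layout like this, the equivalent gdb inspection would index the list first, for example `p instance->my_proc_list[1].addr[0]`, and repeat that for every index below the entry count.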