Antw: Re: TOTEM implementation is unreliable ("ring faulty" and retransmit list)

>>> Steven Dake <steven.dake@xxxxxxxxx> wrote on 24.06.2015 at 04:29 in message
<CAPwfPsicv7JL+sSGCaLcEEzREgEjJPtQ_GEGxtPtX9fjd7FjGQ@xxxxxxxxxxxxxx>:
> It should probably be documented somewhere but may not be: with a redundant
> ring, both rings must have approximately the same performance.

But WHY? The protocol uses either the first or the second ring, with a reliable transport implemented on each ring, right? It seems to me that the implementation is simply broken and needs to be fixed.

As a first step, it would be quite helpful to have a syslog message that actually states the reason why corosync considers a ring faulty when there is absolutely no network problem.
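
Until such a message exists, the flaps can at least be quantified from syslog. Here is a minimal sketch in Python, assuming unwrapped syslog lines in the format quoted below and English month names. The function name ring_flaps, the year argument, and the regular expressions are my own illustration, not anything corosync ships:

```python
import re
from datetime import datetime

# Patterns follow the message wording in the quoted log. They assume each
# syslog entry is on a single line (the e-mail quoting wraps some of them).
FAULTY = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) .*\[TOTEM \] "
    r"Marking ringid (\d+) interface (\S+) FAULTY"
)
RECOVERED = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) .*\[TOTEM \] "
    r"Automatically recovered ring (\d+)"
)


def parse_time(stamp, year=2015):
    # Syslog timestamps carry no year, so one is assumed here.
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def ring_flaps(lines, year=2015):
    """Yield (ring, interface, marked_at, seconds_until_recovery)."""
    pending = {}  # ring id -> (interface, time it was marked FAULTY)
    for line in lines:
        m = FAULTY.search(line)
        if m:
            pending[int(m.group(2))] = (m.group(3), parse_time(m.group(1), year))
            continue
        m = RECOVERED.search(line)
        if m and int(m.group(2)) in pending:
            # Only the first recovery message after a FAULTY mark is counted;
            # repeated recovery messages for the same ring are ignored.
            iface, since = pending.pop(int(m.group(2)))
            delta = parse_time(m.group(1), year) - since
            yield int(m.group(2)), iface, since, delta.total_seconds()
```

In the excerpt quoted below, the recovery message appears to follow each FAULTY mark by about one second, which is the kind of pattern such a summary would make visible at a glance.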

Regards,
Ulrich
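
PS: On the throttling idea from my original message (quoted below), here is a minimal sketch of what I mean, written as a Python filter over syslog lines rather than as a change to corosync itself. The name collapse_repeats and the "(repeated N times)" suffix are purely illustrative:

```python
from itertools import groupby


def collapse_repeats(lines):
    """Collapse runs of consecutive identical lines into one line plus a count.

    The syslog timestamp has one-second resolution and is part of each line,
    so consecutive identical lines necessarily come from the same second.
    """
    stripped = (line.rstrip("\n") for line in lines)
    for text, run in groupby(stripped):
        count = sum(1 for _ in run)
        yield text if count == 1 else f"{text} (repeated {count} times)"
```

Applied to the "Retransmit List" flood below, this would reduce each second's burst of identical messages to a single line with a count.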

> 
> Regards
> -steve
> 
> On Fri, Jun 12, 2015 at 12:12 AM, Ulrich Windl <
> Ulrich.Windl@xxxxxxxxxxxxxxxxxxxx> wrote:
> 
>> Hi!
>>
>> We are using an RRP configuration (rrp_mode: passive) in which the two
>> rings run at different speeds: ring 0 runs at 1 Gb/s and ring 1 at
>> 100 Mb/s (both full-duplex, switched, using multicast).
>>
>> We see messages like these while the network has no problems:
>> ---
>> Jun 10 00:51:26 corosync[10179]:  [TOTEM ] Marking ringid 0 interface
>> 172.20.16.1 FAULTY
>> Jun 10 00:51:27 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 10 04:42:10 corosync[10179]:  [TOTEM ] Marking ringid 0 interface
>> 172.20.16.1 FAULTY
>> Jun 10 04:42:11 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 10 04:42:11 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 10 06:36:07 corosync[10179]:  [TOTEM ] Marking ringid 0 interface
>> 172.20.16.1 FAULTY
>> Jun 10 06:36:08 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 10 06:36:08 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 10 08:04:08 corosync[10179]:  [TOTEM ] Marking ringid 0 interface
>> 172.20.16.1 FAULTY
>> Jun 10 08:04:09 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 10 08:46:31 corosync[10179]:  [TOTEM ] Marking ringid 0 interface
>> 172.20.16.1 FAULTY
>> Jun 10 08:46:32 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 10 08:46:32 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 10 09:39:52 corosync[10179]:  [TOTEM ] Marking ringid 0 interface
>> 172.20.16.1 FAULTY
>> Jun 10 09:39:53 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 10 09:39:53 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 10 12:47:46 corosync[10179]:  [TOTEM ] Marking ringid 0 interface
>> 172.20.16.1 FAULTY
>> Jun 10 12:47:47 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 10 15:06:22 corosync[10179]:  [TOTEM ] Marking ringid 0 interface
>> 172.20.16.1 FAULTY
>> Jun 10 15:06:23 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 10 15:15:17 corosync[10179]:  [TOTEM ] Marking ringid 0 interface
>> 172.20.16.1 FAULTY
>> Jun 10 15:15:18 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 10 15:15:18 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 10 15:52:58 corosync[10179]:  [TOTEM ] Marking ringid 0 interface
>> 172.20.16.1 FAULTY
>> Jun 10 15:52:59 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 10 15:52:59 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 10 16:46:06 corosync[10179]:  [TOTEM ] Marking ringid 0 interface
>> 172.20.16.1 FAULTY
>> Jun 10 16:46:07 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 10 17:15:12 corosync[10179]:  [TOTEM ] Marking ringid 0 interface
>> 172.20.16.1 FAULTY
>> Jun 10 17:15:13 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 10 17:50:01 corosync[10179]:  [TOTEM ] Marking ringid 0 interface
>> 172.20.16.1 FAULTY
>> Jun 10 17:50:02 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 10 18:27:36 corosync[10179]:  [TOTEM ] Marking ringid 0 interface
>> 172.20.16.1 FAULTY
>> Jun 10 18:27:37 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 10 18:27:37 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 10 19:04:38 corosync[10179]:  [TOTEM ] Marking ringid 0 interface
>> 172.20.16.1 FAULTY
>> Jun 10 19:04:39 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 10 19:59:44 corosync[10179]:  [TOTEM ] Marking ringid 0 interface
>> 172.20.16.1 FAULTY
>> Jun 10 19:59:45 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 10 20:52:21 corosync[10179]:  [TOTEM ] Marking ringid 0 interface
>> 172.20.16.1 FAULTY
>> Jun 10 20:52:22 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 10 20:52:22 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 10 21:08:07 corosync[10179]:  [TOTEM ] Marking ringid 0 interface
>> 172.20.16.1 FAULTY
>> Jun 10 21:08:08 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 10 21:08:08 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 10 21:17:18 corosync[10179]:  [TOTEM ] Marking ringid 0 interface
>> 172.20.16.1 FAULTY
>> Jun 10 21:17:19 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 10 21:17:19 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 10 21:44:08 corosync[10179]:  [TOTEM ] Marking ringid 0 interface
>> 172.20.16.1 FAULTY
>> Jun 10 21:44:09 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 10 21:44:09 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 10 22:37:12 corosync[10179]:  [TOTEM ] Marking ringid 0 interface
>> 172.20.16.1 FAULTY
>> Jun 10 22:37:13 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 10 22:37:13 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 11 01:23:46 corosync[10179]:  [TOTEM ] Marking ringid 0 interface
>> 172.20.16.1 FAULTY
>> Jun 11 01:23:47 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 11 03:26:26 corosync[10179]:  [TOTEM ] Marking ringid 0 interface
>> 172.20.16.1 FAULTY
>> Jun 11 03:26:27 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 11 04:08:02 corosync[10179]:  [TOTEM ] Marking ringid 0 interface
>> 172.20.16.1 FAULTY
>> Jun 11 04:08:03 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 11 05:00:34 corosync[10179]:  [TOTEM ] Marking ringid 0 interface
>> 172.20.16.1 FAULTY
>> Jun 11 05:00:35 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 11 07:54:49 corosync[10179]:  [TOTEM ] Marking ringid 0 interface
>> 172.20.16.1 FAULTY
>> Jun 11 07:54:50 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 11 07:54:50 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 11 09:13:48 corosync[10179]:  [TOTEM ] Marking ringid 0 interface
>> 172.20.16.1 FAULTY
>> Jun 11 09:13:49 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 11 09:13:49 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 11 12:39:17 corosync[10179]:  [TOTEM ] Marking ringid 0 interface
>> 172.20.16.1 FAULTY
>> Jun 11 12:39:18 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 11 12:39:18 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 11 12:54:01 corosync[10179]:  [TOTEM ] Marking ringid 0 interface
>> 172.20.16.1 FAULTY
>> Jun 11 12:54:02 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 11 12:54:02 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 11 13:31:23 corosync[10179]:  [TOTEM ] Marking ringid 0 interface
>> 172.20.16.1 FAULTY
>> Jun 11 13:31:24 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 11 13:31:24 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 11 15:14:25 corosync[10179]:  [TOTEM ] Marking ringid 0 interface
>> 172.20.16.1 FAULTY
>> Jun 11 15:14:26 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 11 15:14:26 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 11 15:59:29 corosync[10179]:  [TOTEM ] Marking ringid 0 interface
>> 172.20.16.1 FAULTY
>> Jun 11 15:59:30 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 11 16:32:10 corosync[10179]:  [TOTEM ] Marking ringid 0 interface
>> 172.20.16.1 FAULTY
>> Jun 11 16:32:11 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 11 16:32:11 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 11 18:23:56 corosync[10179]:  [TOTEM ] Marking ringid 0 interface
>> 172.20.16.1 FAULTY
>> Jun 11 18:23:57 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 11 18:23:57 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 11 19:16:15 corosync[10179]:  [TOTEM ] Marking ringid 0 interface
>> 172.20.16.1 FAULTY
>> Jun 11 19:16:16 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 11 19:16:16 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 11 19:53:37 corosync[10179]:  [TOTEM ] Marking ringid 0 interface
>> 172.20.16.1 FAULTY
>> Jun 11 19:53:38 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 11 19:53:38 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 11 20:44:33 corosync[10179]:  [TOTEM ] Marking ringid 0 interface
>> 172.20.16.1 FAULTY
>> Jun 11 20:44:34 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 11 20:44:34 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 11 21:57:02 corosync[10179]:  [TOTEM ] Marking ringid 0 interface
>> 172.20.16.1 FAULTY
>> Jun 11 21:57:03 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 11 21:57:03 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 12 00:11:00 corosync[10179]:  [TOTEM ] Marking ringid 0 interface
>> 172.20.16.1 FAULTY
>> Jun 12 00:11:01 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 12 00:11:01 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 12 00:47:30 corosync[10179]:  [TOTEM ] Marking ringid 0 interface
>> 172.20.16.1 FAULTY
>> Jun 12 00:47:31 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 12 01:15:05 corosync[10179]:  [TOTEM ] Marking ringid 0 interface
>> 172.20.16.1 FAULTY
>> Jun 12 01:15:06 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 12 01:15:06 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 12 02:03:24 corosync[10179]:  [TOTEM ] Marking ringid 0 interface
>> 172.20.16.1 FAULTY
>> Jun 12 02:03:25 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 12 02:03:25 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 12 03:21:17 corosync[10179]:  [TOTEM ] Marking ringid 0 interface
>> 172.20.16.1 FAULTY
>> Jun 12 03:21:18 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 12 03:21:18 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 12 03:42:39 corosync[10179]:  [TOTEM ] Marking ringid 0 interface
>> 172.20.16.1 FAULTY
>> Jun 12 03:42:40 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 12 03:42:40 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 12 05:00:02 corosync[10179]:  [TOTEM ] Retransmit List: 2f41c87
>> Jun 12 05:31:11 corosync[10179]:  [TOTEM ] Marking ringid 0 interface
>> 172.20.16.1 FAULTY
>> Jun 12 05:31:12 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 12 05:31:12 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 12 06:35:36 corosync[10179]:  [TOTEM ] Marking ringid 0 interface
>> 172.20.16.1 FAULTY
>> Jun 12 06:35:37 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 12 06:35:37 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 12 07:14:43 corosync[10179]:  [TOTEM ] Marking ringid 0 interface
>> 172.20.16.1 FAULTY
>> Jun 12 07:14:44 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 12 08:03:14 corosync[10179]:  [TOTEM ] Marking ringid 0 interface
>> 172.20.16.1 FAULTY
>> Jun 12 08:03:15 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 12 08:03:15 corosync[10179]:  [TOTEM ] Automatically recovered ring 0
>> Jun 12 08:47:08 corosync[10179]:  [TOTEM ] Marking ringid 1 interface
>> 10.2.2.1 FAULTY
>> Jun 12 08:47:09 corosync[10179]:  [TOTEM ] Automatically recovered ring 1
>> Jun 12 08:47:09 corosync[10179]:  [TOTEM ] Automatically recovered ring 1
>> ---
>>
>> Support keeps telling us that there must be a problem in our
>> network, but that is not true. In fact, I found this message in the
>> corosync ChangeLog:
>> --
>>         Because of this commit, while totemrrp_recv_flush() is called,
>>         Corosync drops memb_join packets, but also ORF tokens. In the end,
>>         it seems that sometimes, we drop so many of them that Corosync
>>         marks the ring as faulty.
>> --
>>
>> It seems there are significant issues in the TOTEM implementation. Isn't
>> it possible to make the transport layer (which TOTEM is) reliable first?
>> The syslog messages are also not very helpful, because they don't tell you
>> WHY the ring is considered faulty. The same is true of the "Retransmit
>> List" messages: at that time there was probably just some load on the
>> network or server. We also had cases where the retransmit list kept
>> growing and could not be processed while the network (both rings) was fine.
>>
>> An example of that:
>> May 19 15:35:18 corosync[7036]:  [TOTEM ] Automatically recovered ring 0
>> May 19 15:35:18 corosync[7036]:  [TOTEM ] Automatically recovered ring 0
>> May 19 15:43:28 corosync[7036]:  [TOTEM ] A processor joined or left the
>> membership and a new membership was formed.
>> May 19 15:49:52 corosync[7036]:  [TOTEM ] Retransmit List: 6c03
>> May 19 15:49:52 corosync[7036]:  [TOTEM ] Retransmit List: 6c03
>> May 19 15:49:52 corosync[7036]:  [TOTEM ] Retransmit List: 6c03
>> May 19 15:49:52 corosync[7036]:  [TOTEM ] Retransmit List: 6c03
>> May 19 15:49:52 corosync[7036]:  [TOTEM ] Retransmit List: 6c03
>> May 19 15:49:52 corosync[7036]:  [TOTEM ] Retransmit List: 6c03
>> May 19 15:49:52 corosync[7036]:  [TOTEM ] Retransmit List: 6c03
>> May 19 15:49:52 corosync[7036]:  [TOTEM ] Retransmit List: 6c03
>> May 19 15:49:52 corosync[7036]:  [TOTEM ] Retransmit List: 6c03
>> May 19 15:49:52 corosync[7036]:  [TOTEM ] Retransmit List: 6c03
>> [Throttling these messages would be another nice idea...cutting identical
>> messages from the same second...]
>> May 19 15:49:52 corosync[7036]:  [TOTEM ] Retransmit List: 6c03
>> May 19 15:49:52 corosync[7036]:  [TOTEM ] Retransmit List: 6c03
>> May 19 15:49:53 corosync[7036]:  [TOTEM ] Retransmit List: 6c03
>> May 19 15:49:53 corosync[7036]:  [TOTEM ] Retransmit List: 6c03
>> May 19 15:49:54 corosync[7036]:  [TOTEM ] Retransmit List: 6c03
>> May 19 15:49:54 corosync[7036]:  [TOTEM ] Retransmit List: 6c03
>> May 19 15:49:54 corosync[7036]:  [TOTEM ] Retransmit List: 6c03
>> May 19 15:49:55 corosync[7036]:  [TOTEM ] Retransmit List: 6c03
>> May 19 15:49:55 corosync[7036]:  [TOTEM ] Retransmit List: 6c03
>> May 19 15:49:55 corosync[7036]:  [TOTEM ] Retransmit List: 6c03
>> May 19 15:49:56 corosync[7036]:  [TOTEM ] Retransmit List: 6c03
>> May 19 15:49:56 corosync[7036]:  [TOTEM ] Retransmit List: 6c03
>> May 19 15:49:57 corosync[7036]:  [TOTEM ] Retransmit List: 6c03
>> May 19 15:49:57 corosync[7036]:  [TOTEM ] Retransmit List: 6c03
>> May 19 15:49:57 corosync[7036]:  [TOTEM ] Retransmit List: 6c03
>> May 19 15:49:58 corosync[7036]:  [TOTEM ] Retransmit List: 6c03
>> May 19 15:49:58 corosync[7036]:  [TOTEM ] Retransmit List: 6c03
>> May 19 15:49:59 corosync[7036]:  [TOTEM ] Retransmit List: 6c03
>> May 19 15:49:59 corosync[7036]:  [TOTEM ] Retransmit List: 6c03
>> May 19 15:49:59 corosync[7036]:  [TOTEM ] Retransmit List: 6c03
>> May 19 15:50:00 corosync[7036]:  [TOTEM ] Retransmit List: 6c03
>> May 19 15:50:00 corosync[7036]:  [TOTEM ] Retransmit List: 6c03
>> May 19 15:50:01 corosync[7036]:  [TOTEM ] Retransmit List: 6c03
>> May 19 15:50:01 corosync[7036]:  [TOTEM ] Retransmit List: 6c03
>> May 19 15:50:01 corosync[7036]:  [TOTEM ] Retransmit List: 6c03
>> May 19 15:50:02 corosync[7036]:  [TOTEM ] Retransmit List: 6c03
>> May 19 15:50:02 corosync[7036]:  [TOTEM ] Retransmit List: 6c03
>> May 19 15:50:03 corosync[7036]:  [TOTEM ] Retransmit List: 6c03
>> May 19 15:50:03 corosync[7036]:  [TOTEM ] Retransmit List: 6c03
>> May 19 15:50:03 corosync[7036]:  [TOTEM ] Retransmit List: 6c03
>> May 19 15:50:04 corosync[7036]:  [TOTEM ] Retransmit List: 6c03
>> May 19 15:50:04 corosync[7036]:  [TOTEM ] Retransmit List: 6c03
>> May 19 15:50:04 corosync[7036]:  [TOTEM ] Retransmit List: 6c03
>> May 19 15:50:05 corosync[7036]:  [TOTEM ] Retransmit List: 6c03
>> May 19 15:50:05 corosync[7036]:  [TOTEM ] Retransmit List: 6c03
>> May 19 15:50:06 corosync[7036]:  [TOTEM ] Retransmit List: 6c03
>> May 19 15:50:06 corosync[7036]:  [TOTEM ] Retransmit List: 6c03
>> May 19 15:50:06 corosync[7036]:  [TOTEM ] Retransmit List: 6c03
>> May 19 15:50:06 corosync[7036]:  [TOTEM ] Retransmit List: 6c03
>> [...and so on...]
>> May 19 15:50:07 corosync[7036]:  [TOTEM ] Retransmit List: 6c03
>> May 19 15:50:07 corosync[7036]:  [TOTEM ] Retransmit List: 6c03
>> May 19 15:50:07 corosync[7036]:  [TOTEM ] Retransmit List: 6c03 6c05
>> May 19 15:50:07 corosync[7036]:  [TOTEM ] Retransmit List: 6c03 6c05
>> May 19 15:50:07 corosync[7036]:  [TOTEM ] Retransmit List: 6c03 6c05
>> May 19 15:50:07 corosync[7036]:  [TOTEM ] Retransmit List: 6c03 6c05
>> May 19 15:50:07 corosync[7036]:  [TOTEM ] Retransmit List: 6c03 6c05
>> [...]
>> May 19 15:50:22 corosync[7036]:  [TOTEM ] Retransmit List: 6c03 6c05
>> May 19 15:50:22 corosync[7036]:  [TOTEM ] Retransmit List: 6c03 6c05
>> May 19 15:50:22 corosync[7036]:  [TOTEM ] Retransmit List: 6c03 6c05
>> May 19 15:50:22 corosync[7036]:  [TOTEM ] Retransmit List: 6c03 6c05 6c07
>> May 19 15:50:22 corosync[7036]:  [TOTEM ] Retransmit List: 6c03 6c05 6c07
>> May 19 15:50:22 corosync[7036]:  [TOTEM ] Retransmit List: 6c03 6c05 6c07
>> May 19 15:50:22 corosync[7036]:  [TOTEM ] Retransmit List: 6c03 6c05 6c07
>> [...]
>> May 19 15:50:37 corosync[7036]:  [TOTEM ] Retransmit List: 6c03 6c05 6c07
>> May 19 15:50:37 corosync[7036]:  [TOTEM ] Retransmit List: 6c03 6c05 6c07
>> May 19 15:50:37 corosync[7036]:  [TOTEM ] Retransmit List: 6c03 6c05 6c07
>> 6c09
>> May 19 15:50:37 corosync[7036]:  [TOTEM ] Retransmit List: 6c03 6c05 6c07
>> 6c09
>> [...]
>> May 19 15:50:52 corosync[7036]:  [TOTEM ] Retransmit List: 6c03 6c05 6c07
>> 6c09
>> May 19 15:50:52 corosync[7036]:  [TOTEM ] Retransmit List: 6c03 6c05 6c07
>> 6c09
>> May 19 15:50:52 corosync[7036]:  [TOTEM ] Retransmit List: 6c03 6c05 6c07
>> 6c09
>> May 19 15:50:52 corosync[7036]:  [TOTEM ] Retransmit List: 6c03 6c05 6c07
>> 6c09 6c0b
>> May 19 15:50:52 corosync[7036]:  [TOTEM ] Retransmit List: 6c03 6c05 6c07
>> 6c09 6c0b
>> May 19 15:50:52 corosync[7036]:  [TOTEM ] Retransmit List: 6c03 6c05 6c07
>> 6c09 6c0b
>> [...]
>> May 19 15:51:07 corosync[7036]:  [TOTEM ] Retransmit List: 6c03 6c05 6c07
>> 6c09 6c0b
>> May 19 15:51:07 corosync[7036]:  [TOTEM ] Retransmit List: 6c03 6c05 6c07
>> 6c09 6c0b
>> May 19 15:51:07 corosync[7036]:  [TOTEM ] Retransmit List: 6c03 6c05 6c07
>> 6c09 6c0b
>> May 19 15:51:07 corosync[7036]:  [TOTEM ] Retransmit List: 6c03 6c05 6c07
>> 6c09 6c0b 6c0d
>> May 19 15:51:07 corosync[7036]:  [TOTEM ] Retransmit List: 6c03 6c05 6c07
>> 6c09 6c0b 6c0d
>> May 19 15:51:07 corosync[7036]:  [TOTEM ] Retransmit List: 6c03 6c05 6c07
>> 6c09 6c0b 6c0d
>> [...]
>> May 19 15:51:22 corosync[7036]:  [TOTEM ] Retransmit List: 6c03 6c05 6c07
>> 6c09 6c0b 6c0d
>> May 19 15:51:22 corosync[7036]:  [TOTEM ] Retransmit List: 6c03 6c05 6c07
>> 6c09 6c0b 6c0d
>> May 19 15:51:22 corosync[7036]:  [TOTEM ] Retransmit List: 6c03 6c05 6c07
>> 6c09 6c0b 6c0d 6c0f
>> May 19 15:51:22 corosync[7036]:  [TOTEM ] Retransmit List: 6c03 6c05 6c07
>> 6c09 6c0b 6c0d 6c0f
>> May 19 15:51:22 corosync[7036]:  [TOTEM ] Retransmit List: 6c03 6c05 6c07
>> 6c09 6c0b 6c0d 6c0f
>> [...big cut, it's so boring...]
>> May 19 15:53:51 corosync[7036]:  [TOTEM ] Retransmit List: 6c03 6c05 6c07
>> 6c09 6c0b 6c0d 6c0f 6c11 6c13 6c15 6c17 6c19 6c1b 6c1d 6c1f 6c21
>> May 19 15:53:51 corosync[7036]:  [TOTEM ] Retransmit List: 6c03 6c05 6c07
>> 6c09 6c0b 6c0d 6c0f 6c11 6c13 6c15 6c17 6c19 6c1b 6c1d 6c1f 6c21
>> May 19 15:53:51 corosync[7036]:  [TOTEM ] Retransmit List: 6c03 6c05 6c07
>> 6c09 6c0b 6c0d 6c0f 6c11 6c13 6c15 6c17 6c19 6c1b 6c1d 6c1f 6c21
>> May 19 15:53:52 corosync[7036]:  [TOTEM ] Retransmit List: 6c03 6c05 6c07
>> 6c09 6c0b 6c0d 6c0f 6c11 6c13 6c15 6c17 6c19 6c1b 6c1d 6c1f 6c21
>> May 19 15:53:52 corosync[7036]:  [TOTEM ] Retransmit List: 6c22 6c03 6c05
>> 6c07 6c09 6c0b 6c0d 6c0f 6c11 6c13 6c15 6c17 6c19 6c1b 6c1d 6c1f 6c21
>> May 19 15:53:52 corosync[7036]:  [TOTEM ] Retransmit List: 6c03 6c05 6c07
>> 6c09 6c0b 6c0d 6c0f 6c11 6c13 6c15 6c17 6c19 6c1b 6c1d 6c1f 6c21
>> May 19 15:53:52 corosync[7036]:  [TOTEM ] Retransmit List: 6c03 6c05 6c07
>> 6c09 6c0b 6c0d 6c0f 6c11 6c13 6c15 6c17 6c19 6c1b 6c1d 6c1f 6c21
>> May 19 15:53:52 corosync[7036]:  [TOTEM ] Retransmit List: 6c03 6c05 6c07
>> 6c09 6c0b 6c0d 6c0f 6c11 6c13 6c15 6c17 6c19 6c1b 6c1d 6c1f 6c21
>> May 19 15:53:52 corosync[7036]:  [TOTEM ] Retransmit List: 6c03 6c05 6c07
>> 6c09 6c0b 6c0d 6c0f 6c11 6c13 6c15 6c17 6c19 6c1b 6c1d 6c1f 6c21 6c23
>> May 19 15:53:52 corosync[7036]:  [TOTEM ] Retransmit List: 6c22 6c03 6c05
>> 6c07 6c09 6c0b 6c0d 6c0f 6c11 6c13 6c15 6c17 6c19 6c1b 6c1d 6c1f 6c21 6c23
>> May 19 15:53:52 corosync[7036]:  [TOTEM ] Retransmit List: 6c03 6c05 6c07
>> 6c09 6c0b 6c0d 6c0f 6c11 6c13 6c15 6c17 6c19 6c1b 6c1d 6c1f 6c21 6c23
>> May 19 15:53:52 corosync[7036]:  [TOTEM ] Retransmit List: 6c23 6c03 6c05
>> 6c07 6c09 6c0b 6c0d 6c0f 6c11 6c13 6c15 6c17 6c19 6c1b 6c1d 6c1f 6c21
>> May 19 15:53:52 corosync[7036]:  [TOTEM ] Retransmit List: 6c22 6c03 6c05
>> 6c07 6c09 6c0b 6c0d 6c0f 6c11 6c13 6c15 6c17 6c19 6c1b 6c1d 6c1f 6c21 6c23
>> May 19 15:53:52 corosync[7036]:  [TOTEM ] Retransmit List: 6c03 6c05 6c07
>> 6c09 6c0b 6c0d 6c0f 6c11 6c13 6c15 6c17 6c19 6c1b 6c1d 6c1f 6c21 6c23
>> May 19 15:53:52 corosync[7036]:  [TOTEM ] Retransmit List: 6c23 6c03 6c05
>> 6c07 6c09 6c0b 6c0d 6c0f 6c11 6c13 6c15 6c17 6c19 6c1b 6c1d 6c1f 6c21
>> May 19 15:53:52 corosync[7036]:  [TOTEM ] Retransmit List: 6c22 6c03 6c05
>> 6c07 6c09 6c0b 6c0d 6c0f 6c11 6c13 6c15 6c17 6c19 6c1b 6c1d 6c1f 6c21 6c23
>> May 19 15:53:52 corosync[7036]:  [TOTEM ] Retransmit List: 6c03 6c05 6c07
>> 6c09 6c0b 6c0d 6c0f 6c11 6c13 6c15 6c17 6c19 6c1b 6c1d 6c1f 6c21 6c23
>> May 19 15:53:52 corosync[7036]:  [TOTEM ] Retransmit List: 6c23 6c03 6c05
>> 6c07 6c09 6c0b 6c0d 6c0f 6c11 6c13 6c15 6c17 6c19 6c1b 6c1d 6c1f 6c21
>> May 19 15:53:52 corosync[7036]:  [TOTEM ] Retransmit List: 6c22 6c03 6c05
>> 6c07 6c09 6c0b 6c0d 6c0f 6c11 6c13 6c15 6c17 6c19 6c1b 6c1d 6c1f 6c21 6c23
>> May 19 15:53:52 corosync[7036]:  [TOTEM ] Retransmit List: 6c03 6c05 6c07
>> 6c09 6c0b 6c0d 6c0f 6c11 6c13 6c15 6c17 6c19 6c1b 6c1d 6c1f 6c21 6c23
>> [...]
>> May 19 15:55:07 corosync[7036]:  [TOTEM ] Retransmit List: 6c23 6c25 6c18
>> 6c1a 6c24 6c26 6c29 6c03 6c05 6c07 6c09 6c0b 6c0d 6c0f 6c11 6c13 6c15 6c17
>> 6c19 6c1b 6c1d 6c1f 6c21 6c27 6c28
>> May 19 15:55:07 corosync[7036]:  [TOTEM ] Retransmit List: 6c1f 6c21 6c27
>> 6c28 6c1c 6c1e 6c20 6c22 6c2a 6c03 6c05 6c07 6c09 6c0b 6c0d 6c0f 6c11 6c13
>> 6c15 6c17 6c19 6c1b 6c1d 6c23 6c25
>> May 19 15:55:07 corosync[7036]:  [TOTEM ] Retransmit List: 6c23 6c25 6c18
>> 6c1a 6c24 6c26 6c29 6c03 6c05 6c07 6c09 6c0b 6c0d 6c0f 6c11 6c13 6c15 6c17
>> 6c19 6c1b 6c1d 6c1f 6c21 6c27 6c28
>> May 19 15:55:07 corosync[7036]:  [TOTEM ] Retransmit List: 6c1f 6c21 6c27
>> 6c28 6c1c 6c1e 6c20 6c22 6c2a 6c03 6c05 6c07 6c09 6c0b 6c0d 6c0f 6c11 6c13
>> 6c15 6c17 6c19 6c1b 6c1d 6c23 6c25
>> May 19 15:55:07 corosync[7036]:  [TOTEM ] Retransmit List: 6c23 6c25 6c18
>> 6c1a 6c24 6c26 6c29 6c03 6c05 6c07 6c09 6c0b 6c0d 6c0f 6c11 6c13 6c15 6c17
>> 6c19 6c1b 6c1d 6c1f 6c21 6c27 6c28
>> May 19 15:55:07 corosync[7036]:  [TOTEM ] Retransmit List: 6c1f 6c21 6c27
>> 6c28 6c1c 6c1e 6c20 6c22 6c2a 6c03 6c05 6c07 6c09 6c0b 6c0d 6c0f 6c11 6c13
>> 6c15 6c17 6c19 6c1b 6c1d 6c23 6c25
>> May 19 15:55:07 corosync[7036]:  [TOTEM ] Retransmit List: 6c23 6c25 6c18
>> 6c1a 6c24 6c26 6c29 6c03 6c05 6c07 6c09 6c0b 6c0d 6c0f 6c11 6c13 6c15 6c17
>> 6c19 6c1b 6c1d 6c1f 6c21 6c27 6c28
>> May 19 15:55:07 corosync[7036]:  [TOTEM ] FAILED TO RECEIVE
>> May 19 15:55:13 corosync[7036]:  [TOTEM ] A processor joined or left the
>> membership and a new membership was formed.
>> May 19 15:55:31 corosync[7036]:  [TOTEM ] A processor joined or left the
>> membership and a new membership was formed.
>> [here the node was fenced]
>> May 19 15:58:18 corosync[7036]:  [TOTEM ] A processor joined or left the
>> membership and a new membership was formed.
>> May 19 16:17:58 corosync[6991]:  [TOTEM ] Initializing transport (UDP/IP
>> Multicast).
>> May 19 16:17:58 corosync[6991]:  [TOTEM ] Initializing transmit/receive
>> security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
>> May 19 16:17:58 corosync[6991]:  [TOTEM ] Initializing transport (UDP/IP
>> Multicast).
>> May 19 16:17:58 corosync[6991]:  [TOTEM ] Initializing transmit/receive
>> security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
>> May 19 16:17:58 corosync[6991]:  [TOTEM ] The network interface
>> [172.20.16.1] is now up.
>> May 19 16:17:58 corosync[6991]:  [TOTEM ] The network interface [10.2.2.1]
>> is now up.
>> May 19 16:17:58 corosync[6991]:  [TOTEM ] A processor joined or left the
>> membership and a new membership was formed.
>> May 19 16:18:03 corosync[6991]:  [TOTEM ] A processor joined or left the
>> membership and a new membership was formed.
>> May 19 16:18:17 corosync[6991]:  [TOTEM ] Marking ringid 0 interface
>> 172.20.16.1 FAULTY
>> May 19 16:18:18 corosync[6991]:  [TOTEM ] Automatically recovered ring 0
>> May 19 16:18:18 corosync[6991]:  [TOTEM ] Automatically recovered ring 0
>> May 19 16:21:00 corosync[6991]:  [TOTEM ] Marking ringid 1 interface
>> 10.2.2.1 FAULTY
>> May 19 16:21:01 corosync[6991]:  [TOTEM ] Automatically recovered ring 1
>> May 19 16:21:01 corosync[6991]:  [TOTEM ] Automatically recovered ring 1
>> May 19 16:23:24 corosync[6991]:  [TOTEM ] Marking ringid 1 interface
>> 10.2.2.1 FAULTY
>> May 19 16:23:25 corosync[6991]:  [TOTEM ] Automatically recovered ring 1
>> May 19 16:28:04 corosync[6991]:  [TOTEM ] Marking ringid 1 interface
>> 10.2.2.1 FAULTY
>> May 19 16:28:05 corosync[6991]:  [TOTEM ] Automatically recovered ring 1
>> May 19 18:35:04 corosync[6991]:  [TOTEM ] Marking ringid 0 interface
>> 172.20.16.1 FAULTY
>> May 19 18:35:05 corosync[6991]:  [TOTEM ] Automatically recovered ring 0
>> [...the game continues...]
>>
>> Regards,
>> Ulrich
>>
>>
>>
>>
>> _______________________________________________
>> discuss mailing list
>> discuss@xxxxxxxxxxxx 
>> http://lists.corosync.org/mailman/listinfo/discuss 
>>






