On 23/02/15 23:48, Andrew Beekhof wrote:
>
>> On 24 Feb 2015, at 12:08 am, Christine Caulfield <ccaulfie@xxxxxxxxxx> wrote:
>>
>> On 16/02/15 14:10, Christine Caulfield wrote:
>>> On 13/02/15 12:27, Jan Friesse wrote:
>>>> Chrissie,
>>>>
>>>> Christine Caulfield wrote:
>>>>> It occurs to me that, as this has the potential to break Virtual
>>>>> Synchrony, there should be an option to either disable message
>>>>
>>>> it took me a while until I've found what exactly you mean by break EVS.
>>>> This is something we (or at least I) totally forgot but it looks like
>>>> HUGE problem and I don't even think that pcmk is able to handle this
>>>> situation (I believe you are talking about situation when one node is
>>>> sending long message, and other node will leave and then join again into
>>>> membership during long message is sent, so it will not receive that
>>>> message).
>>>>
>>>>> fragmentation or to some indication of the maximum message size that
>>>>> will not be fragmented.
>>>>>
>>>>> Thoughts?
>>>>
>>>> I'm thinking about following solutions:
>>>> - implement deferral of delivery of membership change to client
>>>> - some kind of recovery... Both of them is like reimplementing totem
>>>> inside libcpg.
>>>> - Another solution may be to add extra callback parameter and deliver
>>>> also list of nodes who received message.
>>>>
>>>> Generally, I'm really not very happy with breaking EVS. Yes, loooong
>>>> messages use case is weird and not so common outside pcmk and yes,
>>>> satellite nodes will break EVS anyway, but for needle we should stay
>>>> very conservative.
>>>>
>>>
>>> Agreed, I didn't realise how bad it could be until late into the
>>> development here. The 'sledgehammer' solution would be to flag when a
>>> confchg has happened during the sending of a long message and if that
>>> happens, invalidate the whole send. It would then mean retransmitting
>>> the whole message again from the start.
>>>
>>
>>
>> ... and here it is!
>>
>> I stopped short of checking the ring state when the message is finally
>> delivered, it just checks it at each transmission stage. Which is what
>> happens with normal sends, of course.
>
> Rather a small patch in the end, always a good sign :)
>
> One thing that wasn't completely clear... do the applications need to care about resending or is the client library taking care of that?
> It seems to be the latter right?
>

If the application gets CS_ERR_INTERRUPT* then it will need to resend the
message, it's a bit like EAGAIN in that sense.

Chrissie

*If anyone has a better choice for a return code or thinks I should
invent a new one then please say so
_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss
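
A rough sketch of what the caller-side resend could look like, treating the
interrupt code the way one would treat EAGAIN. This is only an illustration
under assumptions: every sketch_* name below is hypothetical, the enum is a
stand-in so the snippet compiles without the corosync headers, and the retry
limit is arbitrary. In a real program the send would be the CPG multicast
call and the status would be CS_ERR_INTERRUPT, assuming that return code is
the one that stays.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins so this compiles without corosync headers.
 * Real code would use cs_error_t and CS_ERR_INTERRUPT instead. */
typedef enum {
    SKETCH_OK = 1,
    SKETCH_ERR_INTERRUPT,   /* send invalidated by a membership change */
    SKETCH_ERR_OTHER
} sketch_status_t;

/* Hypothetical send callback; stands in for the real multicast call. */
typedef sketch_status_t (*sketch_send_fn)(void *ctx, const void *msg, size_t len);

/* Resend the whole message from the start each time the send is reported
 * as interrupted, up to max_attempts tries. Any other status, success or
 * failure, is returned to the caller unchanged. */
static sketch_status_t send_with_resend(sketch_send_fn send, void *ctx,
                                        const void *msg, size_t len,
                                        unsigned max_attempts,
                                        unsigned *attempts_used)
{
    sketch_status_t rc = SKETCH_ERR_OTHER;
    unsigned attempts = 0;

    assert(send != NULL);

    while (attempts < max_attempts) {
        attempts++;
        rc = send(ctx, msg, len);
        if (rc != SKETCH_ERR_INTERRUPT) {
            break;              /* done: success or a non-retryable error */
        }
    }
    if (attempts_used != NULL) {
        *attempts_used = attempts;
    }
    return rc;
}

/* Fake sender for trying the loop out: reports an interrupt for the first
 * *ctx calls, then succeeds. */
static sketch_status_t flaky_send(void *ctx, const void *msg, size_t len)
{
    unsigned *interrupts_left = ctx;

    (void)msg;
    (void)len;
    if (*interrupts_left > 0) {
        (*interrupts_left)--;
        return SKETCH_ERR_INTERRUPT;
    }
    return SKETCH_OK;
}
```

The cap on attempts is a design choice of the sketch, not something the
patch requires; an application could equally wait for its own confchg
callback to settle before sending again.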