On Thu, May 8, 2014 at 12:12 AM, Guang <yguang11@xxxxxxxxx> wrote:
> Thanks Sage and Greg!
>
> I just captured a crash for which I have the logs from both sides (sadly
> we still run with debug_ms=0, as raising the debug_ms level consumes
> dramatically more disk space). From the logs, I think I have some more
> clues about how this failure happened.
>
> Let us say we have two messengers, C and S. The C side did a rebind and
> cleared everything.
>
> The logs at the C side are:
> ————
>     -6> 2014-05-07 12:29:35.362448 7fcc26e8c700  2 --
> 10.193.207.191:6918/24089488 >> 10.193.207.182:6982/57027253
> pipe(0x19cfec80 sd=139 :48953 s=1 pgs=0 cs=0 l=0 c=0x13d62c60).
> got newly_acked_seq 59 vs out_seq 0
>     -5> 2014-05-07 12:29:35.362487 7fcc26e8c700  2 --
> 10.193.207.191:6918/24089488 >> 10.193.207.182:6982/57027253
> pipe(0x19cfec80 sd=139 :48953 s=1 pgs=0 cs=0 l=0 c=0x13d62c60).
> discarding previously sent 0 osd_map(36192..36196 src has 35392..36196) v3
>     -4> 2014-05-07 12:29:35.362504 7fcc26e8c700  2 --
> 10.193.207.191:6918/24089488 >> 10.193.207.182:6982/57027253
> pipe(0x19cfec80 sd=139 :48953 s=1 pgs=0 cs=0 l=0 c=0x13d62c60).
> discarding previously sent 0 pg_notify(4.4d21(21) epoch 36196) v4
>     -3> 2014-05-07 12:29:35.362530 7fcc26e8c700  2 --
> 10.193.207.191:6918/24089488 >> 10.193.207.182:6982/57027253
> pipe(0x19cfec80 sd=139 :48953 s=1 pgs=0 cs=0 l=0 c=0x13d62c60).
> discarding previously sent 0 pg_notify(3.4d22(21) epoch 36196) v4
>     -2> 2014-05-07 12:29:35.362561 7fcc26e8c700  2 --
> 10.193.207.191:6918/24089488 >> 10.193.207.182:6982/57027253
> pipe(0x19cfec80 sd=139 :48953 s=1 pgs=0 cs=0 l=0 c=0x13d62c60).
> discarding previously sent 0 pg_query(4.3279 epoch 36196) v2
>     -1> 2014-05-07 12:29:35.362571 7fcc26e8c700  2 --
> 10.193.207.191:6918/24089488 >> 10.193.207.182:6982/57027253
> pipe(0x19cfec80 sd=139 :48953 s=1 pgs=0 cs=0 l=0 c=0x13d62c60).
> discarding previously sent 0 pg_query(3.327a epoch 36196) v2
>      0> 2014-05-07 12:29:35.363457 7fcc26e8c700 -1 msg/Pipe.cc: In
> function 'int Pipe::connect()' thread 7fcc26e8c700 time
> 2014-05-07 12:29:35.362587
> msg/Pipe.cc: 1045: FAILED assert(m)
> ————
> The logs at the S side are:
> ———
> 2014-05-07 12:29:09.820735 7eff50794700  0 --
> 10.193.207.182:6982/57027253 >> 10.193.207.191:6918/24089488
> pipe(0x170bf680 sd=63 :57524 s=2 pgs=24805 cs=1 l=0 c=0x1691b340).fault
> with nothing to send, going to standby
> 2014-05-07 12:29:35.361030 7eff40042700 10 osd.363 36196 new session
> 0x17d33500 con=0x17fdf080 addr=10.193.207.191:6918/24089488
> 2014-05-07 12:29:35.361068 7eff40042700  0 --
> 10.193.207.182:6982/57027253 >> 10.193.207.191:6918/24089488
> pipe(0xfe07400 sd=343 :6982 s=0 pgs=0 cs=0 l=0 c=0x17fdf080).accept
> connect_seq 0 vs existing 1 state standby
> 2014-05-07 12:29:35.361078 7eff40042700  0 --
> 10.193.207.182:6982/57027253 >> 10.193.207.191:6918/24089488
> pipe(0xfe07400 sd=343 :6982 s=0 pgs=0 cs=0 l=0 c=0x17fdf080).accept
> peer reset, then tried to connect to us, replacing
> ———
> And it looks like the following round trips happened between the two
> sides:
>
> 1. C tried to establish a new connection with S (when it got a message
> to send), with cs=0 (and a new, larger pgs).
> 2. At the S side, there was an existing connection with C (cs=1,
> pgs=24805, in_seq > 59).
> 3. This is what happened at the S side:
> 3.1 It looked up the pipe and found an existing one:
> https://github.com/ceph/ceph/blob/master/src/msg/Pipe.cc#L445
> 3.2 The execution flow came to replacing the existing pipe:
> https://github.com/ceph/ceph/blob/master/src/msg/Pipe.cc#L473
> 3.3 As this is a lossy connection, the original pipe would be cleared.
> 3.4 It then came to the open section
> (https://github.com/ceph/ceph/blob/master/src/msg/Pipe.cc#L649) and
> echoed back a reply with tag CEPH_MSGR_TAG_SEQ, along with the seq from
> the original pipe.
> 4. At the C side:
> 4.1 As this is a brand new pipe, its out_seq is zero; however, the reply
> above comes along with the original pipe's newly_acked_seq, which brings
> about the assertion failure we observed.
>
> If this makes sense, then at step 3.4 we probably should not echo the
> original seq if this is a brand new connection attempt (cs = 0).
>
> Does this look reasonable?

Ah! Yes, you're exactly right. This error makes sense, too: the
CEPH_MSGR_TAG_SEQ handling is much newer than the rest of the reconnect
logic.
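To restate the failing path in code, the CEPH_MSGR_TAG_SEQ branch of
Pipe::connect() boils down to the loop below (a simplified paraphrase,
not the verbatim source). On the brand-new pipe out_seq is 0, the server
echoes the old pipe's newly_acked_seq of 59, and the loop drains the
handful of freshly queued messages long before catching up, so assert(m)
fires at msg/Pipe.cc:1045:

    // Simplified paraphrase of the TAG_SEQ handling in Pipe::connect().
    uint64_t newly_acked_seq = 0;
    tcp_read((char*)&newly_acked_seq, sizeof(newly_acked_seq)); // reads 59
    // logs "got newly_acked_seq 59 vs out_seq 0"
    while (newly_acked_seq > out_seq) {
      Message *m = _get_next_outgoing(); // NULL once the short queue is empty
      assert(m);                         // the FAILED assert(m) we see
      // logs "discarding previously sent 0 ..."
      m->put();
      ++out_seq;
    }

That loop assumes the two sides agree on what has already been sent,
which a post-rebind pipe violates.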
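And the guard you're proposing for step 3.4 would be roughly the
following on the accept() side (an untested sketch, not a real patch; the
exact variable names and the right spot in Pipe::accept()'s replace/open
path still need checking against the source):

    // Untested sketch: don't ack the old pipe's in_seq to a peer that is
    // opening a brand-new session (connect_seq == 0); its out_seq is 0
    // and it never sent the messages we'd be telling it to discard.
    uint64_t existing_seq = existing->in_seq;
    if (connect.connect_seq == 0)
      existing_seq = 0;  // or skip the CEPH_MSGR_TAG_SEQ reply entirely
    // ... later, when sending the reply:
    if (reply_tag == CEPH_MSGR_TAG_SEQ)
      tcp_write((char*)&existing_seq, sizeof(existing_seq));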
On Thu, May 8, 2014 at 4:58 AM, Guang <yguang11@xxxxxxxxx> wrote:
> I think the next question is: why did C fail to find a pipe (locally)
> for S while there is actually one pipe existing at the S side (with C)?
> Is it possible that C unregistered the pipe but S has not become aware
> of that yet?

Exactly. In the other logs I've seen, C has done a mark_down_all() (which
kills every single pipe) while doing a rebind(); generally this is a
consequence of getting an OSDMap which says it's down. If S has not yet
seen this map, it won't have nuked the Connection on its side.

Would you like to prepare a patch, or should I? :)
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com