About in_seq, out_seq in Messenger

Hi all,

Recently we enabled an async messenger test job in the test
lab (http://pulpito.ceph.com/sage-2015-02-03_01:15:10-rados-master-distro-basic-multi/#).
We hit many failed asserts, and most of them are:
              assert(0 == "old msgs despite reconnect_seq feature");

All of the asserting connections belong to the cluster messenger, which
means they are internal OSD-to-OSD connections. The policy associated
with these connections is Messenger::Policy::lossless_peer.

When I dug into this problem, I found something confusing. Consider
these steps:
1. Both sides of the connection use the "lossless_peer" policy.
2. Mark down one side (either one); the peer connection will try to
reconnect.
3. Restart the failed side. A new connection is built, but the
initiator thinks it is an old connection, so it sends its in_seq (10).
4. The newly started connection has no messages in its queue. It
receives the peer connection's in_seq (10) and calls
discard_requeued_up_to(10), but because the queue is empty, nothing is
modified (in particular, its out_seq stays at 0).
5. Now, as soon as either side sends a message, it triggers
"assert(0 == "old msgs despite reconnect_seq feature");".
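The five steps above can be modelled with a small toy program. This is a hypothetical sketch, not Ceph's Pipe or AsyncConnection code: `Endpoint`, `receive()` and `restart_scenario()` are invented names, and the model assumes that discard_requeued_up_to() advances out_seq only once per requeued message it drops, so an empty queue leaves out_seq at 0.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <string>

// Hypothetical toy model of one side of a lossless_peer connection's
// sequence bookkeeping (not Ceph's real classes).
struct Endpoint {
  uint64_t in_seq = 0;            // seq of the last message accepted from the peer
  uint64_t out_seq = 0;           // seq of the last message this side has numbered
  std::deque<uint64_t> requeued;  // seqs of sent-but-unacked messages, oldest first

  // Drop requeued messages the peer says it already has (seq <= acked),
  // advancing out_seq once per dropped message. With an empty queue this
  // is a no-op, which is the situation in step 4.
  void discard_requeued_up_to(uint64_t acked) {
    while (!requeued.empty() && requeued.front() <= acked) {
      requeued.pop_front();
      ++out_seq;
    }
  }

  uint64_t next_out_seq() { return ++out_seq; }
};

// Receiver-side check using the two assert messages quoted in the mail.
// Returns an empty string when the message is accepted.
std::string receive(Endpoint& rx, uint64_t seq) {
  if (seq <= rx.in_seq) return "old msgs despite reconnect_seq feature";
  if (seq > rx.in_seq + 1) return "skipped incoming seq";
  rx.in_seq = seq;
  return "";
}

// Steps 1-5: A survives and has already accepted seq 1..10 from the old B;
// B restarts with fresh state and nothing queued.
std::string restart_scenario() {
  Endpoint a;
  a.in_seq = 10;
  Endpoint b;                          // in_seq == out_seq == 0
  b.discard_requeued_up_to(a.in_seq);  // step 4: queue empty, no change
  uint64_t seq = b.next_out_seq();     // step 5: B numbers its message 1
  return receive(a, seq);              // 1 <= 10, rejected as old
}
```

In this model `restart_scenario()` returns "old msgs despite reconnect_seq feature": the restarted side numbers its first message 1, which is not greater than the survivor's in_seq of 10.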

I can reproduce these steps in a unit test, and the assert is actually
hit in the test lab by the async messenger, which follows the simple
messenger's design.

Besides, if we enable reset_check here, "was_session_reset" will be
called and it will randomize out_seq, so it will certainly hit
"assert(0 == "skipped incoming seq")".
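A similar toy sketch for the reset_check path, assuming (as the mail states) that was_session_reset() clears in_seq and randomizes out_seq. The `SeqState` type, the helper names, the 31-bit range and the nonzero lower bound are all illustrative assumptions, not Ceph code.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <string>

// Hypothetical per-connection sequence state (not a real Ceph type).
struct SeqState {
  uint64_t in_seq = 0;
  uint64_t out_seq = 0;
};

// Assumed behaviour of was_session_reset(): clear in_seq and replace
// out_seq with a random value, drawn here from [1, 2^31 - 1].
void was_session_reset(SeqState& s, std::mt19937_64& rng) {
  std::uniform_int_distribution<uint64_t> dist(1, (1ULL << 31) - 1);
  s.in_seq = 0;
  s.out_seq = dist(rng);
}

// Receiver check: the next accepted message must carry exactly in_seq + 1.
std::string check_incoming(const SeqState& rx, uint64_t seq) {
  if (seq <= rx.in_seq) return "old msgs despite reconnect_seq feature";
  if (seq > rx.in_seq + 1) return "skipped incoming seq";
  return "ok";
}

// The surviving side takes the reset path; the restarted peer expects seq 1.
std::string reset_scenario(uint64_t seed) {
  std::mt19937_64 rng(seed);
  SeqState sender;
  SeqState receiver;                   // fresh peer, in_seq == 0
  was_session_reset(sender, rng);
  uint64_t seq = ++sender.out_seq;     // first message after reset, >= 2
  return check_incoming(receiver, seq);
}
```

Because the random out_seq is at least 1 in this sketch, the first message carries a seq of at least 2 while the restarted peer expects 1, so `reset_scenario()` reports "skipped incoming seq" for any seed; only a random value of 0 would avoid it.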

Is anything wrong in the above?

-- 
Best Regards,

Wheat