Thanks, I got it. But I'm still puzzled about where `session_security`
could be reset in the code...
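
For anyone following along, here is the shape of the assert we are
hitting. The snippet below is a paraphrase of the `CryptoKey::encrypt`
path from src/auth/Crypto.h in our ~12.0.2 tree, with stand-in types so
it is self-contained; treat it as an illustration of the failure mode,
not the exact source. The point is that the key handler `ckh` is only
populated once a secret is set, so a default-constructed (or cleared)
session key aborts on its first encrypt():

#include <cassert>
#include <string>

struct CephContext;                       // opaque stand-in
struct bufferlist {};                     // stand-in for ceph::bufferlist

struct CryptoKeyHandler {
  int encrypt(const bufferlist &in, bufferlist &out,
              std::string *error) const { return 0; }
};

class CryptoKey {
  CryptoKeyHandler *ckh = nullptr;        // only set once a secret is loaded
public:
  int encrypt(CephContext *cct, const bufferlist &in, bufferlist &out,
              std::string *error) const {
    assert(ckh);                          // <-- the "FAILED assert(ckh)" below
    return ckh->encrypt(in, out, error);
  }
};

So whatever re-creates session_security after `verify_authorizer` must
be handing the signature code a key whose secret was never set.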
2017-09-06 10:57 GMT+08:00 Sage Weil <sweil@xxxxxxxxxx>:
> On Wed, 6 Sep 2017, zengran zhang wrote:
>> Thanks Greg, we were testing on the old version and already have a
>> plan to update... About the assert, I am puzzled how the session_key
>> could become empty... The connection had just finished
>> `verify_authorizer`, in which we (the accepter) got the session_key
>> from the peer's authorizer_bl and encrypted the auth_reply with it...
>> So how could the session_key be missing when we
>> `check_message_signature` on read?
>
> Is it possible we are racing with the authentication handshake? I seem
> to remember there is a race possibility if we are still setting up
> auth (in the monclient auth handshake) and then get an incoming
> connection. This previously popped up because of a messenger change
> that allowed incoming connections even when the nonce the peer wanted
> wasn't ours; we reverted it and crashes like this went away. See
> bf4938567943c80345966f9c5a3bdc75a913175b and the revert of that bit in
> 8e6a15d01ae5f36169b3ba9f53d81a9afcc9d355. Could this be what you are
> seeing?
>
> sage
>
>>
>> 2017-09-06 1:55 GMT+08:00 Gregory Farnum <gfarnum@xxxxxxxxxx>:
>> > On Tue, Sep 5, 2017 at 7:19 AM, zengran zhang <z13121369189@xxxxxxxxx> wrote:
>> >> Hi,
>> >>
>> >> In our cluster we found a rare and odd assert in the logs, which
>> >> says the session key was empty. It happens only once every few
>> >> days, and only when an OSD has just booted up...
>> >> Is it possible that `CephxAuthorizeHandler::verify_authorizer`
>> >> returns valid but the `session_key` is null? Log as follows:
>> >>
>> >> 2017-09-05 10:11:00.954356 7f898c43d700 20 -- 192.168.2.1:6805/7391 >> 192.168.2.1:6811/7386 conn(0x55c5053b1800 :6805 s=STATE_ACCEPTING_WAIT_CONNECT_MSG pgs=0 cs=0 l=0).process prev state is STATE_ACCEPTING_WAIT_BANNER_ADDR
>> >> 2017-09-05 10:11:00.954361 7f898bc3c700 10 Event(0x55c4fff222c0 nevent=5000 time_id=1).process_events wait second 30 usec 0
>> >> 2017-09-05 10:11:00.954363 7f898c43d700 25 -- 192.168.2.1:6805/7391 >> 192.168.2.1:6811/7386 conn(0x55c5053b1800 :6805 s=STATE_ACCEPTING_WAIT_CONNECT_MSG pgs=0 cs=0 l=0).read_until len is 33 state_offset is 0
>> >> 2017-09-05 10:11:00.954372 7f898c43d700 25 -- 192.168.2.1:6805/7391 >> 192.168.2.1:6811/7386 conn(0x55c5053b1800 :6805 s=STATE_ACCEPTING_WAIT_CONNECT_MSG pgs=0 cs=0 l=0).read_until read_bulk recv_end is 0 left is 33 got 207
>> >> 2017-09-05 10:11:00.954466 7f898c43d700 20 -- 192.168.2.1:6805/7391 >> 192.168.2.1:6811/7386 conn(0x55c5053b1800 :6805 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).process prev state is STATE_ACCEPTING_WAIT_CONNECT_MSG
>> >> 2017-09-05 10:11:00.954488 7f898c43d700 25 -- 192.168.2.1:6805/7391 >> 192.168.2.1:6811/7386 conn(0x55c5053b1800 :6805 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).read_until len is 174 state_offset is 0
>> >> 2017-09-05 10:11:00.954495 7f898c43d700 25 -- 192.168.2.1:6805/7391 >> 192.168.2.1:6811/7386 conn(0x55c5053b1800 :6805 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).read_until got 174 in buffer left is 0 buffer still has 0
>> >> 2017-09-05 10:11:00.954502 7f898c43d700 20 -- 192.168.2.1:6805/7391 >> 192.168.2.1:6811/7386 conn(0x55c5053b1800 :6805 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0)._process_connection accept got peer connect_seq 0 global_seq 7
>> >> 2017-09-05 10:11:00.954509 7f898c43d700 10 -- 192.168.2.1:6805/7391 >> 192.168.2.1:6811/7386 conn(0x55c5053b1800 :6805 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1)._process_connection accept of host_type 4, policy.lossy=1 policy.server=1 policy.standby=0 policy.resetcheck=0
>> >> 2017-09-05 10:11:00.954525 7f898c43d700 10 -- 192.168.2.1:6805/7391 >> 192.168.2.1:6811/7386 conn(0x55c5053b1800 :6805 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept my proto 10, their proto 10
>> >> 2017-09-05 10:11:00.954535 7f898c43d700 10 -- 192.168.2.1:6805/7391 >> 192.168.2.1:6811/7386 conn(0x55c5053b1800 :6805 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept setting up session_security.
>> >> 2017-09-05 10:11:00.954542 7f898c43d700 10 -- 192.168.2.1:6805/7391 >> 192.168.2.1:6811/7386 conn(0x55c5053b1800 :6805 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept new session
>> >> 2017-09-05 10:11:00.954547 7f898c43d700 10 -- 192.168.2.1:6805/7391 >> 192.168.2.1:6811/7386 conn(0x55c5053b1800 :6805 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=7 cs=1 l=1).handle_connect_msg accept success, connect_seq = 1 in_seq=0, sending READY
>> >> 2017-09-05 10:11:00.954572 7f898c43d700 10 -- 192.168.2.1:6805/7391 >> 192.168.2.1:6811/7386 conn(0x55c5053b1800 :6805 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=7 cs=1 l=1).handle_connect_msg accept features 1152323339925389307
>> >> 2017-09-05 10:11:00.954632 7f898c43d700 10 -- 192.168.2.1:6805/7391 >> 192.168.2.1:6811/7386 conn(0x55c5053b1800 :6805 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=7 cs=1 l=1)._try_send sent bytes 34 remaining bytes 0
>> >> 2017-09-05 10:11:00.954667 7f898c43d700 2 -- 192.168.2.1:6805/7391 >> 192.168.2.1:6811/7386 conn(0x55c5053b1800 :6805 s=STATE_ACCEPTING_WAIT_SEQ pgs=7 cs=1 l=1).handle_connect_msg accept write reply msg done
>> >> 2017-09-05 10:11:00.954682 7f898c43d700 20 -- 192.168.2.1:6805/7391 >> 192.168.2.1:6811/7386 conn(0x55c5053b1800 :6805 s=STATE_ACCEPTING_WAIT_SEQ pgs=7 cs=1 l=1).process prev state is STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH
>> >> 2017-09-05 10:11:00.954687 7f898c43d700 25 -- 192.168.2.1:6805/7391 >> 192.168.2.1:6811/7386 conn(0x55c5053b1800 :6805 s=STATE_ACCEPTING_WAIT_SEQ pgs=7 cs=1 l=1).read_until len is 8 state_offset is 0
>> >> 2017-09-05 10:11:00.954708 7f898c43d700 25 -- 192.168.2.1:6805/7391 >> 192.168.2.1:6811/7386 conn(0x55c5053b1800 :6805 s=STATE_ACCEPTING_WAIT_SEQ pgs=7 cs=1 l=1).read_until read_bulk recv_end is 0 left is 8 got 8
>> >> 2017-09-05 10:11:00.954724 7f898c43d700 2 -- 192.168.2.1:6805/7391 >> 192.168.2.1:6811/7386 conn(0x55c5053b1800 :6805 s=STATE_ACCEPTING_WAIT_SEQ pgs=7 cs=1 l=1)._process_connection accept get newly_acked_seq 0
>> >> 2017-09-05 10:11:00.954730 7f898c43d700 10 -- 192.168.2.1:6805/7391 >> 192.168.2.1:6811/7386 conn(0x55c5053b1800 :6805 s=STATE_ACCEPTING_WAIT_SEQ pgs=7 cs=1 l=1).discard_requeued_up_to 0
>> >> 2017-09-05 10:11:00.954736 7f898c43d700 20 -- 192.168.2.1:6805/7391 >> 192.168.2.1:6811/7386 conn(0x55c5053b1800 :6805 s=STATE_ACCEPTING_READY pgs=7 cs=1 l=1).process prev state is STATE_ACCEPTING_WAIT_SEQ
>> >> 2017-09-05 10:11:00.954741 7f898c43d700 20 -- 192.168.2.1:6805/7391 >> 192.168.2.1:6811/7386 conn(0x55c5053b1800 :6805 s=STATE_ACCEPTING_READY pgs=7 cs=1 l=1)._process_connection accept done
>> >> 2017-09-05 10:11:00.954746 7f898c43d700 10 Event(0x55c4fff22080 nevent=5000 time_id=2).create_time_event id=1 trigger after 900000000us
>> >> 2017-09-05 10:11:00.954759 7f898c43d700 20 -- 192.168.2.1:6805/7391 >> 192.168.2.1:6811/7386 conn(0x55c5053b1800 :6805 s=STATE_OPEN pgs=7 cs=1 l=1).process prev state is STATE_ACCEPTING_READY
>> >> 2017-09-05 10:11:00.954765 7f898c43d700 25 -- 192.168.2.1:6805/7391 >> 192.168.2.1:6811/7386 conn(0x55c5053b1800 :6805 s=STATE_OPEN pgs=7 cs=1 l=1).read_until len is 1 state_offset is 0
>> >> 2017-09-05 10:11:00.954772 7f898c43d700 25 -- 192.168.2.1:6805/7391 >> 192.168.2.1:6811/7386 conn(0x55c5053b1800 :6805 s=STATE_OPEN pgs=7 cs=1 l=1).read_until read_bulk recv_end is 0 left is 1 got 0
>> >> 2017-09-05 10:11:00.954777 7f898c43d700 25 -- 192.168.2.1:6805/7391 >> 192.168.2.1:6811/7386 conn(0x55c5053b1800 :6805 s=STATE_OPEN pgs=7 cs=1 l=1).read_until need len 1 remaining 1 bytes
>> >> 2017-09-05 10:11:00.954784 7f898c43d700 30 stack operator() calling event process
>> >> 2017-09-05 10:11:00.954786 7f898c43d700 10 Event(0x55c4fff22080 nevent=5000 time_id=2).process_events wait second 30 usec 0
>> >> 2017-09-05 10:11:00.955445 7f898c43d700 20 -- 192.168.2.1:6805/7391 >> 192.168.2.1:6811/7386 conn(0x55c5053b1800 :6805 s=STATE_OPEN pgs=7 cs=1 l=1).process prev state is STATE_OPEN
>> >> 2017-09-05 10:11:00.955462 7f898c43d700 25 -- 192.168.2.1:6805/7391 >> 192.168.2.1:6811/7386 conn(0x55c5053b1800 :6805 s=STATE_OPEN pgs=7 cs=1 l=1).read_until len is 1 state_offset is 0
>> >> 2017-09-05 10:11:00.955472 7f898c43d700 25 -- 192.168.2.1:6805/7391 >> 192.168.2.1:6811/7386 conn(0x55c5053b1800 :6805 s=STATE_OPEN pgs=7 cs=1 l=1).read_until read_bulk recv_end is 0 left is 1 got 4096
>> >> 2017-09-05 10:11:00.955478 7f898c43d700 20 -- 192.168.2.1:6805/7391 >> 192.168.2.1:6811/7386 conn(0x55c5053b1800 :6805 s=STATE_OPEN_MESSAGE_HEADER pgs=7 cs=1 l=1).process prev state is STATE_OPEN
>> >> 2017-09-05 10:11:00.955483 7f898c43d700 20 -- 192.168.2.1:6805/7391 >> 192.168.2.1:6811/7386 conn(0x55c5053b1800 :6805 s=STATE_OPEN_MESSAGE_HEADER pgs=7 cs=1 l=1).process begin MSG
>> >> 2017-09-05 10:11:00.955488 7f898c43d700 25 -- 192.168.2.1:6805/7391 >> 192.168.2.1:6811/7386 conn(0x55c5053b1800 :6805 s=STATE_OPEN_MESSAGE_HEADER pgs=7 cs=1 l=1).read_until len is 53 state_offset is 0
>> >> 2017-09-05 10:11:00.955494 7f898c43d700 25 -- 192.168.2.1:6805/7391 >> 192.168.2.1:6811/7386 conn(0x55c5053b1800 :6805 s=STATE_OPEN_MESSAGE_HEADER pgs=7 cs=1 l=1).read_until got 53 in buffer left is 0 buffer still has 4042
>> >> 2017-09-05 10:11:00.955499 7f898c43d700 20 -- 192.168.2.1:6805/7391 >> 192.168.2.1:6811/7386 conn(0x55c5053b1800 :6805 s=STATE_OPEN_MESSAGE_HEADER pgs=7 cs=1 l=1).process got MSG header
>> >> 2017-09-05 10:11:00.955505 7f898c43d700 20 -- 192.168.2.1:6805/7391 >> 192.168.2.1:6811/7386 conn(0x55c5053b1800 :6805 s=STATE_OPEN_MESSAGE_HEADER pgs=7 cs=1 l=1).process got envelope type=80 src osd.3 front=111623 data=0 off 0
>> >> 2017-09-05 10:11:00.955522 7f898c43d700 20 -- 192.168.2.1:6805/7391 >> 192.168.2.1:6811/7386 conn(0x55c5053b1800 :6805 s=STATE_OPEN_MESSAGE_THROTTLE_MESSAGE pgs=7 cs=1 l=1).process prev state is STATE_OPEN_MESSAGE_HEADER
>> >> 2017-09-05 10:11:00.955528 7f898c43d700 20 -- 192.168.2.1:6805/7391 >> 192.168.2.1:6811/7386 conn(0x55c5053b1800 :6805 s=STATE_OPEN_MESSAGE_THROTTLE_BYTES pgs=7 cs=1 l=1).process prev state is STATE_OPEN_MESSAGE_THROTTLE_MESSAGE
>> >> 2017-09-05 10:11:00.955533 7f898c43d700 20 -- 192.168.2.1:6805/7391 >> 192.168.2.1:6811/7386 conn(0x55c5053b1800 :6805 s=STATE_OPEN_MESSAGE_THROTTLE_DISPATCH_QUEUE pgs=7 cs=1 l=1).process prev state is STATE_OPEN_MESSAGE_THROTTLE_BYTES
>> >> 2017-09-05 10:11:00.955540 7f898c43d700 20 -- 192.168.2.1:6805/7391 >> 192.168.2.1:6811/7386 conn(0x55c5053b1800 :6805 s=STATE_OPEN_MESSAGE_READ_FRONT pgs=7 cs=1 l=1).process prev state is STATE_OPEN_MESSAGE_THROTTLE_DISPATCH_QUEUE
>> >> 2017-09-05 10:11:00.955552 7f898c43d700 25 -- 192.168.2.1:6805/7391 >> 192.168.2.1:6811/7386 conn(0x55c5053b1800 :6805 s=STATE_OPEN_MESSAGE_READ_FRONT pgs=7 cs=1 l=1).read_until len is 111623 state_offset is 0
>> >> 2017-09-05 10:11:00.955558 7f898c43d700 25 -- 192.168.2.1:6805/7391 >> 192.168.2.1:6811/7386 conn(0x55c5053b1800 :6805 s=STATE_OPEN_MESSAGE_READ_FRONT pgs=7 cs=1 l=1).read_until got 4042 in buffer left is 107581 buffer still has 0
>> >> 2017-09-05 10:11:00.955679 7f898c43d700 25 -- 192.168.2.1:6805/7391 >> 192.168.2.1:6811/7386 conn(0x55c5053b1800 :6805 s=STATE_OPEN_MESSAGE_READ_FRONT pgs=7 cs=1 l=1).read_until read_bulk left is 107581 got 107581
>> >> 2017-09-05 10:11:00.955696 7f898c43d700 20 -- 192.168.2.1:6805/7391 >> 192.168.2.1:6811/7386 conn(0x55c5053b1800 :6805 s=STATE_OPEN_MESSAGE_READ_FRONT pgs=7 cs=1 l=1).process got front 111623
>> >> 2017-09-05 10:11:00.955702 7f898c43d700 25 -- 192.168.2.1:6805/7391 >> 192.168.2.1:6811/7386 conn(0x55c5053b1800 :6805 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=7 cs=1 l=1).read_until len is 21 state_offset is 0
>> >> 2017-09-05 10:11:00.955711 7f898c43d700 25 -- 192.168.2.1:6805/7391 >> 192.168.2.1:6811/7386 conn(0x55c5053b1800 :6805 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=7 cs=1 l=1).read_until read_bulk recv_end is 0 left is 21 got 21
>> >> 2017-09-05 10:11:00.955717 7f898c43d700 10 -- 192.168.2.1:6805/7391 >> 192.168.2.1:6811/7386 conn(0x55c5053b1800 :6805 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=7 cs=1 l=1).process aborted = 0
>> >> 2017-09-05 10:11:00.955722 7f898c43d700 20 -- 192.168.2.1:6805/7391 >> 192.168.2.1:6811/7386 conn(0x55c5053b1800 :6805 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=7 cs=1 l=1).process got 111623 + 0 + 0 byte message
>> >> 2017-09-05 10:11:00.965585 7f898c43d700 -1 /tmp/release/Ubuntu/WORKDIR/ceph-12.0.2-32-ge885403/src/auth/Crypto.h: In function 'int CryptoKey::encrypt(CephContext*, const bufferlist&, ceph::bufferlist&, std::__cxx11::string*) const' thread 7f898c43d700 time 2017-09-05 10:11:00.957267
>> >> /tmp/release/Ubuntu/WORKDIR/ceph-12.0.2-32-ge885403/src/auth/Crypto.h: 109: FAILED assert(ckh)
>> >
>> > There were several bugs in this area during the Luminous development
>> > cycle that have been fixed for the final release. Since it looks like
>> > you're using a *very* old development release, you should update! :)
>> >
>> > (Why were you running that on a cluster anyway? I hope it's just for
>> > development purposes!)
>> > -Greg
>> >
>> >> ceph version 12.0.2 (5a1b6b3269da99a18984c138c23935e5eb96f73e)
>> >> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x55c4f5cd84d2]
>> >> 2: (CephxSessionHandler::_calc_signature(Message*, unsigned long*)+0x5d2) [0x55c4f5ef7932]
>> >> 3: (CephxSessionHandler::check_message_signature(Message*)+0x7d) [0x55c4f5ef7cfd]
>> >> 4: (AsyncConnection::process()+0x1dbc) [0x55c4f5edf82c]
>> >> 5: (EventCenter::process_events(int)+0x9b1) [0x55c4f5d70f71]
>> >> 6: (()+0xdb5e41) [0x55c4f5d74e41]
>> >> 7: (()+0xb8c80) [0x7f898f0c2c80]
>> >> 8: (()+0x76ba) [0x7f898f7b16ba]
>> >> 9: (clone()+0x6d) [0x7f898e82882d]
>> >>
>> >> Thanks & Regards
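
P.S. To make the failing read path concrete for anyone skimming the
trace: frames 2 and 3 of the backtrace correspond to roughly the logic
below. This is a condensed paraphrase with stand-in types, not the code
as it stands in the tree; the takeaway is that the signature is computed
by encrypting header/footer data with the per-connection session key, so
an unpopulated key aborts inside encrypt() before any signature
comparison ever happens:

#include <cassert>
#include <cstdint>

struct Message { uint64_t footer_sig = 0; };   // stand-in

struct CryptoKey {
  void *ckh = nullptr;                         // as in the sketch above
  int encrypt(uint64_t *out) const {
    assert(ckh);                               // the abort in the trace
    *out = 0;                                  // (real code encrypts a block)
    return 0;
  }
};

struct CephxSessionHandler {
  CryptoKey key;                               // per-connection session key

  int _calc_signature(const Message &, uint64_t *psig) {  // frame 2
    return key.encrypt(psig);
  }
  int check_message_signature(Message *m) {    // frame 3
    uint64_t sig = 0;
    if (_calc_signature(*m, &sig) < 0)
      return -1;
    return (sig == m->footer_sig) ? 0 : -1;    // mismatch -> failure
  }
};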