Hello Stefan,

On Thu, Feb 13, 2020 at 9:19 AM Stefan Kooman <stefan@xxxxxx> wrote:
>
> Hi,
>
> We hit the following assert:
>
> -10001> 2020-02-13 17:42:35.543 7f11b5669700 -1 /build/ceph-13.2.8/src/mds/MDCache.cc: In function 'MDRequestRef MDCache::request_get(metareqid_t)' thread 7f11b5669700 time 2020-02-13 17:42:35.545815
> /build/ceph-13.2.8/src/mds/MDCache.cc: 9523: FAILED assert(p != active_requests.end())
>
>  ceph version 13.2.8 (5579a94fafbc1f9cc913a0f5d362953a5d9c3ae0) mimic (stable)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14e) [0x7f11bd8e69de]
>  2: (()+0x287b67) [0x7f11bd8e6b67]
>  3: (MDCache::request_get(metareqid_t)+0x94) [0x560cde8bb214]
>  4: (Server::journal_close_session(Session*, int, Context*)+0x9dd) [0x560cde829d1d]
>  5: (Server::handle_client_session(MClientSession*)+0x1071) [0x560cde82b0f1]
>  6: (Server::dispatch(Message*)+0x30b) [0x560cde86f87b]
>  7: (MDSRank::handle_deferrable_message(Message*)+0x434) [0x560cde7e1664]
>  8: (MDSRank::_dispatch(Message*, bool)+0x89b) [0x560cde7f8c7b]
>  9: (MDSRankDispatcher::ms_dispatch(Message*)+0xa3) [0x560cde7f92e3]
>  10: (MDSDaemon::ms_dispatch(Message*)+0xd3) [0x560cde7d92b3]
>  11: (DispatchQueue::entry()+0xb92) [0x7f11bd9a9e52]
>  12: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f11bda46e2d]
>  13: (()+0x76db) [0x7f11bd1d76db]
>  14: (clone()+0x3f) [0x7f11bc3bd88f]
>
> Before we hit this assert, a few kernel clients (5.3.0-26/28) were
> not playing nicely:
>
> 16:32 < bitrot> mds.mds1 [WRN] client.61994841 isn't responding to mclientcaps(revoke), ino 0x1003846ddc5 pending pAsLsXsFscr issued pAsLsXsFscr, sent 62.342791 seconds ago
> 16:32 < bitrot> mon.mon1 [WRN] Health check failed: 1 clients failing to respond to capability release (MDS_CLIENT_LATE_RELEASE)
>
> We rebooted both clients. After that, one of them again had some slow
> requests. We unmounted the file system, and shortly after that the
> MDS hit the assert. Failover went fine this time.
>
> This looks like issue https://tracker.ceph.com/issues/23059 ... but
> that should already have been resolved. Is this the same issue, or a
> regression?
>
> We run 13.2.8.

Thanks for the information. It looks like this bug:
https://tracker.ceph.com/issues/42467#note-7

Do you have logs you can share? You can use ceph-post-file [1] to
share; see the example sketch at the end of this message.

[1] https://docs.ceph.com/docs/master/man/8/ceph-post-file/

--
Patrick Donnelly, Ph.D.
He / Him / His
Senior Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
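
[Example referenced above] A minimal sketch of a ceph-post-file
invocation, assuming the default MDS log location and a daemon named
"mds1" (both hypothetical here; adjust for your deployment). The -d
flag attaches a description, per the ceph-post-file man page, and the
upload goes to a drop point readable only by Ceph developers:

    # Hypothetical log path and daemon name; -d adds a description.
    ceph-post-file -d "13.2.8 MDS assert in MDCache::request_get (tracker #42467)" \
        /var/log/ceph/ceph-mds.mds1.log

If the existing log turns out to be too sparse to be useful, raising
debug_mds on the active MDS before reproducing should capture more
detail.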