Ceph MDS ASSERT In function 'MDRequestRef'

Hi,

We hit the following assert:

-10001> 2020-02-13 17:42:35.543 7f11b5669700 -1 /build/ceph-13.2.8/src/mds/MDCache.cc: In function 'MDRequestRef MDCache::request_get(metareqid_t)' thread 7f11b5669700 time 2020-02-13 17:42:35.545815
/build/ceph-13.2.8/src/mds/MDCache.cc: 9523: FAILED assert(p != active_requests.end())

 ceph version 13.2.8 (5579a94fafbc1f9cc913a0f5d362953a5d9c3ae0) mimic (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14e) [0x7f11bd8e69de]
 2: (()+0x287b67) [0x7f11bd8e6b67]
 3: (MDCache::request_get(metareqid_t)+0x94) [0x560cde8bb214]
 4: (Server::journal_close_session(Session*, int, Context*)+0x9dd) [0x560cde829d1d]
 5: (Server::handle_client_session(MClientSession*)+0x1071) [0x560cde82b0f1]
 6: (Server::dispatch(Message*)+0x30b) [0x560cde86f87b]
 7: (MDSRank::handle_deferrable_message(Message*)+0x434) [0x560cde7e1664]
 8: (MDSRank::_dispatch(Message*, bool)+0x89b) [0x560cde7f8c7b]
 9: (MDSRankDispatcher::ms_dispatch(Message*)+0xa3) [0x560cde7f92e3]
 10: (MDSDaemon::ms_dispatch(Message*)+0xd3) [0x560cde7d92b3]
 11: (DispatchQueue::entry()+0xb92) [0x7f11bd9a9e52]
 12: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f11bda46e2d]
 13: (()+0x76db) [0x7f11bd1d76db]
 14: (clone()+0x3f) [0x7f11bc3bd88f]
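
For readers unfamiliar with this check, the failing condition can be
illustrated with a toy model. This is a minimal Python sketch, not Ceph
code: `active_requests` and `request_get` are named after the identifiers
in the assert message above, while the map contents and the
`close_session` helper are hypothetical. The helper only loosely mirrors
the fact that frame 4 (Server::journal_close_session) calls frame 3
(MDCache::request_get) in the backtrace.

```python
# Toy model of the failing check; NOT Ceph code.
# Names mirror the assert message; the contents are hypothetical.

active_requests = {"client.1:10": "request A"}  # request id -> request state


def request_get(reqid):
    """Return the request for reqid; the caller assumes it is still active."""
    # Mirrors: assert(p != active_requests.end())
    assert reqid in active_requests, f"no active request {reqid}"
    return active_requests[reqid]


def close_session(session_requests):
    """Hypothetical caller: look up every request the session still lists."""
    return [request_get(reqid) for reqid in session_requests]


def demo():
    ok = close_session(["client.1:10"])
    try:
        # The session still lists a request that is no longer in the map.
        close_session(["client.1:10", "client.1:11"])
        tripped = False
    except AssertionError:
        tripped = True
    return ok, tripped
```

In the daemon a failed assert aborts the process instead of raising an
exception, which is consistent with the MDS going down and failing over.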

Before we hit this assert, a few kernel clients (5.3.0-26/28) were
not playing nicely:

16:32 < bitrot> mds.mds1 [WRN] client.61994841 isn't responding to mclientcaps(revoke), ino 0x1003846ddc5 pending 
                pAsLsXsFscr issued pAsLsXsFscr, sent 62.342791 seconds ago
16:32 < bitrot> mon.mon1 [WRN] Health check failed: 1 clients failing to respond to capability release 
                (MDS_CLIENT_LATE_RELEASE)
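
When many such warnings pile up, the affected clients and inodes can be
pulled out mechanically. A minimal Python sketch, assuming the message
layout matches the single sample quoted above; the regular expression and
the `parse_late_release` name are illustrative and not part of Ceph:

```python
import re

# Pattern derived only from the one sample warning quoted above;
# other Ceph versions may word this message differently.
LATE_REVOKE = re.compile(
    r"client\.(?P<client>\d+) isn't responding to mclientcaps\(revoke\), "
    r"ino (?P<ino>0x[0-9a-f]+) pending\s+(?P<pending>\S+) "
    r"issued (?P<issued>\S+), sent (?P<age>[\d.]+) seconds ago"
)


def parse_late_release(line):
    """Return a dict describing the warning, or None if the line does not match."""
    m = LATE_REVOKE.search(line)
    if m is None:
        return None
    return {
        "client": int(m.group("client")),
        "ino": int(m.group("ino"), 16),
        "pending": m.group("pending"),
        "issued": m.group("issued"),
        "age_seconds": float(m.group("age")),
    }


# The warning from the log above, with the wrapped line joined.
sample = (
    "mds.mds1 [WRN] client.61994841 isn't responding to mclientcaps(revoke), "
    "ino 0x1003846ddc5 pending pAsLsXsFscr issued pAsLsXsFscr, "
    "sent 62.342791 seconds ago"
)
info = parse_late_release(sample)
```

Grouping the parsed records by `client` shows at a glance which sessions
are holding caps the MDS is trying to revoke.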

We rebooted both clients. After that, one of them again had some slow
requests. We unmounted the file system, and shortly after that the MDS
hit the assert. Failover went fine this time.

This looks like https://tracker.ceph.com/issues/23059, but that should
already have been resolved. Is this the same issue and/or a
regression?

We run 13.2.8.

Thanks,

Stefan

-- 
| BIT BV  https://www.bit.nl/        Kamer van Koophandel 09090351
| GPG: 0xD14839C6                   +31 318 648 688 / info@xxxxxx