I am running Ceph 15.2.13 on CentOS 7.9.2009 and recently my MDS servers have started failing with this assertion:

    In function 'void Server::handle_client_open(MDRequestRef&)' thread 7f0ca9908700 time 2021-06-28T09:21:11.484768+0200
    /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/15.2.13/rpm/el7/BUILD/ceph-15.2.13/src/mds/Server.cc: 4149: FAILED ceph_assert(cur->is_auth())

The complete log is here: https://gist.github.com/pvanheus/4da555a6de6b5fa5e46cbf74f5500fbd

The ceph status output is:

    # ceph status
      cluster:
        id:     ed7b2c16-b053-45e2-a1fe-bf3474f90508
        health: HEALTH_WARN
                30 OSD(s) experiencing BlueFS spillover
                insufficient standby MDS daemons available
                1 MDSs report slow requests
                2 mgr modules have failed dependencies
                4347046/326505282 objects misplaced (1.331%)
                6 nearfull osd(s)
                23 pgs not deep-scrubbed in time
                23 pgs not scrubbed in time
                8 pool(s) nearfull

      services:
        mon: 3 daemons, quorum ceph-mon1,ceph-mon2,ceph-mon3 (age 22m)
        mgr: ceph-mon1(active, since 11w), standbys: ceph-mon2, ceph-mon3
        mds: SANBI_FS:2 {0=ceph-mon1=up:active(laggy or crashed),1=ceph-mon2=up:stopping}
        osd: 54 osds: 54 up (since 2w), 54 in (since 11w); 50 remapped pgs

      data:
        pools:   8 pools, 833 pgs
        objects: 42.37M objects, 89 TiB
        usage:   159 TiB used, 105 TiB / 264 TiB avail
        pgs:     4347046/326505282 objects misplaced (1.331%)
                 782 active+clean
                 49  active+clean+remapped
                 1   active+clean+scrubbing+deep
                 1   active+clean+remapped+scrubbing

      io:
        client:   29 KiB/s rd, 427 KiB/s wr, 37 op/s rd, 48 op/s wr

When restarting an MDS it goes through the replay, reconnect and resolve states and finally sets itself to active before this crash happens.

Any advice on what to do?

Thanks,
Peter

P.S. Apologies if you received this email more than once - I have had some trouble figuring out the correct mailing list to use.
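
In case it helps, this is more or less how I have been restarting the affected MDS and watching it move through those states. This is only a rough sketch: the daemon id ceph-mon1 and the filesystem name SANBI_FS come from my setup above, and the unit name assumes the standard ceph-mds@<id> systemd unit from the el7 packages.

    # restart the MDS daemon on the host running the affected rank
    sudo systemctl restart ceph-mds@ceph-mon1

    # watch the rank move through replay/reconnect/resolve until it reports active (or crashes)
    watch -n 2 ceph fs status SANBI_FS

    # follow the MDS log to capture the assertion and backtrace when it hits active
    sudo journalctl -u ceph-mds@ceph-mon1 -f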