I am running Ceph 15.2.13 on CentOS 7.9.2009 and recently my MDS servers have started failing with this assertion:

    In function 'void Server::handle_client_open(MDRequestRef&)' thread 7f0ca9908700 time 2021-06-28T09:21:11.484768+0200
    /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/15.2.13/rpm/el7/BUILD/ceph-15.2.13/src/mds/Server.cc: 4149: FAILED ceph_assert(cur->is_auth())

The complete log is here: https://gist.github.com/pvanheus/4da555a6de6b5fa5e46cbf74f5500fbd

The ceph status output is:

    # ceph status
      cluster:
        id:     ed7b2c16-b053-45e2-a1fe-bf3474f90508
        health: HEALTH_WARN
                30 OSD(s) experiencing BlueFS spillover
                insufficient standby MDS daemons available
                1 MDSs report slow requests
                2 mgr modules have failed dependencies
                4347046/326505282 objects misplaced (1.331%)
                6 nearfull osd(s)
                23 pgs not deep-scrubbed in time
                23 pgs not scrubbed in time
                8 pool(s) nearfull

      services:
        mon: 3 daemons, quorum ceph-mon1,ceph-mon2,ceph-mon3 (age 22m)
        mgr: ceph-mon1(active, since 11w), standbys: ceph-mon2, ceph-mon3
        mds: SANBI_FS:2 {0=ceph-mon1=up:active(laggy or crashed),1=ceph-mon2=up:stopping}
        osd: 54 osds: 54 up (since 2w), 54 in (since 11w); 50 remapped pgs

      data:
        pools:   8 pools, 833 pgs
        objects: 42.37M objects, 89 TiB
        usage:   159 TiB used, 105 TiB / 264 TiB avail
        pgs:     4347046/326505282 objects misplaced (1.331%)
                 782 active+clean
                 49  active+clean+remapped
                 1   active+clean+scrubbing+deep
                 1   active+clean+remapped+scrubbing

      io:
        client:   29 KiB/s rd, 427 KiB/s wr, 37 op/s rd, 48 op/s wr

When restarting an MDS it goes through the replay, reconnect and resolve states and finally sets itself to active before this crash happens.

Any advice on what to do?

Thanks,
Peter

P.S. Apologies if you received this email more than once - I have had some trouble figuring out the correct mailing list to use.
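
In case it helps, this is more or less how I have been restarting the affected MDS and watching it move through those states. This is only a rough sketch: the daemon id ceph-mon1 and the filesystem name SANBI_FS come from my setup above, and the unit name assumes the standard ceph-mds@<id> systemd unit from the el7 packages.

    # restart the MDS daemon on the host running the affected rank
    sudo systemctl restart ceph-mds@ceph-mon1

    # watch the rank move through replay/reconnect/resolve until it reports active (or crashes)
    watch -n 2 ceph fs status SANBI_FS

    # follow the MDS log to capture the assertion and backtrace when it hits active
    sudo journalctl -u ceph-mds@ceph-mon1 -f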