Also seeing errors such as this:

[2020-05-01 13:15:20,970][systemd][WARNING] command returned non-zero exit status: 1
[2020-05-01 13:15:20,970][systemd][WARNING] failed activating OSD, retries left: 11
[2020-05-01 13:15:20,974][ceph_volume.process][INFO ] stderr --> RuntimeError: could not find osd.13 with osd_fsid dd49cd80-418e-4a8c-8ebf-a33d339663ff
[2020-05-01 13:15:20,989][systemd][WARNING] command returned non-zero exit status: 1
[2020-05-01 13:15:20,989][systemd][WARNING] failed activating OSD, retries left: 11
[2020-05-01 13:15:20,998][ceph_volume.process][INFO ] stderr --> RuntimeError: could not find osd.5 with osd_fsid 4eaf2baa-60f2-4045-8964-6152608c742a
[2020-05-01 13:15:21,014][systemd][WARNING] command returned non-zero exit status: 1
[2020-05-01 13:15:21,014][systemd][WARNING] failed activating OSD, retries left: 11
[2020-05-01 13:15:21,019][ceph_volume.process][INFO ] stderr --> RuntimeError: could not find osd.9 with osd_fsid 32f4a716-f26e-4579-a074-5d6452c22e34
[2020-05-01 13:15:21,035][systemd][WARNING] command returned non-zero exit status: 1
[2020-05-01 13:15:21,035][systemd][WARNING] failed activating OSD, retries left: 11
[2020-05-01 13:15:25,972][ceph_volume.process][INFO ] Running command: /usr/sbin/ceph-volume lvm trigger 1-0f0e6dd7-9dd8-4b48-beaa-084f55f73b32
[2020-05-01 13:15:25,994][ceph_volume.process][INFO ] Running command: /usr/sbin/ceph-volume lvm trigger 13-dd49cd80-418e-4a8c-8ebf-a33d339663ff
[2020-05-01 13:15:26,020][ceph_volume.process][INFO ] Running command: /usr/sbin/ceph-volume lvm trigger 5-4eaf2baa-60f2-4045-8964-6152608c742a
[2020-05-01 13:15:26,040][ceph_volume.process][INFO ] Running command: /usr/sbin/ceph-volume lvm trigger 9-32f4a716-f26e-4579-a074-5d6452c22e34
[2020-05-01 13:15:26,388][ceph_volume.process][INFO ] stderr --> RuntimeError: could not find osd.1 with osd_fsid 0f0e6dd7-9dd8-4b48-beaa-084f55f73b32
[2020-05-01 13:15:26,389][ceph_volume.process][INFO ] stderr --> RuntimeError: could not find osd.13 with osd_fsid dd49cd80-418e-4a8c-8ebf-a33d339663ff
[2020-05-01 13:15:26,391][ceph_volume.process][INFO ] stderr --> RuntimeError: could not find osd.5 with osd_fsid 4eaf2baa-60f2-4045-8964-6152608c742a
[2020-05-01 13:15:26,402][systemd][WARNING] command returned non-zero exit status: 1
[2020-05-01 13:15:26,403][systemd][WARNING] failed activating OSD, retries left: 10
[2020-05-01 13:15:26,403][systemd][WARNING] command returned non-zero exit status: 1
[2020-05-01 13:15:26,404][systemd][WARNING] failed activating OSD, retries left: 10
[2020-05-01 13:15:26,404][systemd][WARNING] command returned non-zero exit status: 1
[2020-05-01 13:15:26,405][systemd][WARNING] failed activating OSD, retries left: 10
[2020-05-01 13:15:26,411][ceph_volume.process][INFO ] stderr --> RuntimeError: could not find osd.9 with osd_fsid 32f4a716-f26e-4579-a074-5d6452c22e34
[2020-05-01 13:15:26,424][systemd][WARNING] command returned non-zero exit status: 1
[2020-05-01 13:15:26,424][systemd][WARNING] failed activating OSD, retries left: 10
[2020-05-01 13:15:31,408][ceph_volume.process][INFO ] Running command: /usr/sbin/ceph-volume lvm trigger 1-0f0e6dd7-9dd8-4b48-beaa-084f55f73b32
[2020-05-01 13:15:31,408][ceph_volume.process][INFO ] Running command: /usr/sbin/ceph-volume lvm trigger 5-4eaf2baa-60f2-4045-8964-6152608c742a
[2020-05-01 13:15:31,409][ceph_volume.process][INFO ] Running command: /usr/sbin/ceph-volume lvm trigger 13-dd49cd80-418e-4a8c-8ebf-a33d339663ff
[2020-05-01 13:15:31,429][ceph_volume.process][INFO ] Running command: /usr/sbin/ceph-volume lvm trigger 9-32f4a716-f26e-4579-a074-5d6452c22e34
[2020-05-01 13:15:31,743][ceph_volume.process][INFO ] stderr --> RuntimeError: could not find osd.5 with osd_fsid 4eaf2baa-60f2-4045-8964-6152608c742a
[2020-05-01 13:15:31,750][ceph_volume.process][INFO ] stderr --> RuntimeError: could not find osd.13 with osd_fsid dd49cd80-418e-4a8c-8ebf-a33d339663ff
[2020-05-01 13:15:31,752][systemd][WARNING] command returned non-zero exit status: 1
[2020-05-01 13:15:31,752][systemd][WARNING] failed activating OSD, retries left: 9
[2020-05-01 13:15:31,754][ceph_volume.process][INFO ] stderr --> RuntimeError: could not find osd.1 with osd_fsid 0f0e6dd7-9dd8-4b48-beaa-084f55f73b32
[2020-05-01 13:15:31,761][systemd][WARNING] command returned non-zero exit status: 1
[2020-05-01 13:15:31,762][systemd][WARNING] failed activating OSD, retries left: 9
[2020-05-01 13:15:31,764][systemd][WARNING] command returned non-zero exit status: 1
[2020-05-01 13:15:31,765][systemd][WARNING] failed activating OSD, retries left: 9
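In case it is useful, below is a rough sketch of the checks I would expect
to show whether ceph-volume can still see the LVM metadata for the OSDs
named above. The osd id / osd_fsid pair is copied from the log; the
commands themselves are standard ceph-volume and LVM usage, nothing
specific to this cluster:

    # List every OSD that ceph-volume can discover from LVM tags; each
    # entry should report an osd id together with its osd fsid.
    ceph-volume lvm list

    # Look at the raw LVM tags directly, in case ceph-volume's view is stale.
    lvs -o lv_name,vg_name,lv_tags | grep ceph.osd_fsid

    # Try activating one of the failing OSDs by hand, using the id/fsid
    # pair from the "could not find osd.13" message above.
    ceph-volume lvm activate 13 dd49cd80-418e-4a8c-8ebf-a33d339663ff

If osd.13 does not show up in the lvm list output at all, that would line
up with the activation failures above.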
On Fri, May 1, 2020 at 2:23 PM Marco Pizzolo <marcopizzolo@xxxxxxxxx> wrote:

> Hi Ashley,
>
> Thanks for your response. Nothing that I can think of has happened. We
> are using max_mds = 1. We do have 4 MDS daemons, so we used to have 3
> standbys. Within minutes they all crash.
>
> On Fri, May 1, 2020 at 2:21 PM Ashley Merrick <singapore@xxxxxxxxxxxxxx>
> wrote:
>
>> Quickly checking the code that calls that assert:
>>
>>   if (version > omap_version) {
>>     omap_version = version;
>>     omap_num_objs = num_objs;
>>     omap_num_items.resize(omap_num_objs);
>>     journal_state = jstate;
>>   } else if (version == omap_version) {
>>     ceph_assert(omap_num_objs == num_objs);
>>     if (jstate > journal_state)
>>       journal_state = jstate;
>>   }
>> }
>>
>> I'm not a dev, and not sure if this will help, but it seems it could
>> mean that the MDS thinks it is behind on omaps, or too far ahead.
>>
>> Anything happened recently? Just running a single MDS?
>>
>> Hopefully someone else will see this and shine some light on what could
>> be causing it.
>>
>> ---- On Sat, 02 May 2020 02:10:58 +0800 marcopizzolo@xxxxxxxxx wrote ----
>>
>> Hello,
>>
>> Hoping you can help me.
>>
>> Ceph had been largely problem free for us for the better part of a year.
>> We have a high file count in a single CephFS filesystem, and are seeing
>> this error in the logs:
>>
>> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.9/rpm/el7/BUILD/ceph-14.2.9/src/mds/OpenFileTable.cc:
>> 777: FAILED ceph_assert(omap_num_objs == num_objs)
>>
>> The issue seemed to occur this morning, and restarting the MDS as well
>> as rebooting the servers doesn't correct the problem.
>>
>> Not really sure where to look next, as the MDS daemons crash.
>>
>> Appreciate any help you can provide.
>>
>> Marco
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx