The OpenFileTable objects are safe to delete while the MDS is offline anyways; the RADOS object names are mds*_openfiles*.
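For example, something along these lines (a rough sketch only, not tested here: the pool name cephfs_metadata is a placeholder for your actual CephFS metadata pool, the object names are just what rank 0 typically looks like -- use whatever the listing actually shows -- and make sure every MDS is stopped first):

  # on each MDS host, stop the daemon so nothing is writing to the table
  systemctl stop ceph-mds.target

  # list the OpenFileTable objects in the metadata pool
  # ('cephfs_metadata' is a placeholder -- substitute your metadata pool name)
  rados -p cephfs_metadata ls | grep openfiles

  # remove the objects the listing shows, e.g. for rank 0:
  rados -p cephfs_metadata rm mds0_openfiles.0
  rados -p cephfs_metadata rm mds0_openfiles.1

  # start the MDS daemons again
  systemctl start ceph-mds.target

The MDS rebuilds the table on its own; as far as I know the only cost is a slower cache warm-up on the next restart.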
Paul

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90


On Fri, May 1, 2020 at 9:04 PM Marco Pizzolo <marcopizzolo@xxxxxxxxx> wrote:
> Also seeing errors such as this:
>
> [2020-05-01 13:15:20,970][systemd][WARNING] command returned non-zero exit status: 1
> [2020-05-01 13:15:20,970][systemd][WARNING] failed activating OSD, retries left: 11
> [2020-05-01 13:15:20,974][ceph_volume.process][INFO ] stderr --> RuntimeError: could not find osd.13 with osd_fsid dd49cd80-418e-4a8c-8ebf-a33d339663ff
> [2020-05-01 13:15:20,989][systemd][WARNING] command returned non-zero exit status: 1
> [2020-05-01 13:15:20,989][systemd][WARNING] failed activating OSD, retries left: 11
> [2020-05-01 13:15:20,998][ceph_volume.process][INFO ] stderr --> RuntimeError: could not find osd.5 with osd_fsid 4eaf2baa-60f2-4045-8964-6152608c742a
> [2020-05-01 13:15:21,014][systemd][WARNING] command returned non-zero exit status: 1
> [2020-05-01 13:15:21,014][systemd][WARNING] failed activating OSD, retries left: 11
> [2020-05-01 13:15:21,019][ceph_volume.process][INFO ] stderr --> RuntimeError: could not find osd.9 with osd_fsid 32f4a716-f26e-4579-a074-5d6452c22e34
> [2020-05-01 13:15:21,035][systemd][WARNING] command returned non-zero exit status: 1
> [2020-05-01 13:15:21,035][systemd][WARNING] failed activating OSD, retries left: 11
> [2020-05-01 13:15:25,972][ceph_volume.process][INFO ] Running command: /usr/sbin/ceph-volume lvm trigger 1-0f0e6dd7-9dd8-4b48-beaa-084f55f73b32
> [2020-05-01 13:15:25,994][ceph_volume.process][INFO ] Running command: /usr/sbin/ceph-volume lvm trigger 13-dd49cd80-418e-4a8c-8ebf-a33d339663ff
> [2020-05-01 13:15:26,020][ceph_volume.process][INFO ] Running command: /usr/sbin/ceph-volume lvm trigger 5-4eaf2baa-60f2-4045-8964-6152608c742a
> [2020-05-01 13:15:26,040][ceph_volume.process][INFO ] Running command: /usr/sbin/ceph-volume lvm trigger 9-32f4a716-f26e-4579-a074-5d6452c22e34
> [2020-05-01 13:15:26,388][ceph_volume.process][INFO ] stderr --> RuntimeError: could not find osd.1 with osd_fsid 0f0e6dd7-9dd8-4b48-beaa-084f55f73b32
> [2020-05-01 13:15:26,389][ceph_volume.process][INFO ] stderr --> RuntimeError: could not find osd.13 with osd_fsid dd49cd80-418e-4a8c-8ebf-a33d339663ff
> [2020-05-01 13:15:26,391][ceph_volume.process][INFO ] stderr --> RuntimeError: could not find osd.5 with osd_fsid 4eaf2baa-60f2-4045-8964-6152608c742a
> [2020-05-01 13:15:26,402][systemd][WARNING] command returned non-zero exit status: 1
> [2020-05-01 13:15:26,403][systemd][WARNING] failed activating OSD, retries left: 10
> [2020-05-01 13:15:26,403][systemd][WARNING] command returned non-zero exit status: 1
> [2020-05-01 13:15:26,404][systemd][WARNING] failed activating OSD, retries left: 10
> [2020-05-01 13:15:26,404][systemd][WARNING] command returned non-zero exit status: 1
> [2020-05-01 13:15:26,405][systemd][WARNING] failed activating OSD, retries left: 10
> [2020-05-01 13:15:26,411][ceph_volume.process][INFO ] stderr --> RuntimeError: could not find osd.9 with osd_fsid 32f4a716-f26e-4579-a074-5d6452c22e34
> [2020-05-01 13:15:26,424][systemd][WARNING] command returned non-zero exit status: 1
> [2020-05-01 13:15:26,424][systemd][WARNING] failed activating OSD, retries left: 10
> [2020-05-01 13:15:31,408][ceph_volume.process][INFO ] Running command: /usr/sbin/ceph-volume lvm trigger 1-0f0e6dd7-9dd8-4b48-beaa-084f55f73b32
> [2020-05-01 13:15:31,408][ceph_volume.process][INFO ] Running command: /usr/sbin/ceph-volume lvm trigger 5-4eaf2baa-60f2-4045-8964-6152608c742a
> [2020-05-01 13:15:31,409][ceph_volume.process][INFO ] Running command: /usr/sbin/ceph-volume lvm trigger 13-dd49cd80-418e-4a8c-8ebf-a33d339663ff
> [2020-05-01 13:15:31,429][ceph_volume.process][INFO ] Running command: /usr/sbin/ceph-volume lvm trigger 9-32f4a716-f26e-4579-a074-5d6452c22e34
> [2020-05-01 13:15:31,743][ceph_volume.process][INFO ] stderr --> RuntimeError: could not find osd.5 with osd_fsid 4eaf2baa-60f2-4045-8964-6152608c742a
> [2020-05-01 13:15:31,750][ceph_volume.process][INFO ] stderr --> RuntimeError: could not find osd.13 with osd_fsid dd49cd80-418e-4a8c-8ebf-a33d339663ff
> [2020-05-01 13:15:31,752][systemd][WARNING] command returned non-zero exit status: 1
> [2020-05-01 13:15:31,752][systemd][WARNING] failed activating OSD, retries left: 9
> [2020-05-01 13:15:31,754][ceph_volume.process][INFO ] stderr --> RuntimeError: could not find osd.1 with osd_fsid 0f0e6dd7-9dd8-4b48-beaa-084f55f73b32
> [2020-05-01 13:15:31,761][systemd][WARNING] command returned non-zero exit status: 1
> [2020-05-01 13:15:31,762][systemd][WARNING] failed activating OSD, retries left: 9
> [2020-05-01 13:15:31,764][systemd][WARNING] command returned non-zero exit status: 1
> [2020-05-01 13:15:31,765][systemd][WARNING] failed activating OSD, retries left: 9
>
> On Fri, May 1, 2020 at 2:23 PM Marco Pizzolo <marcopizzolo@xxxxxxxxx> wrote:
>
> > Hi Ashley,
> >
> > Thanks for your response. Nothing that I can think of would have happened. We are using max_mds = 1. We do have 4 MDS daemons, so we used to have 3 standbys. Within minutes they all crash.
> >
> > On Fri, May 1, 2020 at 2:21 PM Ashley Merrick <singapore@xxxxxxxxxxxxxx> wrote:
> >
> >> Quickly checking the code that calls that assert:
> >>
> >>   if (version > omap_version) {
> >>     omap_version = version;
> >>     omap_num_objs = num_objs;
> >>     omap_num_items.resize(omap_num_objs);
> >>     journal_state = jstate;
> >>   } else if (version == omap_version) {
> >>     ceph_assert(omap_num_objs == num_objs);
> >>     if (jstate > journal_state)
> >>       journal_state = jstate;
> >>   }
> >> }
> >>
> >> I'm not a dev, and I'm not sure if this will help, but it seems this could mean that the MDS thinks it is behind on the omaps / too far ahead.
> >>
> >> Has anything happened recently? Just running a single MDS?
> >>
> >> Hopefully someone else may see this and shine some light on what could be causing it.
> >>
> >> ---- On Sat, 02 May 2020 02:10:58 +0800 marcopizzolo@xxxxxxxxx wrote ----
> >>
> >> Hello,
> >>
> >> Hoping you can help me.
> >>
> >> Ceph had been largely problem-free for us for the better part of a year. We have a high file count in a single CephFS filesystem, and are seeing this error in the logs:
> >>
> >> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.9/rpm/el7/BUILD/ceph-14.2.9/src/mds/OpenFileTable.cc: 777: FAILED ceph_assert(omap_num_objs == num_objs)
> >>
> >> The issue seemed to occur this morning, and restarting the MDS as well as rebooting the servers doesn't correct the problem.
> >>
> >> Not really sure where to look next, as the MDS daemons crash.
> >>
> >> Appreciate any help you can provide.
> >>
> >> Marco

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx