Re: 14.2.9 MDS Failing


 



The OpenFileTable objects are safe to delete while the MDS is offline
anyway; the RADOS object names are mds*_openfiles*.
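
For example, with all MDS daemons stopped, and assuming the metadata pool
is named cephfs_metadata and you are clearing rank 0 (adjust both to your
cluster; this is a sketch, not a tested recipe):

# list the OpenFileTable objects for rank 0
rados -p cephfs_metadata ls | grep '^mds0_openfiles'
# delete each object listed; the MDS rebuilds the table on its next startup
rados -p cephfs_metadata rm mds0_openfiles.0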



Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90


On Fri, May 1, 2020 at 9:04 PM Marco Pizzolo <marcopizzolo@xxxxxxxxx> wrote:

> Also seeing errors such as these:
>
>
> [2020-05-01 13:15:20,970][systemd][WARNING] command returned non-zero exit
> status: 1
> [2020-05-01 13:15:20,970][systemd][WARNING] failed activating OSD, retries
> left: 11
> [2020-05-01 13:15:20,974][ceph_volume.process][INFO  ] stderr -->
>  RuntimeError: could not find osd.13 with osd_fsid
> dd49cd80-418e-4a8c-8ebf-a33d339663ff
> [2020-05-01 13:15:20,989][systemd][WARNING] command returned non-zero exit
> status: 1
> [2020-05-01 13:15:20,989][systemd][WARNING] failed activating OSD, retries
> left: 11
> [2020-05-01 13:15:20,998][ceph_volume.process][INFO  ] stderr -->
>  RuntimeError: could not find osd.5 with osd_fsid
> 4eaf2baa-60f2-4045-8964-6152608c742a
> [2020-05-01 13:15:21,014][systemd][WARNING] command returned non-zero exit
> status: 1
> [2020-05-01 13:15:21,014][systemd][WARNING] failed activating OSD, retries
> left: 11
> [2020-05-01 13:15:21,019][ceph_volume.process][INFO  ] stderr -->
>  RuntimeError: could not find osd.9 with osd_fsid
> 32f4a716-f26e-4579-a074-5d6452c22e34
> [2020-05-01 13:15:21,035][systemd][WARNING] command returned non-zero exit
> status: 1
> [2020-05-01 13:15:21,035][systemd][WARNING] failed activating OSD, retries
> left: 11
> [2020-05-01 13:15:25,972][ceph_volume.process][INFO  ] Running command:
> /usr/sbin/ceph-volume lvm trigger 1-0f0e6dd7-9dd8-4b48-beaa-084f55f73b32
> [2020-05-01 13:15:25,994][ceph_volume.process][INFO  ] Running command:
> /usr/sbin/ceph-volume lvm trigger 13-dd49cd80-418e-4a8c-8ebf-a33d339663ff
> [2020-05-01 13:15:26,020][ceph_volume.process][INFO  ] Running command:
> /usr/sbin/ceph-volume lvm trigger 5-4eaf2baa-60f2-4045-8964-6152608c742a
> [2020-05-01 13:15:26,040][ceph_volume.process][INFO  ] Running command:
> /usr/sbin/ceph-volume lvm trigger 9-32f4a716-f26e-4579-a074-5d6452c22e34
> [2020-05-01 13:15:26,388][ceph_volume.process][INFO  ] stderr -->
>  RuntimeError: could not find osd.1 with osd_fsid
> 0f0e6dd7-9dd8-4b48-beaa-084f55f73b32
> [2020-05-01 13:15:26,389][ceph_volume.process][INFO  ] stderr -->
>  RuntimeError: could not find osd.13 with osd_fsid
> dd49cd80-418e-4a8c-8ebf-a33d339663ff
> [2020-05-01 13:15:26,391][ceph_volume.process][INFO  ] stderr -->
>  RuntimeError: could not find osd.5 with osd_fsid
> 4eaf2baa-60f2-4045-8964-6152608c742a
> [2020-05-01 13:15:26,402][systemd][WARNING] command returned non-zero exit
> status: 1
> [2020-05-01 13:15:26,403][systemd][WARNING] failed activating OSD, retries
> left: 10
> [2020-05-01 13:15:26,403][systemd][WARNING] command returned non-zero exit
> status: 1
> [2020-05-01 13:15:26,404][systemd][WARNING] failed activating OSD, retries
> left: 10
> [2020-05-01 13:15:26,404][systemd][WARNING] command returned non-zero exit
> status: 1
> [2020-05-01 13:15:26,405][systemd][WARNING] failed activating OSD, retries
> left: 10
> [2020-05-01 13:15:26,411][ceph_volume.process][INFO  ] stderr -->
>  RuntimeError: could not find osd.9 with osd_fsid
> 32f4a716-f26e-4579-a074-5d6452c22e34
> [2020-05-01 13:15:26,424][systemd][WARNING] command returned non-zero exit
> status: 1
> [2020-05-01 13:15:26,424][systemd][WARNING] failed activating OSD, retries
> left: 10
> [2020-05-01 13:15:31,408][ceph_volume.process][INFO  ] Running command:
> /usr/sbin/ceph-volume lvm trigger 1-0f0e6dd7-9dd8-4b48-beaa-084f55f73b32
> [2020-05-01 13:15:31,408][ceph_volume.process][INFO  ] Running command:
> /usr/sbin/ceph-volume lvm trigger 5-4eaf2baa-60f2-4045-8964-6152608c742a
> [2020-05-01 13:15:31,409][ceph_volume.process][INFO  ] Running command:
> /usr/sbin/ceph-volume lvm trigger 13-dd49cd80-418e-4a8c-8ebf-a33d339663ff
> [2020-05-01 13:15:31,429][ceph_volume.process][INFO  ] Running command:
> /usr/sbin/ceph-volume lvm trigger 9-32f4a716-f26e-4579-a074-5d6452c22e34
> [2020-05-01 13:15:31,743][ceph_volume.process][INFO  ] stderr -->
>  RuntimeError: could not find osd.5 with osd_fsid
> 4eaf2baa-60f2-4045-8964-6152608c742a
> [2020-05-01 13:15:31,750][ceph_volume.process][INFO  ] stderr -->
>  RuntimeError: could not find osd.13 with osd_fsid
> dd49cd80-418e-4a8c-8ebf-a33d339663ff
> [2020-05-01 13:15:31,752][systemd][WARNING] command returned non-zero exit
> status: 1
> [2020-05-01 13:15:31,752][systemd][WARNING] failed activating OSD, retries
> left: 9
> [2020-05-01 13:15:31,754][ceph_volume.process][INFO  ] stderr -->
>  RuntimeError: could not find osd.1 with osd_fsid
> 0f0e6dd7-9dd8-4b48-beaa-084f55f73b32
> [2020-05-01 13:15:31,761][systemd][WARNING] command returned non-zero exit
> status: 1
> [2020-05-01 13:15:31,762][systemd][WARNING] failed activating OSD, retries
> left: 9
> [2020-05-01 13:15:31,764][systemd][WARNING] command returned non-zero exit
> status: 1
> [2020-05-01 13:15:31,765][systemd][WARNING] failed activating OSD, retries
> left: 9
>
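> To cross-check the osd_fsid values ceph-volume is looking for against what
> is actually tagged on the logical volumes, one option is:
>
> ceph-volume lvm list
>
> which prints the "osd id" and "osd fsid" for each LV it finds.
>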
> On Fri, May 1, 2020 at 2:23 PM Marco Pizzolo <marcopizzolo@xxxxxxxxx>
> wrote:
>
> > Hi Ashley,
> >
> > Thanks for your response.  Nothing that I can think of has happened.  We
> > are using max_mds = 1.  We have 4 MDS daemons, so we used to have 3
> > standbys.  Within minutes they all crash.
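> >
> > For reference, the MDS layout can be confirmed like this (assuming the
> > filesystem is named cephfs; substitute the real name):
> >
> > ceph fs get cephfs | grep max_mds
> > ceph fs status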
> >
> > On Fri, May 1, 2020 at 2:21 PM Ashley Merrick <singapore@xxxxxxxxxxxxxx>
> > wrote:
> >
> >> Quickly checking the code that calls that assert
> >>
> >>
> >>
> >>
> >> if (version > omap_version) {
> >>   // loaded state is newer: adopt its counters wholesale
> >>   omap_version = version;
> >>   omap_num_objs = num_objs;
> >>   omap_num_items.resize(omap_num_objs);
> >>   journal_state = jstate;
> >> } else if (version == omap_version) {
> >>   // same version: the on-disk object count must match the in-memory count
> >>   ceph_assert(omap_num_objs == num_objs);
> >>   if (jstate > journal_state)
> >>     journal_state = jstate;
> >> }
> >>
> >>
> >> I'm not a dev, so I'm not sure if this will help, but it seems this could
> >> mean that the MDS thinks it is behind on omaps, or too far ahead.
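> >>
> >> Those counters live in the omap header of the first openfiles object, so a
> >> rough way to see what the MDS last recorded (assuming the metadata pool is
> >> named cephfs_metadata and rank 0; the header is binary-encoded, so treat
> >> this as a sanity check only) is:
> >>
> >> rados -p cephfs_metadata getomapheader mds0_openfiles.0
> >> rados -p cephfs_metadata listomapkeys mds0_openfiles.0 | wc -l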
> >>
> >>
> >> Has anything happened recently? Are you just running a single MDS?
> >>
> >>
> >> Hopefully someone else will see this and shed some light on what could be
> >> causing it.
> >>
> >>
> >>
> >> ---- On Sat, 02 May 2020 02:10:58 +0800 marcopizzolo@xxxxxxxxx wrote ----
> >>
> >>
> >> Hello,
> >>
> >> Hoping you can help me.
> >>
> >> Ceph had been largely problem-free for us for the better part of a year.
> >> We have a high file count in a single CephFS filesystem, and are seeing
> >> this error in the logs:
> >>
> >>
> >>
> >> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.9/rpm/el7/BUILD/ceph-14.2.9/src/mds/OpenFileTable.cc:
> >> 777: FAILED ceph_assert(omap_num_objs == num_objs)
> >>
> >> The issue seemed to occur this morning, and restarting the MDS as well as
> >> rebooting the servers doesn't correct the problem.
> >>
> >> Not really sure where to look next as the MDS daemons crash.
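> >>
> >> For the next crash, more detail can be captured by raising the MDS debug
> >> level and checking the crash module (assuming the centralized config store
> >> is in use):
> >>
> >> ceph config set mds debug_mds 20
> >> ceph crash ls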
> >>
> >> Appreciate any help you can provide
> >>
> >> Marco
> >>
> >
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



