Re: 14.2.9 MDS Failing

Hi Paul,

I appreciate the response, but as I'm fairly new to Ceph, I'm not sure
that I'm understanding.

Are you saying that you believe the issue to be due to the number of open
files?  If so, what are you suggesting as the solution?

Thanks.



On Fri, May 1, 2020 at 3:27 PM Paul Emmerich <paul.emmerich@xxxxxxxx> wrote:

> The OpenFileTable objects are safe to delete while the MDS is offline
> anyway; the RADOS object names are mds*_openfiles*.
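A rough sketch of what that deletion could look like. The pool name
`cephfs_metadata` is an assumption here (check the real name with
`ceph fs ls`), and every MDS daemon must be stopped first:

```shell
# Confirm the metadata pool name for the filesystem
ceph fs ls

# List the OpenFileTable objects; names follow mds<rank>_openfiles.<n>
rados -p cephfs_metadata ls | grep openfiles

# With all MDS daemons stopped, remove them one by one;
# the MDS rebuilds the open file table on the next startup
rados -p cephfs_metadata rm mds0_openfiles.0
```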
>
>
>
> Paul
>
> --
> Paul Emmerich
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90
>
>
> On Fri, May 1, 2020 at 9:04 PM Marco Pizzolo <marcopizzolo@xxxxxxxxx>
> wrote:
>
>> Also seeing errors such as this:
>>
>>
>> [2020-05-01 13:15:20,970][systemd][WARNING] command returned non-zero exit
>> status: 1
>> [2020-05-01 13:15:20,970][systemd][WARNING] failed activating OSD, retries
>> left: 11
>> [2020-05-01 13:15:20,974][ceph_volume.process][INFO  ] stderr -->
>>  RuntimeError: could not find osd.13 with osd_fsid
>> dd49cd80-418e-4a8c-8ebf-a33d339663ff
>> [2020-05-01 13:15:20,989][systemd][WARNING] command returned non-zero exit
>> status: 1
>> [2020-05-01 13:15:20,989][systemd][WARNING] failed activating OSD, retries
>> left: 11
>> [2020-05-01 13:15:20,998][ceph_volume.process][INFO  ] stderr -->
>>  RuntimeError: could not find osd.5 with osd_fsid
>> 4eaf2baa-60f2-4045-8964-6152608c742a
>> [2020-05-01 13:15:21,014][systemd][WARNING] command returned non-zero exit
>> status: 1
>> [2020-05-01 13:15:21,014][systemd][WARNING] failed activating OSD, retries
>> left: 11
>> [2020-05-01 13:15:21,019][ceph_volume.process][INFO  ] stderr -->
>>  RuntimeError: could not find osd.9 with osd_fsid
>> 32f4a716-f26e-4579-a074-5d6452c22e34
>> [2020-05-01 13:15:21,035][systemd][WARNING] command returned non-zero exit
>> status: 1
>> [2020-05-01 13:15:21,035][systemd][WARNING] failed activating OSD, retries
>> left: 11
>> [2020-05-01 13:15:25,972][ceph_volume.process][INFO  ] Running command:
>> /usr/sbin/ceph-volume lvm trigger 1-0f0e6dd7-9dd8-4b48-beaa-084f55f73b32
>> [2020-05-01 13:15:25,994][ceph_volume.process][INFO  ] Running command:
>> /usr/sbin/ceph-volume lvm trigger 13-dd49cd80-418e-4a8c-8ebf-a33d339663ff
>> [2020-05-01 13:15:26,020][ceph_volume.process][INFO  ] Running command:
>> /usr/sbin/ceph-volume lvm trigger 5-4eaf2baa-60f2-4045-8964-6152608c742a
>> [2020-05-01 13:15:26,040][ceph_volume.process][INFO  ] Running command:
>> /usr/sbin/ceph-volume lvm trigger 9-32f4a716-f26e-4579-a074-5d6452c22e34
>> [2020-05-01 13:15:26,388][ceph_volume.process][INFO  ] stderr -->
>>  RuntimeError: could not find osd.1 with osd_fsid
>> 0f0e6dd7-9dd8-4b48-beaa-084f55f73b32
>> [2020-05-01 13:15:26,389][ceph_volume.process][INFO  ] stderr -->
>>  RuntimeError: could not find osd.13 with osd_fsid
>> dd49cd80-418e-4a8c-8ebf-a33d339663ff
>> [2020-05-01 13:15:26,391][ceph_volume.process][INFO  ] stderr -->
>>  RuntimeError: could not find osd.5 with osd_fsid
>> 4eaf2baa-60f2-4045-8964-6152608c742a
>> [2020-05-01 13:15:26,402][systemd][WARNING] command returned non-zero exit
>> status: 1
>> [2020-05-01 13:15:26,403][systemd][WARNING] failed activating OSD, retries
>> left: 10
>> [2020-05-01 13:15:26,403][systemd][WARNING] command returned non-zero exit
>> status: 1
>> [2020-05-01 13:15:26,404][systemd][WARNING] failed activating OSD, retries
>> left: 10
>> [2020-05-01 13:15:26,404][systemd][WARNING] command returned non-zero exit
>> status: 1
>> [2020-05-01 13:15:26,405][systemd][WARNING] failed activating OSD, retries
>> left: 10
>> [2020-05-01 13:15:26,411][ceph_volume.process][INFO  ] stderr -->
>>  RuntimeError: could not find osd.9 with osd_fsid
>> 32f4a716-f26e-4579-a074-5d6452c22e34
>> [2020-05-01 13:15:26,424][systemd][WARNING] command returned non-zero exit
>> status: 1
>> [2020-05-01 13:15:26,424][systemd][WARNING] failed activating OSD, retries
>> left: 10
>> [2020-05-01 13:15:31,408][ceph_volume.process][INFO  ] Running command:
>> /usr/sbin/ceph-volume lvm trigger 1-0f0e6dd7-9dd8-4b48-beaa-084f55f73b32
>> [2020-05-01 13:15:31,408][ceph_volume.process][INFO  ] Running command:
>> /usr/sbin/ceph-volume lvm trigger 5-4eaf2baa-60f2-4045-8964-6152608c742a
>> [2020-05-01 13:15:31,409][ceph_volume.process][INFO  ] Running command:
>> /usr/sbin/ceph-volume lvm trigger 13-dd49cd80-418e-4a8c-8ebf-a33d339663ff
>> [2020-05-01 13:15:31,429][ceph_volume.process][INFO  ] Running command:
>> /usr/sbin/ceph-volume lvm trigger 9-32f4a716-f26e-4579-a074-5d6452c22e34
>> [2020-05-01 13:15:31,743][ceph_volume.process][INFO  ] stderr -->
>>  RuntimeError: could not find osd.5 with osd_fsid
>> 4eaf2baa-60f2-4045-8964-6152608c742a
>> [2020-05-01 13:15:31,750][ceph_volume.process][INFO  ] stderr -->
>>  RuntimeError: could not find osd.13 with osd_fsid
>> dd49cd80-418e-4a8c-8ebf-a33d339663ff
>> [2020-05-01 13:15:31,752][systemd][WARNING] command returned non-zero exit
>> status: 1
>> [2020-05-01 13:15:31,752][systemd][WARNING] failed activating OSD, retries
>> left: 9
>> [2020-05-01 13:15:31,754][ceph_volume.process][INFO  ] stderr -->
>>  RuntimeError: could not find osd.1 with osd_fsid
>> 0f0e6dd7-9dd8-4b48-beaa-084f55f73b32
>> [2020-05-01 13:15:31,761][systemd][WARNING] command returned non-zero exit
>> status: 1
>> [2020-05-01 13:15:31,762][systemd][WARNING] failed activating OSD, retries
>> left: 9
>> [2020-05-01 13:15:31,764][systemd][WARNING] command returned non-zero exit
>> status: 1
>> [2020-05-01 13:15:31,765][systemd][WARNING] failed activating OSD, retries
>> left: 9
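Those "could not find osd.N with osd_fsid" retries usually mean
ceph-volume cannot match the recorded fsid to an LVM volume on the host.
A diagnostic sketch (read-only checks, plus a retry of activation):

```shell
# Show the LVM-backed OSDs and their osd_fsid values known on this host
ceph-volume lvm list

# Compare against the logical volumes actually present and their tags
lvs -o lv_name,lv_tags

# If the volumes are visible, retry activation of everything discovered
ceph-volume lvm activate --all
```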
>>
>> On Fri, May 1, 2020 at 2:23 PM Marco Pizzolo <marcopizzolo@xxxxxxxxx>
>> wrote:
>>
>> > Hi Ashley,
>> >
>> > Thanks for your response.  Nothing that I can think of happened.
>> > We are using max_mds = 1.  We do have 4 MDS daemons, so we used to
>> > have 3 standby.  Within minutes they all crash.
>> >
>> > On Fri, May 1, 2020 at 2:21 PM Ashley Merrick
>> > <singapore@xxxxxxxxxxxxxx> wrote:
>> >
>> >> Quickly checking the code that calls that assert:
>> >>
>> >> if (version > omap_version) {
>> >>   omap_version = version;
>> >>   omap_num_objs = num_objs;
>> >>   omap_num_items.resize(omap_num_objs);
>> >>   journal_state = jstate;
>> >> } else if (version == omap_version) {
>> >>   ceph_assert(omap_num_objs == num_objs);
>> >>   if (jstate > journal_state)
>> >>     journal_state = jstate;
>> >> }
>> >> }
>> >>
>> >>
>> >> I'm not a dev, and I'm not sure if this will help, but it could mean
>> >> that the MDS thinks it is behind on (or too far ahead of) its omap
>> >> objects.
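As an illustration only (not the actual Ceph code path), a small Python
sketch of that version/object-count bookkeeping, showing when the
assertion trips:

```python
def update_table(state, version, num_objs, jstate):
    """Mimics the simplified OpenFileTable version tracking quoted above."""
    if version > state["omap_version"]:
        # Newer table version: adopt its object count and journal state
        state["omap_version"] = version
        state["omap_num_objs"] = num_objs
        state["journal_state"] = jstate
    elif version == state["omap_version"]:
        # The same version must describe the same number of omap objects;
        # a mismatch means the on-disk table is inconsistent (the assert)
        assert state["omap_num_objs"] == num_objs
        if jstate > state["journal_state"]:
            state["journal_state"] = jstate
    return state

state = {"omap_version": 1, "omap_num_objs": 2, "journal_state": 0}
state = update_table(state, 2, 3, 1)    # newer version: accepted
# update_table(state, 2, 4, 1)          # same version, different count:
#                                       # raises AssertionError
```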
>> >>
>> >>
>> >> Has anything happened recently? Are you just running a single MDS?
>> >>
>> >>
>> >> Hopefully someone else will see this and shed some light on what
>> >> could be causing it.
>> >>
>> >>
>> >>
>> >> ---- On Sat, 02 May 2020 02:10:58 +0800 marcopizzolo@xxxxxxxxx
>> >> wrote ----
>> >>
>> >>
>> >> Hello,
>> >>
>> >> Hoping you can help me.
>> >>
>> >> Ceph had been largely problem-free for us for the better part of a
>> >> year. We have a high file count in a single CephFS filesystem, and
>> >> are seeing this error in the logs:
>> >>
>> >>
>> >>
>> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.9/rpm/el7/BUILD/ceph-14.2.9/src/mds/OpenFileTable.cc:
>> >> 777: FAILED ceph_assert(omap_num_objs == num_objs)
>> >>
>> >> The issue seemed to occur this morning, and restarting the MDS as
>> >> well as rebooting the servers doesn't correct the problem.
>> >>
>> >> Not really sure where to look next as the MDS daemons crash.
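One place to look next, sketched with standard commands (the crash id
below is a placeholder for whatever `ceph crash ls` reports):

```shell
# Raise MDS log verbosity so the next crash records a fuller backtrace
ceph config set mds debug_mds 20
ceph config set mds debug_ms 1

# List and inspect daemon crashes captured by the crash module
ceph crash ls
ceph crash info <crash-id>
```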
>> >>
>> >> Appreciate any help you can provide
>> >>
>> >> Marco
>> >> _______________________________________________
>> >> ceph-users mailing list -- ceph-users@xxxxxxx
>> >> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>> >>
>> >
>>
>



