Re: 14.2.9 MDS Failing

Also seeing errors such as these:


[2020-05-01 13:15:20,970][systemd][WARNING] command returned non-zero exit status: 1
[2020-05-01 13:15:20,970][systemd][WARNING] failed activating OSD, retries left: 11
[2020-05-01 13:15:20,974][ceph_volume.process][INFO  ] stderr --> RuntimeError: could not find osd.13 with osd_fsid dd49cd80-418e-4a8c-8ebf-a33d339663ff
[2020-05-01 13:15:20,989][systemd][WARNING] command returned non-zero exit status: 1
[2020-05-01 13:15:20,989][systemd][WARNING] failed activating OSD, retries left: 11
[2020-05-01 13:15:20,998][ceph_volume.process][INFO  ] stderr --> RuntimeError: could not find osd.5 with osd_fsid 4eaf2baa-60f2-4045-8964-6152608c742a
[2020-05-01 13:15:21,014][systemd][WARNING] command returned non-zero exit status: 1
[2020-05-01 13:15:21,014][systemd][WARNING] failed activating OSD, retries left: 11
[2020-05-01 13:15:21,019][ceph_volume.process][INFO  ] stderr --> RuntimeError: could not find osd.9 with osd_fsid 32f4a716-f26e-4579-a074-5d6452c22e34
[2020-05-01 13:15:21,035][systemd][WARNING] command returned non-zero exit status: 1
[2020-05-01 13:15:21,035][systemd][WARNING] failed activating OSD, retries left: 11
[2020-05-01 13:15:25,972][ceph_volume.process][INFO  ] Running command: /usr/sbin/ceph-volume lvm trigger 1-0f0e6dd7-9dd8-4b48-beaa-084f55f73b32
[2020-05-01 13:15:25,994][ceph_volume.process][INFO  ] Running command: /usr/sbin/ceph-volume lvm trigger 13-dd49cd80-418e-4a8c-8ebf-a33d339663ff
[2020-05-01 13:15:26,020][ceph_volume.process][INFO  ] Running command: /usr/sbin/ceph-volume lvm trigger 5-4eaf2baa-60f2-4045-8964-6152608c742a
[2020-05-01 13:15:26,040][ceph_volume.process][INFO  ] Running command: /usr/sbin/ceph-volume lvm trigger 9-32f4a716-f26e-4579-a074-5d6452c22e34
[2020-05-01 13:15:26,388][ceph_volume.process][INFO  ] stderr --> RuntimeError: could not find osd.1 with osd_fsid 0f0e6dd7-9dd8-4b48-beaa-084f55f73b32
[2020-05-01 13:15:26,389][ceph_volume.process][INFO  ] stderr --> RuntimeError: could not find osd.13 with osd_fsid dd49cd80-418e-4a8c-8ebf-a33d339663ff
[2020-05-01 13:15:26,391][ceph_volume.process][INFO  ] stderr --> RuntimeError: could not find osd.5 with osd_fsid 4eaf2baa-60f2-4045-8964-6152608c742a
[2020-05-01 13:15:26,402][systemd][WARNING] command returned non-zero exit status: 1
[2020-05-01 13:15:26,403][systemd][WARNING] failed activating OSD, retries left: 10
[2020-05-01 13:15:26,403][systemd][WARNING] command returned non-zero exit status: 1
[2020-05-01 13:15:26,404][systemd][WARNING] failed activating OSD, retries left: 10
[2020-05-01 13:15:26,404][systemd][WARNING] command returned non-zero exit status: 1
[2020-05-01 13:15:26,405][systemd][WARNING] failed activating OSD, retries left: 10
[2020-05-01 13:15:26,411][ceph_volume.process][INFO  ] stderr --> RuntimeError: could not find osd.9 with osd_fsid 32f4a716-f26e-4579-a074-5d6452c22e34
[2020-05-01 13:15:26,424][systemd][WARNING] command returned non-zero exit status: 1
[2020-05-01 13:15:26,424][systemd][WARNING] failed activating OSD, retries left: 10
[2020-05-01 13:15:31,408][ceph_volume.process][INFO  ] Running command: /usr/sbin/ceph-volume lvm trigger 1-0f0e6dd7-9dd8-4b48-beaa-084f55f73b32
[2020-05-01 13:15:31,408][ceph_volume.process][INFO  ] Running command: /usr/sbin/ceph-volume lvm trigger 5-4eaf2baa-60f2-4045-8964-6152608c742a
[2020-05-01 13:15:31,409][ceph_volume.process][INFO  ] Running command: /usr/sbin/ceph-volume lvm trigger 13-dd49cd80-418e-4a8c-8ebf-a33d339663ff
[2020-05-01 13:15:31,429][ceph_volume.process][INFO  ] Running command: /usr/sbin/ceph-volume lvm trigger 9-32f4a716-f26e-4579-a074-5d6452c22e34
[2020-05-01 13:15:31,743][ceph_volume.process][INFO  ] stderr --> RuntimeError: could not find osd.5 with osd_fsid 4eaf2baa-60f2-4045-8964-6152608c742a
[2020-05-01 13:15:31,750][ceph_volume.process][INFO  ] stderr --> RuntimeError: could not find osd.13 with osd_fsid dd49cd80-418e-4a8c-8ebf-a33d339663ff
[2020-05-01 13:15:31,752][systemd][WARNING] command returned non-zero exit status: 1
[2020-05-01 13:15:31,752][systemd][WARNING] failed activating OSD, retries left: 9
[2020-05-01 13:15:31,754][ceph_volume.process][INFO  ] stderr --> RuntimeError: could not find osd.1 with osd_fsid 0f0e6dd7-9dd8-4b48-beaa-084f55f73b32
[2020-05-01 13:15:31,761][systemd][WARNING] command returned non-zero exit status: 1
[2020-05-01 13:15:31,762][systemd][WARNING] failed activating OSD, retries left: 9
[2020-05-01 13:15:31,764][systemd][WARNING] command returned non-zero exit status: 1
[2020-05-01 13:15:31,765][systemd][WARNING] failed activating OSD, retries left: 9
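
For reference, the osd_fsid values that systemd keeps retrying can be cross-checked against what ceph-volume actually discovers on the host; the commands below are generic examples rather than output from this cluster:

ceph-volume lvm list                                   # shows the osd id / osd fsid for each LV ceph-volume can see
lvs -o lv_name,lv_path,lv_tags | grep ceph.osd_fsid    # the LVM tags ceph-volume reads during activation

If an FSID from the log shows up in neither, ceph-volume presumably has no matching logical volume to activate for that OSD.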

On Fri, May 1, 2020 at 2:23 PM Marco Pizzolo <marcopizzolo@xxxxxxxxx> wrote:

> Hi Ashley,
>
> Thanks for your response.  Nothing has happened recently that I can think
> of.  We are using max_mds = 1.  We have 4 MDS daemons, so we used to have 3
> standby.  Within minutes of starting, they all crash.
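>
> For anyone following along, the layout can be confirmed with something like
> this (<fs_name> is a placeholder):
>
> ceph fs get <fs_name> | grep max_mds     # shows the max_mds setting
> ceph fs status                           # the active rank plus the standby daemons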
>
> On Fri, May 1, 2020 at 2:21 PM Ashley Merrick <singapore@xxxxxxxxxxxxxx>
> wrote:
>
>> Quickly checking the code that calls that assert:
>>
>> if (version > omap_version) {
>>   omap_version = version;
>>   omap_num_objs = num_objs;
>>   omap_num_items.resize(omap_num_objs);
>>   journal_state = jstate;
>> } else if (version == omap_version) {
>>   ceph_assert(omap_num_objs == num_objs);
>>   if (jstate > journal_state)
>>     journal_state = jstate;
>> }
>>
>>
>>
>> I'm not a dev, and not sure if this will help, but it seems it could mean that the
>> MDS thinks it is either behind on its omap objects or too far ahead.
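>>
>> If it helps, the on-disk side of that counter is the set of mds<rank>_openfiles.*
>> objects in the CephFS metadata pool, so something like the following (the pool
>> name is just an example) should show how many of those objects exist:
>>
>> rados -p cephfs_metadata ls | grep mds0_openfiles
>> rados -p cephfs_metadata listomapkeys mds0_openfiles.0 | wc -l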
>>
>>
>> Has anything happened recently? Are you running just a single MDS?
>>
>>
>> Hopefully someone else will see this and shed some light on what could be
>> causing it.
>>
>>
>>
>> ---- On Sat, 02 May 2020 02:10:58 +0800 marcopizzolo@xxxxxxxxx wrote ----
>>
>>
>> Hello,
>>
>> Hoping you can help me.
>>
>> Ceph had been largely problem-free for us for the better part of a year.
>> We have a high file count in a single CephFS filesystem, and are seeing
>> this error in the logs:
>>
>>
>> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.9/rpm/el7/BUILD/ceph-14.2.9/src/mds/OpenFileTable.cc:
>> 777: FAILED ceph_assert(omap_num_objs == num_objs)
>>
>> The issue seemed to start this morning, and restarting the MDS daemons as well as
>> rebooting the servers doesn't correct the problem.
>>
>> Not really sure where to look next as the MDS daemons crash.
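>>
>> (For reference, generic commands that should capture more detail about the
>> crash; debug_mds and debug_ms are standard Ceph debug settings:)
>>
>> ceph config set mds debug_mds 20     # verbose MDS logging before the next restart
>> ceph config set mds debug_ms 1       # message-level logging
>> ceph crash ls                        # crash reports collected by the crash module, if enabled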
>>
>> Appreciate any help you can provide.
>>
>> Marco
>> _______________________________________________
>> ceph-users mailing list -- ceph-users@xxxxxxx
>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>>
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


