Hi,

You should check the central ceph.log to understand why the osd is
getting marked down to begin with. Is it a connectivity issue from the
peers to that OSD? It also looks like you have osd logging disabled --
revert it to the defaults while you troubleshoot this.
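The "marked down 6 > osd_max_markdown_count 5" line at the end of your
journal only tells you that the OSD shut itself down after flapping too
many times in 600s; the cluster log on the mons should say *why* the
peers kept reporting it down. A rough sketch (osd.32 and the path are
just taken from your log, adjust for your setup):

    # on a mon host
    grep 'osd.32' /var/log/ceph/ceph.log | tail -n 50

    # or ask the cluster for recent cluster-log entries
    ceph log last 100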
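If it is connectivity, one quick check is whether the peers can reach
the OSD ports on the reinstalled host at all (OSDs listen on TCP ports
in the 6800-7300 range; "my.osd.host" below is just the name from your
log):

    # on the reinstalled host: which ports are the OSDs listening on?
    ss -tlnp | grep ceph-osd

    # from another OSD host, for each of those ports:
    nc -v my.osd.host 6800 </dev/null

    # a freshly installed CentOS 8 usually has firewalld enabled
    firewall-cmd --list-all

A reinstalled host with firewalld blocking the OSD ports could look
exactly like what you describe: the OSD reaches the mons and marks
itself up, but the peers cannot reach it and report it down.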
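And to get the OSD logs back while you debug -- this assumes the
logging was turned down via "ceph config"; if it is set in ceph.conf
on the host, edit that file instead:

    # what is currently in effect for this osd?
    ceph config show osd.32 | grep -E 'debug_osd|debug_ms|log_file'

    # drop central overrides so the defaults apply again
    ceph config rm osd debug_osd
    ceph config rm osd debug_ms

    # or raise it temporarily on the running daemon
    ceph tell osd.32 config set debug_osd 10/10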
-- dan

On Wed, Dec 1, 2021 at 5:31 PM Jan Kasprzak <kas@xxxxxxxxxx> wrote:
>
> Hello,
>
> I am trying to upgrade my Ceph cluster (v15.2.15) from CentOS 7 to
> CentOS 8 Stream. I upgraded the monitors a month or so ago, and now I
> want to upgrade the OSDs. For now I have upgraded one host with two
> OSDs: I kept the partitions where the OSD data lives (I have a
> separate DB on an NVMe partition and the data on the whole HDD), and
> removed/recreated the OS / and /boot/efi partitions.
>
> When I run
>
>   ceph-volume lvm activate --all
>
> the /var/lib/ceph/osd/ceph-* tmpfs volumes get mounted and populated,
> and the ceph-osd processes get started. In "ceph -s", the "2 osds
> down" message disappears, and the number of degraded objects steadily
> decreases. However, after some time the number of degraded objects
> starts going up and down again, and the OSDs appear to be down (and
> then up again). After 5 minutes the OSDs are kicked out of the
> cluster, and the ceph-osd daemons stop. The log from "journalctl -u
> ceph-osd@32.service" is below.
>
> What else should I check? Thanks!
>
> -Yenya
>
> Dec 01 17:15:20 my.osd.host ceph-osd[3818]: 2021-12-01T17:15:20.384+0100 7f8c4280af00 -1 Falling back to public interface
> Dec 01 17:15:24 my.osd.host ceph-osd[3818]: 2021-12-01T17:15:24.666+0100 7f8c4280af00 -1 osd.32 1119445 log_to_monitors {default=true}
> Dec 01 17:15:25 my.osd.host ceph-osd[3818]: 2021-12-01T17:15:25.334+0100 7f8c34dfa700 -1 osd.32 1119445 set_numa_affinity unable to identify public interface '' numa node: (2) No such file or directory
> Dec 01 17:15:48 my.osd.host ceph-osd[3818]: 2021-12-01T17:15:48.714+0100 7f8c34dfa700 -1 osd.32 1119496 set_numa_affinity unable to identify public interface '' numa node: (2) No such file or directory
> Dec 01 17:16:14 my.osd.host ceph-osd[3818]: 2021-12-01T17:16:14.717+0100 7f8c34dfa700 -1 osd.32 1119508 set_numa_affinity unable to identify public interface '' numa node: (2) No such file or directory
> Dec 01 17:16:45 my.osd.host ceph-osd[3818]: 2021-12-01T17:16:45.682+0100 7f8c34dfa700 -1 osd.32 1119526 set_numa_affinity unable to identify public interface '' numa node: (2) No such file or directory
> Dec 01 17:17:13 my.osd.host ceph-osd[3818]: 2021-12-01T17:17:13.565+0100 7f8c34dfa700 -1 osd.32 1119538 set_numa_affinity unable to identify public interface '' numa node: (2) No such file or directory
> Dec 01 17:17:42 my.osd.host ceph-osd[3818]: 2021-12-01T17:17:42.237+0100 7f8c34dfa700 -1 osd.32 1119548 set_numa_affinity unable to identify public interface '' numa node: (2) No such file or directory
> Dec 01 17:18:07 my.osd.host ceph-osd[3818]: 2021-12-01T17:18:07.623+0100 7f8c295e3700 -1 osd.32 1119559 _committed_osd_maps marked down 6 > osd_max_markdown_count 5 in last 600.000000 seconds, shutting down
> Dec 01 17:18:07 my.osd.host ceph-osd[3818]: 2021-12-01T17:18:07.626+0100 7f8c38e02700 -1 received signal: Interrupt from Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0
> Dec 01 17:18:07 my.osd.host ceph-osd[3818]: 2021-12-01T17:18:07.626+0100 7f8c38e02700 -1 osd.32 1119559 *** Got signal Interrupt ***
> Dec 01 17:18:07 my.osd.host ceph-osd[3818]: 2021-12-01T17:18:07.626+0100 7f8c38e02700 -1 osd.32 1119559 *** Immediate shutdown (osd_fast_shutdown=true) ***
>
> --
> | Jan "Yenya" Kasprzak <kas at {fi.muni.cz - work | yenya.net - private}> |
> | http://www.fi.muni.cz/~kas/                         GPG: 4096R/A45477D5 |
> We all agree on the necessity of compromise. We just can't agree on
> when it's necessary to compromise.                        --Larry Wall
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx