Hi,

You should check the central ceph.log to understand why the osd is
getting marked down to begin with. Is it a connectivity issue from the
peers to that OSD? It also looks like you have osd logging disabled --
revert it to the defaults while you troubleshoot this.
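The "marked down 6 > osd_max_markdown_count 5" line at the end of your
journal only tells you that the OSD shut itself down after flapping too
many times in 600s; the cluster log on the mons should say *why* the
peers kept reporting it down. A rough sketch (osd.32 and the path are
just taken from your log, adjust for your setup):

    # on a mon host
    grep 'osd.32' /var/log/ceph/ceph.log | tail -n 50

    # or ask the cluster for recent cluster-log entries
    ceph log last 100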
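If it is connectivity, one quick check is whether the peers can reach
the OSD ports on the reinstalled host at all (OSDs listen on TCP ports
in the 6800-7300 range; "my.osd.host" below is just the name from your
log):

    # on the reinstalled host: which ports are the OSDs listening on?
    ss -tlnp | grep ceph-osd

    # from another OSD host, for each of those ports:
    nc -v my.osd.host 6800 </dev/null

    # a freshly installed CentOS 8 usually has firewalld enabled
    firewall-cmd --list-all

A reinstalled host with firewalld blocking the OSD ports could look
exactly like what you describe: the OSD reaches the mons and marks
itself up, but the peers cannot reach it and report it down.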
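And to get the OSD logs back while you debug -- this assumes the
logging was turned down via "ceph config"; if it is set in ceph.conf
on the host, edit that file instead:

    # what is currently in effect for this osd?
    ceph config show osd.32 | grep -E 'debug_osd|debug_ms|log_file'

    # drop central overrides so the defaults apply again
    ceph config rm osd debug_osd
    ceph config rm osd debug_ms

    # or raise it temporarily on the running daemon
    ceph tell osd.32 config set debug_osd 10/10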
-- dan

On Wed, Dec 1, 2021 at 5:31 PM Jan Kasprzak <kas@xxxxxxxxxx> wrote:
>
> Hello,
>
> I am trying to upgrade my Ceph cluster (v15.2.15) from CentOS 7 to
> CentOS 8 Stream. I upgraded the monitors a month or so ago, and now I
> want to upgrade the OSDs. For now I have upgraded one host with two
> OSDs: I kept the partitions where the OSD data lives (I have a
> separate DB on an NVMe partition and the data on the whole HDD), and
> removed/recreated the OS / and /boot/efi partitions.
>
> When I run
>
>   ceph-volume lvm activate --all
>
> the /var/lib/ceph/osd/ceph-* tmpfs volumes get mounted and populated,
> and the ceph-osd processes get started. In "ceph -s", the "2 osds
> down" message disappears, and the number of degraded objects steadily
> decreases. However, after some time the number of degraded objects
> starts going up and down again, and the OSDs appear to be down (and
> then up again). After 5 minutes the OSDs are kicked out of the
> cluster, and the ceph-osd daemons stop. The log from "journalctl -u
> ceph-osd@32.service" is below.
>
> What else should I check? Thanks!
>
> -Yenya
>
> Dec 01 17:15:20 my.osd.host ceph-osd[3818]: 2021-12-01T17:15:20.384+0100 7f8c4280af00 -1 Falling back to public interface
> Dec 01 17:15:24 my.osd.host ceph-osd[3818]: 2021-12-01T17:15:24.666+0100 7f8c4280af00 -1 osd.32 1119445 log_to_monitors {default=true}
> Dec 01 17:15:25 my.osd.host ceph-osd[3818]: 2021-12-01T17:15:25.334+0100 7f8c34dfa700 -1 osd.32 1119445 set_numa_affinity unable to identify public interface '' numa node: (2) No such file or directory
> Dec 01 17:15:48 my.osd.host ceph-osd[3818]: 2021-12-01T17:15:48.714+0100 7f8c34dfa700 -1 osd.32 1119496 set_numa_affinity unable to identify public interface '' numa node: (2) No such file or directory
> Dec 01 17:16:14 my.osd.host ceph-osd[3818]: 2021-12-01T17:16:14.717+0100 7f8c34dfa700 -1 osd.32 1119508 set_numa_affinity unable to identify public interface '' numa node: (2) No such file or directory
> Dec 01 17:16:45 my.osd.host ceph-osd[3818]: 2021-12-01T17:16:45.682+0100 7f8c34dfa700 -1 osd.32 1119526 set_numa_affinity unable to identify public interface '' numa node: (2) No such file or directory
> Dec 01 17:17:13 my.osd.host ceph-osd[3818]: 2021-12-01T17:17:13.565+0100 7f8c34dfa700 -1 osd.32 1119538 set_numa_affinity unable to identify public interface '' numa node: (2) No such file or directory
> Dec 01 17:17:42 my.osd.host ceph-osd[3818]: 2021-12-01T17:17:42.237+0100 7f8c34dfa700 -1 osd.32 1119548 set_numa_affinity unable to identify public interface '' numa node: (2) No such file or directory
> Dec 01 17:18:07 my.osd.host ceph-osd[3818]: 2021-12-01T17:18:07.623+0100 7f8c295e3700 -1 osd.32 1119559 _committed_osd_maps marked down 6 > osd_max_markdown_count 5 in last 600.000000 seconds, shutting down
> Dec 01 17:18:07 my.osd.host ceph-osd[3818]: 2021-12-01T17:18:07.626+0100 7f8c38e02700 -1 received signal: Interrupt from Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0
> Dec 01 17:18:07 my.osd.host ceph-osd[3818]: 2021-12-01T17:18:07.626+0100 7f8c38e02700 -1 osd.32 1119559 *** Got signal Interrupt ***
> Dec 01 17:18:07 my.osd.host ceph-osd[3818]: 2021-12-01T17:18:07.626+0100 7f8c38e02700 -1 osd.32 1119559 *** Immediate shutdown (osd_fast_shutdown=true) ***
>
> --
> | Jan "Yenya" Kasprzak <kas at {fi.muni.cz - work | yenya.net - private}> |
> | http://www.fi.muni.cz/~kas/                         GPG: 4096R/A45477D5 |
> We all agree on the necessity of compromise. We just can't agree on
> when it's necessary to compromise.                        --Larry Wall
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx