OSD repeatedly marked down

	Hello,

I am trying to upgrade my Ceph cluster (v15.2.15) from CentOS 7 to CentOS 8
Stream. I upgraded the monitors about a month ago, and now I want to upgrade
the OSDs. So far I have upgraded one host with two OSDs: I kept the partitions
where the OSD data live (a separate DB on an NVMe partition and data on
the whole HDD), and removed and recreated the OS / and /boot/efi partitions.
When I run

ceph-volume lvm activate --all

the /var/lib/ceph/osd/ceph-* tmpfs volumes get mounted and populated,
and the ceph-osd processes get started. In "ceph -s", the "2 osds down"
message disappears, and the number of degraded objects steadily decreases.
However, after some time the number of degraded objects starts going up
and down again, and the OSDs appear to be down (and then up again). After 5 minutes
the OSDs are kicked out of the cluster, and the ceph-osd daemons stop.
The log from "journalctl -u ceph-osd@32.service" is below.

What else should I check? Thanks!

-Yenya

Dec 01 17:15:20 my.osd.host ceph-osd[3818]: 2021-12-01T17:15:20.384+0100 7f8c4280af00 -1 Falling back to public interface
Dec 01 17:15:24 my.osd.host ceph-osd[3818]: 2021-12-01T17:15:24.666+0100 7f8c4280af00 -1 osd.32 1119445 log_to_monitors {default=true}
Dec 01 17:15:25 my.osd.host ceph-osd[3818]: 2021-12-01T17:15:25.334+0100 7f8c34dfa700 -1 osd.32 1119445 set_numa_affinity unable to identify public interface '' numa node: (2) No such file or directory
Dec 01 17:15:48 my.osd.host ceph-osd[3818]: 2021-12-01T17:15:48.714+0100 7f8c34dfa700 -1 osd.32 1119496 set_numa_affinity unable to identify public interface '' numa node: (2) No such file or directory
Dec 01 17:16:14 my.osd.host ceph-osd[3818]: 2021-12-01T17:16:14.717+0100 7f8c34dfa700 -1 osd.32 1119508 set_numa_affinity unable to identify public interface '' numa node: (2) No such file or directory
Dec 01 17:16:45 my.osd.host ceph-osd[3818]: 2021-12-01T17:16:45.682+0100 7f8c34dfa700 -1 osd.32 1119526 set_numa_affinity unable to identify public interface '' numa node: (2) No such file or directory
Dec 01 17:17:13 my.osd.host ceph-osd[3818]: 2021-12-01T17:17:13.565+0100 7f8c34dfa700 -1 osd.32 1119538 set_numa_affinity unable to identify public interface '' numa node: (2) No such file or directory
Dec 01 17:17:42 my.osd.host ceph-osd[3818]: 2021-12-01T17:17:42.237+0100 7f8c34dfa700 -1 osd.32 1119548 set_numa_affinity unable to identify public interface '' numa node: (2) No such file or directory
Dec 01 17:18:07 my.osd.host ceph-osd[3818]: 2021-12-01T17:18:07.623+0100 7f8c295e3700 -1 osd.32 1119559 _committed_osd_maps marked down 6 > osd_max_markdown_count 5 in last 600.000000 seconds, shutting down
Dec 01 17:18:07 my.osd.host ceph-osd[3818]: 2021-12-01T17:18:07.626+0100 7f8c38e02700 -1 received  signal: Interrupt from Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0
Dec 01 17:18:07 my.osd.host ceph-osd[3818]: 2021-12-01T17:18:07.626+0100 7f8c38e02700 -1 osd.32 1119559 *** Got signal Interrupt ***
Dec 01 17:18:07 my.osd.host ceph-osd[3818]: 2021-12-01T17:18:07.626+0100 7f8c38e02700 -1 osd.32 1119559 *** Immediate shutdown (osd_fast_shutdown=true) ***
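
As far as I can tell from the "_committed_osd_maps marked down 6 >
osd_max_markdown_count 5 in last 600.000000 seconds" line, the daemon stops
itself once it has been marked down more than 5 times within a 600-second
window. Below is a minimal Python sketch of that reading. It is not the actual
Ceph code: the class name, the method name and the example timestamps are made
up for illustration, and only the two limits (5 and 600 seconds) come from the
log above.

```python
from collections import deque


class MarkdownTracker:
    """Hypothetical model of the 'marked down N > osd_max_markdown_count' check."""

    def __init__(self, max_count=5, period=600.0):
        self.max_count = max_count  # limit taken from the log message
        self.period = period        # window length in seconds, also from the log
        self.times = deque()        # timestamps of recent "marked down" events

    def record_markdown(self, now):
        """Record one markdown at time `now` (seconds).

        Returns True if the number of markdowns inside the window
        now exceeds max_count, i.e. the daemon would shut down.
        """
        self.times.append(now)
        # Forget events that have fallen out of the sliding window.
        while self.times and self.times[0] < now - self.period:
            self.times.popleft()
        return len(self.times) > self.max_count


# Illustrative timestamps only (seconds), roughly half a minute apart.
tracker = MarkdownTracker()
results = [tracker.record_markdown(t) for t in (0, 25, 50, 80, 110, 140)]
print(results)
```

If this reading is right, the shutdown itself is only a symptom: the real
question is why the OSD gets marked down every half a minute or so after it
comes up.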

-- 
| Jan "Yenya" Kasprzak <kas at {fi.muni.cz - work | yenya.net - private}> |
| http://www.fi.muni.cz/~kas/                         GPG: 4096R/A45477D5 |
    We all agree on the necessity of compromise. We just can't agree on
    when it's necessary to compromise.                     --Larry Wall