My ceph cluster became unstable yesterday after zincati (CoreOS's auto-updater) updated one of my nodes from 37.20221225.3.0 to 37.20230110.3.1(*). The symptom was slow ops in my cephfs MDS, which started as soon as the OSDs on this node became in and up. Excluding the OSDs on this node worked around the problem.

Note that the node is also running a mon and client workloads which use ceph. Also note that the OSDs came up and (IIUC) were participating in recovering their data to other OSDs. The problem only started when I allowed them to be in.

I rolled back the OS update and the problem was immediately resolved.

Unfortunately I didn't keep the OSD logs, but they led me to this thread from ceph-users:

https://www.mail-archive.com/ceph-users@xxxxxxx/msg18474.html

I wonder if we have an issue with a very recent kernel update. I should be able to reproduce this if it's likely to be of use to anybody, but for now I've rolled back this OS update and disabled automatic updating on my other nodes.

Matt

(*) The complete list of changes:

$ rpm-ostree db diff d477f98d52bf707d4282f6835b85bed3d60e305a0cf6eb8effd4db4b89607f05 fc214c16d248686d4cf2bb3050b59c559f091692d7af3b07ef896f1b8ab2f161
ostree diff commit from: d477f98d52bf707d4282f6835b85bed3d60e305a0cf6eb8effd4db4b89607f05
ostree diff commit to:   fc214c16d248686d4cf2bb3050b59c559f091692d7af3b07ef896f1b8ab2f161
Upgraded:
  bash 5.2.9-3.fc37 -> 5.2.15-1.fc37
  btrfs-progs 6.0.2-1.fc37 -> 6.1.2-1.fc37
  clevis 18-12.fc37 -> 18-14.fc37
  clevis-dracut 18-12.fc37 -> 18-14.fc37
  clevis-luks 18-12.fc37 -> 18-14.fc37
  clevis-systemd 18-12.fc37 -> 18-14.fc37
  container-selinux 2:2.193.0-1.fc37 -> 2:2.198.0-1.fc37
  containerd 1.6.12-1.fc37 -> 1.6.14-2.fc37
  containers-common 4:1-73.fc37 -> 4:1-76.fc37
  containers-common-extra 4:1-73.fc37 -> 4:1-76.fc37
  coreutils 9.1-6.fc37 -> 9.1-7.fc37
  coreutils-common 9.1-6.fc37 -> 9.1-7.fc37
  crun 1.7.2-2.fc37 -> 1.7.2-3.fc37
  curl 7.85.0-4.fc37 -> 7.85.0-5.fc37
  dnsmasq 2.87-3.fc37 -> 2.88-1.fc37
  ethtool 2:6.0-1.fc37 -> 2:6.1-1.fc37
  fwupd 1.8.8-1.fc37 -> 1.8.9-1.fc37
  git-core 2.38.1-1.fc37 -> 2.39.0-1.fc37
  grub2-common 1:2.06-63.fc37 -> 1:2.06-72.fc37
  grub2-efi-x64 1:2.06-63.fc37 -> 1:2.06-72.fc37
  grub2-pc 1:2.06-63.fc37 -> 1:2.06-72.fc37
  grub2-pc-modules 1:2.06-63.fc37 -> 1:2.06-72.fc37
  grub2-tools 1:2.06-63.fc37 -> 1:2.06-72.fc37
  grub2-tools-minimal 1:2.06-63.fc37 -> 1:2.06-72.fc37
  kernel 6.0.15-300.fc37 -> 6.0.18-300.fc37
  kernel-core 6.0.15-300.fc37 -> 6.0.18-300.fc37
  kernel-modules 6.0.15-300.fc37 -> 6.0.18-300.fc37
  libcurl-minimal 7.85.0-4.fc37 -> 7.85.0-5.fc37
  libgpg-error 1.45-2.fc37 -> 1.46-1.fc37
  libgusb 0.4.2-1.fc37 -> 0.4.3-1.fc37
  libksba 1.6.2-1.fc37 -> 1.6.3-1.fc37
  libpcap 14:1.10.1-4.fc37 -> 14:1.10.2-1.fc37
  libpwquality 1.4.4-11.fc37 -> 1.4.5-1.fc37
  libsmbclient 2:4.17.4-0.fc37 -> 2:4.17.4-2.fc37
  libwbclient 2:4.17.4-0.fc37 -> 2:4.17.4-2.fc37
  moby-engine 20.10.20-1.fc37 -> 20.10.21-1.fc37
  ncurses 6.3-3.20220501.fc37 -> 6.3-4.20220501.fc37
  ncurses-base 6.3-3.20220501.fc37 -> 6.3-4.20220501.fc37
  ncurses-libs 6.3-3.20220501.fc37 -> 6.3-4.20220501.fc37
  net-tools 2.0-0.63.20160912git.fc37 -> 2.0-0.64.20160912git.fc37
  rpm-ostree 2022.16-2.fc37 -> 2022.19-2.fc37
  rpm-ostree-libs 2022.16-2.fc37 -> 2022.19-2.fc37
  samba-client-libs 2:4.17.4-0.fc37 -> 2:4.17.4-2.fc37
  samba-common 2:4.17.4-0.fc37 -> 2:4.17.4-2.fc37
  samba-common-libs 2:4.17.4-0.fc37 -> 2:4.17.4-2.fc37
  selinux-policy 37.16-1.fc37 -> 37.17-1.fc37
  selinux-policy-targeted 37.16-1.fc37 -> 37.17-1.fc37
  tpm2-tss 3.2.0-3.fc37 -> 3.2.1-1.fc37
Removed:
  cracklib-dicts-2.9.7-30.fc37.x86_64

--
Matthew Booth
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx