Hi all,
On Friday, it was brought to my attention that RHEL jobs were failing.
Upon investigating, I found the drive of our Satellite server had hit
capacity. I increased the VM's disk size, rebooted, and hit XFS
filesystem corruption. Running xfs_repair left the PostgreSQL database
unable to start.
So I had to restore the VM from a 2019 backup. I then updated Satellite,
but the version we're now running doesn't cope well with the testnodes
repeatedly re-registering. See
https://access.redhat.com/solutions/4207781
I'm still trying to find a reproducer and a workaround. The one I tried
this morning (https://github.com/ceph/ceph-cm-ansible/pull/684)
reduced the failures but has not eliminated them.
--
David Galloway
Principal Systems Administrator
Ceph Engineering