RHEL jobs failing

David Galloway <dgallowa@xxxxxxxxxx> · Mon, 27 Jun 2022 16:06:37 -0400

Hi all,

On Friday, it was brought to my attention that RHEL jobs were failing.
Upon investigating, I found that the disk of our Satellite server had
filled up.  I increased the VM's disk size and rebooted, then hit XFS
filesystem corruption.  Running xfs_repair left the PostgreSQL database
unable to start.

So I had to restore the VM from a 2019 backup.  I then updated
Satellite, but the version we're running now doesn't cope well with the
testnodes repeatedly re-registering.  See
https://access.redhat.com/solutions/4207781

I'm still trying to find a reproducer and a workaround.  The one I tried
this morning (https://github.com/ceph/ceph-cm-ansible/pull/684) reduced
the failures but has not eliminated them entirely.
--
David Galloway
Principal Systems Administrator
Ceph Engineering
