Hi,
We've encountered this issue on Ceph Pacific, with an OpenStack Wallaby
cluster hooked to it. Essentially, we're slowly pushing this setup into
production, so we're testing it and encountered this oddity. My colleague
wanted to do some network redundancy tests, so he manually shut down an
rbd-backed VM in OpenStack and then started shutting down network
switches. This didn't go well and caused instability on the network,
with potential packet loss. Once he fixed the problem, he started the VM
back up and found its filesystem corrupt and unrecoverable. There was no
activity from Ceph clients while the tests were going on. There are no
errors in Ceph status, and no missing PGs or objects are reported. As far as
Ceph is concerned, there is no issue, despite this RBD image
mysteriously getting corrupted.
So to recap:
1. Clean shutdown of the VM in OpenStack.
2. Network tests cause downtime and packet loss.
3. VM starts but can't boot; black screen in the console.
4. Investigation shows that the XFS filesystem on the VM's sda1 is
unrecoverable by xfs_repair.
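In case it helps anyone reproduce or extend the investigation, the checks amount to roughly the following. The pool name "vms" and image name "broken-vm-disk" are placeholders, not our actual names:

```shell
# Cluster-level health: in our case these reported no errors,
# no missing PGs and no missing objects.
ceph -s
ceph health detail

# Force a deep scrub of the pool holding the image, to rule out
# silent object corruption that a plain scrub would not catch.
# ("vms" is a placeholder pool name.)
ceph osd pool deep-scrub vms

# Export the RBD image and inspect the XFS filesystem offline.
# ("broken-vm-disk" is a placeholder image name.)
rbd export vms/broken-vm-disk /tmp/broken-vm-disk.raw
loopdev=$(sudo losetup --find --show --partscan /tmp/broken-vm-disk.raw)

# Dry-run check first (-n makes no modifications), against the
# first partition, matching the VM's sda1:
sudo xfs_repair -n "${loopdev}p1"
```

The xfs_repair step is where the damage shows up for us; the Ceph-side checks all come back clean.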
So my question is: when there is no client activity, can data in the
cluster still become corrupt and unrecoverable due to network
instability? Or is the cause something else?
--
Jean-Philippe Méthot
Senior Openstack system administrator
Administrateur système Openstack sénior
PlanetHoster inc.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx