Hi all,
So I hit a weird issue last week... (EL6 + cman + rgmanager + drbd)
For reasons unknown, a client thought they could start yanking and
replacing hard drives on a running node. Obviously, that did not end
well. The VMs that had been running on the node continued to operate
fine; they simply switched over to using the peer's storage.
The problem came when I tried to live-migrate the VMs over to the
still-good node. With its local storage gone, the old host couldn't
write to its logs, and the live migration failed. Once the migration
failed, rgmanager stopped working as well. In the end, I had to
manually fence the node (corosync never failed, so it didn't get
automatically fenced).
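For the record, the manual fence was nothing fancy, just the stock cman
tool; "node1" below stands in for the real node name:

  # Ask the cluster's fence daemon to fence the dead peer
  fence_node node1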
Fencing, of course, caused the VMs running on the node to reboot,
resulting in a ~40 second outage. It strikes me that the system
*should* have been able to migrate, had it not needed to write to the
logs.
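One thing I'm tempted to try is pointing libvirtd's logging at a
tmpfs-backed path, so a dead root FS can't block it. Untested, and I
don't know if logging was the only blocker; the path is just an
example:

  # /etc/libvirt/libvirtd.conf -- send logs somewhere tmpfs-backed
  # (assumes /dev/shm is tmpfs on this host; adjust to taste)
  log_outputs="1:file:/dev/shm/libvirtd.log"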
Is there a way, or can there be made a way, to migrate VMs off of a
node whose underlying FS is read-only/corrupt/destroyed, so long as the
programs in memory are still working?
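For example, I'd love to be able to bypass rgmanager in that state and
do something like this by hand (names are made up, and I haven't
verified this works when the source host's FS is gone):

  # Live-migrate the guest "vm01" to the healthy peer over SSH
  # (hypothetical VM and host names)
  virsh migrate --live vm01 qemu+ssh://node2/system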
I am sure this is part rgmanager, part KVM/qemu question.
Thanks for any feedback!
--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?