On Thu, Mar 27, 2014 at 08:36:57AM +0100, Markus Armbruster wrote:
> "Michael S. Tsirkin" <mst@xxxxxxxxxx> writes:
>
> > On Wed, Mar 26, 2014 at 11:08:03PM -0300, Alejandro Comisario wrote:
> >> Hi List!
> >> Hope someone can help me. We had a big issue in our cloud the other
> >> day: in a couple of our OpenStack regions (2000+ KVM guests with
> >> qcow2), the guests ended up with read-only filesystems because the
> >> backing files directory (the OpenStack _base directory) was
> >> compromised and the data was lost. When we realized the data was lost,
> >> it took us 5 minutes to restore the backup of the backing files, but by
> >> that time all the KVM guests had received some kind of I/O error from
> >> the hypervisor layer, and their root filesystems went read-only.
> >>
> >> My question is: is there a way to hold the I/O operations against
> >> the backing files (I thought that would be 99% READ operations) for
> >> a little longer (I'm asking this because I don't quite understand
> >> what the process is and when it raises the error) in a case where the
> >> backing files are missing (no I/O possible) but recoverable within
> >> minutes?
> >>
> >> Any tip on how to achieve this, if possible, or information about
> >> how backing files work in KVM, would be amazing.
> >> Waiting for feedback!
> >>
> >> Kindest regards,
> >> Alejandro Comisario
> >
> >
> > I'm guessing this is what happened: the guests timed out in the meantime.
> > You can increase the timeout within the guest:
> > echo 600 > /sys/block/sda/device/timeout
> > to time out after 10 minutes.
> >
> > If you have installed the qemu guest agent on your system, you can do
> > this from the host. Unfortunately, by default its memory can be pushed
> > out to swap, and then on a disk error, accessing it there might fail :(
> > Maybe we should consider mlock on all its memory, at least as an option.
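The sysfs timeout suggested above is set per disk. A minimal sketch for applying it to every SCSI disk (sd*) in a Linux guest, assuming a POSIX shell and root privileges; the function name set_disk_timeouts and the SYSFS_ROOT override are hypothetical, added only so the loop can be tried against a scratch directory instead of the real /sys:

```shell
# Hypothetical helper: write a new SCSI command timeout (in seconds)
# to /sys/block/sd*/device/timeout for every matching disk.
# SYSFS_ROOT is a hypothetical override; it defaults to the real /sys.
set_disk_timeouts() {
    timeout_secs=${1:-600}          # 600 s = 10 minutes, as in the thread
    root=${SYSFS_ROOT:-/sys}
    for f in "$root"/block/sd*/device/timeout; do
        [ -e "$f" ] || continue     # glob matched nothing: no sd* disks
        echo "$timeout_secs" > "$f"
    done
}
```

Usage, as root inside the guest: `set_disk_timeouts 600`. Values written to sysfs do not survive a reboot, and virtio-blk disks (vd*) may not expose this attribute at all, depending on the guest kernel.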
> >
> > You could pause your guests and resume them after the issue is
> > resolved, and I guess we could add functionality to pause the VM on
> > disk errors automatically.
> > Stefan?
>
> Would -drive rerror=stop do?

I think it will. It's a pity it doesn't appear in the --help output; that
would make it easier to find.

--
MST
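For reference, rerror= (and its write-side counterpart werror=) is a -drive suboption; with the value stop, QEMU pauses the guest on an I/O error instead of reporting the error to it (the defaults are rerror=report and werror=enospc). A minimal sketch, assuming a qcow2 image passed as the first argument; the function name, the QEMU override variable, and the memory size are hypothetical:

```shell
# Hypothetical helper: start a guest whose disk pauses the VM on read
# or write errors (rerror=stop, werror=stop) rather than failing the I/O.
# QEMU is a hypothetical override for the emulator binary.
start_guest_pause_on_error() {
    image=$1
    "${QEMU:-qemu-system-x86_64}" -enable-kvm -m 2048 \
        -drive "file=$image,format=qcow2,if=virtio,rerror=stop,werror=stop"
}
```

Once the backing files are restored, the paused guest can be resumed with the cont command in the QEMU monitor (or virsh resume <domain> when the guest is managed by libvirt); QEMU should then retry the failed requests.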